
#1 2026-01-25 17:47:22

5hridhyan
Member
From: Asia
Registered: 2025-12-25
Posts: 217

mini transformer language model (educational PyTorch project)

Hello everyone,
I'd like to share a personal project: a transformer-based language model that is very small and rough compared to today's LLMs, built purely for experimentation and education.
The idea was to keep things simple:
- causal self-attention with masking (a rough sketch of this part follows the list)
- pre-norm transformer blocks
- a simple character-level tokenizer
- a training + validation loop with checkpointing
- autoregressive generation with temperature, top-k, and top-p sampling
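For anyone curious what the causal masking piece looks like, here is a minimal PyTorch sketch of a causal self-attention layer in the same spirit. It is not the code from the repo; names like d_model, n_heads, and max_len are placeholders I picked for illustration:

import torch
import torch.nn.functional as F
from torch import nn

class CausalSelfAttention(nn.Module):
    def __init__(self, d_model, n_heads, max_len):
        super().__init__()
        self.n_heads = n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        # lower-triangular mask: position i may only attend to positions <= i
        self.register_buffer("mask", torch.tril(torch.ones(max_len, max_len)).bool())

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split into heads: (B, n_heads, T, head_dim)
        q, k, v = [t.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
                   for t in (q, k, v)]
        # scaled dot-product scores, then mask out future positions
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        att = att.masked_fill(~self.mask[:T, :T], float("-inf"))
        att = F.softmax(att, dim=-1)
        # merge heads back to (B, T, C)
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)

The register_buffer call just precomputes the lower-triangular mask once, so each forward pass only has to slice it down to the current sequence length.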

To be clear, this is not a production-grade model or anything like that, and it is not optimized for speed or memory. I intentionally kept the code small and explicit so the mechanics are easy to follow without abstractions. Writing it helped me understand things like attention masking, training stability, and sampling behavior much better than using high-level APIs ever did.
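To illustrate the sampling behavior mentioned above, here is a generic sketch of how temperature, top-k, and top-p filtering can be combined when picking the next token. This is a common pattern, not necessarily how the repo implements it, and the function name and defaults are my own:

import torch
import torch.nn.functional as F

def sample_next(logits, temperature=1.0, top_k=None, top_p=None):
    # logits: (vocab_size,) for the last position in the sequence
    logits = logits / max(temperature, 1e-8)
    if top_k is not None:
        # drop everything below the k-th largest logit
        kth = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    if top_p is not None:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs = F.softmax(sorted_logits, dim=-1)
        cum = torch.cumsum(probs, dim=-1)
        # keep the smallest set of tokens whose cumulative probability
        # reaches top_p; the top token is always kept
        cutoff = cum - probs > top_p
        sorted_logits[cutoff] = float("-inf")
        logits = torch.full_like(logits, float("-inf")).scatter(0, sorted_idx, sorted_logits)
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)

In an autoregressive loop you would call something like this on the logits for the final position, append the sampled token to the context, and repeat.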

Repository
https://github.com/Aranya-Marjara/mini-transformer-lab

Feedback is welcome, e.g. what could be improved or how to make it more memory-efficient or faster :)
Thanks

