From-Scratch Encoder & Decoder Transformer Models
Transformer-101 contains from-scratch implementations of encoder-only and decoder-only Transformer architectures, each trained, evaluated, and released publicly.
The project focuses on understanding Transformer internals at an architecture level, rather than building a production-scale LLM.
Encoder model:
- Task: Text classification
- Dataset: AG News
- Architecture: Transformer encoder
- Trained end-to-end from scratch
Model:
🤗 https://huggingface.co/m4vic/agnews-transformer-encoder
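The released checkpoint can be pulled like any other Hugging Face artifact. A minimal sketch, assuming the repo exposes a plain PyTorch state dict named `pytorch_model.bin` (the actual filename and loading code are described in the encoder sub-README):

```python
import torch
from huggingface_hub import hf_hub_download

# Download the released encoder checkpoint from the Hub.
# NOTE: the filename is an assumption; check the repo's file listing.
path = hf_hub_download(
    repo_id="m4vic/agnews-transformer-encoder",
    filename="pytorch_model.bin",
)

# Load the raw state dict; it is meant to be loaded into the
# from-scratch encoder class defined in this repository.
state_dict = torch.load(path, map_location="cpu")
print(list(state_dict.keys())[:5])
```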
Decoder model:
- Task: Autoregressive text generation
- Dataset: WikiText-103
- Architecture: Transformer decoder with causal masking
Model:
🤗 https://huggingface.co/m4vic/MiniGPT-Wiki103
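Causal masking is what makes the decoder autoregressive: each position may only attend to itself and earlier positions. Below is a minimal PyTorch sketch of the mask plus a greedy decoding loop; the `model` interface is hypothetical and not the exact MiniGPT-Wiki103 API (see the decoder sub-README for the real one).

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # Upper-triangular boolean mask: True where attention must be blocked,
    # i.e. position i may not attend to any position j > i.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

@torch.no_grad()
def greedy_generate(model, ids: torch.Tensor, max_new_tokens: int = 50) -> torch.Tensor:
    # ids: (1, seq_len) prompt token ids.
    # Assumed interface: model(ids) -> (1, seq_len, vocab_size) logits.
    for _ in range(max_new_tokens):
        logits = model(ids)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)  # most likely next token
        ids = torch.cat([ids, next_id], dim=1)                # append and continue
    return ids
```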
- Multi-head self-attention implemented manually (see the sketch after this list)
- Positional encoding from scratch
- Encoder and decoder trained independently
- Training, evaluation, and experiments included
- No prebuilt Transformer blocks used
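For orientation, the core ideas behind a manual multi-head self-attention layer and a from-scratch positional encoding look roughly like this. This is a simplified sketch, not the repository's exact code: layer sizes, naming, and the choice of sinusoidal encodings are assumptions; the real implementations live in the encoder/ and decoder/ sub-READMEs.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # joint projection to queries, keys, values
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):  # x: (batch, seq, d_model)
        b, s, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, heads, seq, d_head)
        q, k, v = (t.view(b, s, self.n_heads, self.d_head).transpose(1, 2) for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)  # scaled dot-product
        if mask is not None:
            scores = scores.masked_fill(mask, float("-inf"))       # e.g. a causal mask for the decoder
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, -1)          # merge heads back
        return self.out(out)

def sinusoidal_positions(max_len: int, d_model: int) -> torch.Tensor:
    # Fixed sin/cos positional encodings as in "Attention Is All You Need".
    pos = torch.arange(max_len).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe
```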
Each model has its own sub-README with full details.
This project was built as a deep dive into Transformer internals.
A full encoder walkthrough (math + code) is available as a YouTube playlist (in Hindi): https://youtube.com/playlist?list=PLSZTCcoNvltkFZo1acLDmGUk32ROGgW_f&si=X6xFs95iKsCxHEmX
- Educational & research-focused
- Not optimized for large-scale production
- Designed for clarity and experimentation
Repository structure:
transformer-101/
├── encoder/
├── decoder/
└── README.md