Train Transformer encoder and decoder

From-Scratch Encoder & Decoder Transformer Models


Overview

Transformer-101 contains from-scratch implementations of encoder-only and decoder-only Transformer architectures. Both models are trained, evaluated, and released publicly.

The project focuses on understanding Transformer internals at the architecture level rather than on building a production-scale LLM.


Models Included

🔹 Encoder-Only Transformer (Classification)

  • Task: Text classification
  • Dataset: AG News
  • Architecture: Transformer encoder
  • Trained end-to-end from scratch

Model:
🤗 https://huggingface.co/m4vic/agnews-transformer-encoder
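
To illustrate the overall shape of such a model, here is a minimal encoder-only classifier sketch. The repository builds its blocks by hand; this sketch swaps in PyTorch's built-in `nn.TransformerEncoderLayer` for brevity, and every name and hyperparameter (vocabulary size, dimensions, layer count) is illustrative rather than taken from the released model.

```python
# Illustrative encoder-only classifier for AG News (4 classes).
# The repo implements its blocks manually; this sketch uses PyTorch's
# built-in encoder layer for brevity. All hyperparameters are made up.
import torch
import torch.nn as nn

class EncoderClassifier(nn.Module):
    def __init__(self, vocab_size=30000, d_model=256, n_heads=8,
                 n_layers=4, n_classes=4, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)   # learned positions, for brevity
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, ids):                          # ids: (batch, seq_len)
        positions = torch.arange(ids.size(1), device=ids.device)
        x = self.embed(ids) + self.pos(positions)    # token + position embeddings
        x = self.encoder(x)                          # (batch, seq_len, d_model)
        return self.head(x.mean(dim=1))              # mean-pool tokens, classify

logits = EncoderClassifier()(torch.randint(0, 30000, (2, 16)))
print(logits.shape)  # torch.Size([2, 4])
```

Mean-pooling the token representations before the linear head is one common readout; a dedicated classification token is an equally valid choice.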


🔹 Decoder-Only Transformer (GPT-style Generation)

  • Task: Autoregressive text generation
  • Dataset: WikiText-103
  • Architecture: Transformer decoder with causal masking

Model:
🤗 https://huggingface.co/m4vic/MiniGPT-Wiki103
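
The causal mask is what distinguishes this decoder from the encoder above: token i may attend only to tokens at positions ≤ i. A minimal sketch of the idea (names and shapes are illustrative, not the repository's exact code):

```python
# Sketch of causal masking: position i may attend only to positions <= i.
# This is what makes a decoder autoregressive. Names are illustrative.
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # True above the diagonal marks "future" positions to be blocked.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

scores = torch.randn(1, 4, 4)                      # (batch, seq, seq) attention logits
scores = scores.masked_fill(causal_mask(4), float("-inf"))
weights = scores.softmax(dim=-1)                   # rows sum to 1 over visible tokens
print(weights[0])                                  # upper triangle is exactly 0
```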


Key Details

  • Multi-head self-attention implemented manually (sketched after this list)
  • Positional encoding from scratch (also sketched below)
  • Encoder and decoder trained independently
  • Training, evaluation, and experiments included
  • No prebuilt Transformer blocks used

Each model has its own sub-README with full details.
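
The following sketch shows the two hand-built pieces named above: sinusoidal positional encoding, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(·), plus multi-head scaled dot-product self-attention. It is a minimal illustration under assumed shapes and names, not the repository's exact code.

```python
# Minimal sketches of sinusoidal positional encoding and manual
# multi-head self-attention. Names and shapes are illustrative.
import math
import torch
import torch.nn as nn

def sinusoidal_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...)."""
    pos = torch.arange(max_len).unsqueeze(1)                       # (max_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)                             # even dims
    pe[:, 1::2] = torch.cos(pos * div)                             # odd dims
    return pe                                                      # (max_len, d_model)

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused Q, K, V projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):                 # x: (batch, seq, d_model)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, n_heads, seq, d_head)
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:                         # e.g. a causal mask
            scores = scores.masked_fill(mask, float("-inf"))
        attn = scores.softmax(dim=-1)
        merged = (attn @ v).transpose(1, 2).reshape(b, t, -1)  # re-join heads
        return self.out(merged)

x = torch.randn(2, 8, 64) + sinusoidal_encoding(8, 64)  # add positions to embeddings
print(MultiHeadSelfAttention(64, 4)(x).shape)           # torch.Size([2, 8, 64])
```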


Educational Context

This project was built as a deep dive into Transformer internals.

A full encoder walkthrough (math + code) is available as a YouTube playlist (in Hindi): https://youtube.com/playlist?list=PLSZTCcoNvltkFZo1acLDmGUk32ROGgW_f&si=X6xFs95iKsCxHEmX


Scope

  • Educational & research-focused
  • Not optimized for large-scale production
  • Designed for clarity and experimentation

Structure

transformer-101/
├── encoder/
├── decoder/
└── README.md
