SARAN: Shallow Auto-Regressive Attention Network
Updated Jan 9, 2026 - Python
A pure Rust GPT implementation from scratch.
This notebook builds a complete GPT (Generative Pre-trained Transformer) model from scratch using PyTorch. It covers tokenization, self-attention, multi-head attention, transformer blocks, and text generation, all explained step by step with a simple nursery rhyme corpus.
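The self-attention step mentioned above can be summarized in a few lines of PyTorch. This is a generic single-head sketch, not code from the notebook; the class name, head size, and block size are assumptions:

```python
# Minimal single-head causal self-attention sketch in PyTorch.
# Illustrative only; dimensions and names are assumptions, not the notebook's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionHead(nn.Module):
    def __init__(self, embed_dim: int, head_size: int, block_size: int):
        super().__init__()
        self.key = nn.Linear(embed_dim, head_size, bias=False)
        self.query = nn.Linear(embed_dim, head_size, bias=False)
        self.value = nn.Linear(embed_dim, head_size, bias=False)
        # Causal mask so each position attends only to itself and earlier positions.
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)                                       # (B, T, head_size)
        q = self.query(x)                                      # (B, T, head_size)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5    # scaled dot-product scores
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        wei = F.softmax(wei, dim=-1)                           # attention weights
        v = self.value(x)                                      # (B, T, head_size)
        return wei @ v                                         # weighted sum of values

# Example: a batch of 4 sequences, 8 tokens each, 32-dim embeddings.
x = torch.randn(4, 8, 32)
head = SelfAttentionHead(embed_dim=32, head_size=16, block_size=8)
print(head(x).shape)  # torch.Size([4, 8, 16])
```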
ToyGPT, inspired by Andrej Karpathy's GPT-from-scratch tutorial, builds a toy generative pre-trained transformer at its most basic level: a simple bigram language model with attention, intended to teach the basics of creating an LLM from scratch.
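The bigram starting point referenced here is usually a model in which an embedding table maps each token id directly to next-token logits. A minimal PyTorch sketch in that style (not ToyGPT's actual code; names are illustrative):

```python
# Sketch of a plain bigram language model, the baseline in Karpathy-style tutorials.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BigramLanguageModel(nn.Module):
    def __init__(self, vocab_size: int):
        super().__init__()
        # Each token id looks up a row of logits for the next token.
        self.token_embedding = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        logits = self.token_embedding(idx)          # (B, T, vocab_size)
        loss = None
        if targets is not None:
            B, T, C = logits.shape
            loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss

    @torch.no_grad()
    def generate(self, idx, max_new_tokens: int):
        for _ in range(max_new_tokens):
            logits, _ = self(idx)
            probs = F.softmax(logits[:, -1, :], dim=-1)        # next-token distribution
            idx_next = torch.multinomial(probs, num_samples=1)  # sample one token
            idx = torch.cat([idx, idx_next], dim=1)
        return idx
```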
echoGPT is a minimal GPT implementation for character-level language modeling with 25.4M parameters. Built with PyTorch, it includes multi-head self-attention, feed-forward layers, and position embeddings. Trained on text like tiny_shakespeare.txt to predict the next character.
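Character-level modeling of this kind starts from a trivial tokenizer that maps each distinct character to an integer id. A minimal sketch, with a placeholder string standing in for a corpus such as tiny_shakespeare.txt:

```python
# Character-level tokenizer sketch; the sample text is a stand-in for a real corpus.
text = "Hello, world! Hello again."   # in practice, the contents of tiny_shakespeare.txt

chars = sorted(set(text))              # vocabulary = every distinct character
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s: str) -> list[int]:
    """Map a string to a list of integer character ids."""
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    """Map a list of character ids back to a string."""
    return "".join(itos[i] for i in ids)

print(encode("Hello"))
print(decode(encode("Hello")))         # round-trips back to "Hello"
```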
🤖 Build a pure Rust GPT model from scratch, showcasing transformer architecture without deep learning frameworks. Perfect for hands-on learning.