A minimal GPT-2 inference implementation using NumPy only.
No PyTorch.
No autograd.
No training loop.
Just matrices, attention, and pretrained weights.
This project implements GPT-2 (small, 117M) inference from scratch using NumPy only.
The goal is not performance.
The goal is understanding.
Every major component of GPT-2 is explicitly implemented (a NumPy sketch of the attention core follows this list):
- token & positional embeddings
- causal multi-head self-attention
- MLP (feed-forward network)
- residual connections
- layer normalization
- autoregressive text generation
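To make "explicitly implemented" concrete, here is a minimal NumPy sketch of the central component, causal multi-head self-attention. The function and parameter names (`causal_self_attention`, `w_qkv`, `w_out`) are illustrative, not the repo's actual API; the real version lives in attention.py.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, w_qkv, b_qkv, w_out, b_out, n_heads=12):
    """x: (seq_len, d_model); w_qkv: (d_model, 3*d_model); w_out: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    # Project to queries, keys, values in one matmul, then split
    q, k, v = np.split(x @ w_qkv + b_qkv, 3, axis=-1)             # each (seq_len, d_model)
    # Split d_model into heads: (n_heads, seq_len, d_head)
    q, k, v = (t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2) for t in (q, k, v))

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)           # (n_heads, seq, seq)
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(mask, -1e9, scores)                         # causal mask: no attending to the future

    out = softmax(scores) @ v                                     # (n_heads, seq, d_head)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)        # merge heads back together
    return out @ w_out + b_out
```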
The model loads official pretrained weights from Hugging Face (safetensors)
and runs inference without any deep learning framework.
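For reference, a weight loader along these lines can be sketched with `huggingface_hub` and `safetensors`. The helper name and the tensor-key comment are assumptions for illustration; the actual loading logic lives in load.py.

```python
from huggingface_hub import hf_hub_download
from safetensors.numpy import load_file

def load_gpt2_weights(repo_id: str = "gpt2") -> dict:
    """Download the official GPT-2 safetensors checkpoint and return NumPy arrays."""
    path = hf_hub_download(repo_id=repo_id, filename="model.safetensors")
    return load_file(path)    # dict: tensor name -> np.ndarray

# weights = load_gpt2_weights()
# wte = weights["wte.weight"]   # token embeddings, expected shape (50257, 768)
```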
- ❌ Not a training implementation
- ❌ Not optimized (no KV cache, O(N²) generation)
- ❌ Not meant for production
- ❌ Not a PyTorch reimplementation in disguise
This is an educational, mechanical view of what a large language model actually does.
Large Language Models often appear “magical”.
This repository removes that illusion.
What remains is:
- linear algebra
- attention
- statistical coherence
There is no reasoning engine here.
No symbols.
No facts.
And yet, semantic consistency emerges.
That tension is the point.
./.venv/bin/python main.py \
  "The animal is yellow. It's a cat. It's color is yellow. The color of the cat"

The model correctly completes the sentence by inferring that the cat is yellow.
There is no logic rule enforcing this. Only attention maintaining semantic invariants across the sequence.
Text generation is autoregressive and intentionally naïve (sketched in code after this list):
- Tokenize the prompt
- Run a full forward pass over the sequence
- Take the most probable next token (greedy sampling)
- Append it
- Repeat
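A minimal sketch of that loop, assuming a `model(token_ids) -> logits` forward pass and a tokenizer with list-based `encode`/`decode` (both assumptions; the repo's actual interfaces live in model.py, tokenizer.py, and generate.py):

```python
import numpy as np

def generate_greedy(model, tokenizer, prompt: str, max_new_tokens: int = 20) -> str:
    token_ids = list(tokenizer.encode(prompt))       # 1. tokenize the prompt
    for _ in range(max_new_tokens):
        logits = model(np.array(token_ids))          # 2. full forward pass -> (seq_len, vocab_size)
        next_id = int(np.argmax(logits[-1]))         # 3. greedy: pick the most probable next token
        token_ids.append(next_id)                    # 4. append it ...
    return tokenizer.decode(token_ids)               # 5. ... and repeat until the budget is spent
```

Greedy decoding keeps the output deterministic, which makes it easier to inspect what the model produces at each step.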
GPT-2 small (mirrored in the config sketch after this list):
- Vocabulary size: 50,257
- Context window: 1024 tokens
- Embedding dimension: 768
- Layers: 12
- Attention heads: 12
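Expressed as a plain config object for reference (a sketch; the field names here are illustrative and the repo keeps its own version in config.py):

```python
from dataclasses import dataclass

@dataclass
class GPT2Config:
    vocab_size: int = 50257   # BPE vocabulary
    n_ctx: int = 1024         # maximum context window (tokens)
    d_model: int = 768        # embedding dimension
    n_layers: int = 12        # transformer blocks
    n_heads: int = 12         # attention heads per block
```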
Each transformer block uses Pre-LayerNorm, exactly like the original GPT-2.
Weight tying between token embeddings and output projection is preserved.
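A sketch of both ideas in NumPy. Here `attention` and `mlp` are passed in as stand-ins for the repo's real sub-layers, and the parameter names are assumptions:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def pre_ln_block(x, p, attention, mlp):
    # Pre-LN: normalize *before* each sub-layer, then add the residual back.
    x = x + attention(layer_norm(x, p["ln1_g"], p["ln1_b"]))
    x = x + mlp(layer_norm(x, p["ln2_g"], p["ln2_b"]))
    return x

def logits_from_hidden(h, wte):
    # Weight tying: the output projection reuses the token embedding matrix.
    return h @ wte.T    # (seq_len, 768) @ (768, 50257) -> (seq_len, 50257)
```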
.
├── main.py # Entry point
├── generate.py # Autoregressive generation loop
├── model.py # GPT-2 model
├── layer.py # Transformer block
├── attention.py # Causal multi-head attention
├── mlp.py # Feed-forward network
├── tensor_ops.py # NumPy tensor operations
├── tokenizer.py # GPT-2 tokenizer (no transformers)
├── load.py # Load pretrained weights (safetensors)
└── config.py # Model configuration
Everything is explicit. Nothing is hidden behind a framework.
- Python 3.9+
- NumPy
- tokenizers
- huggingface_hub
- safetensors
It is recommended to use a virtual environment.
Create and activate a .venv:
python3 -m venv .venv
source .venv/bin/activate

Install dependencies:
pip install numpy tokenizers huggingface_hub safetensors

This project is educational.
It prioritizes:
- clarity
- faithfulness to GPT-2
- conceptual transparency
over:
- speed
- memory efficiency
- scalability
If you want performance, use a real inference engine. If you want understanding, read the code.
Apache 2.0; see the LICENSE file.