Simple reproduction of the core ideas from Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (Han et al., 2015)
Goal: use pruning and quantization to make a trained neural network considerably smaller without significantly hurting accuracy.
Why LeNet on MNIST?
- trains in seconds
- simple architecture
- yet enough redundancy to demonstrate the mechanics of pruning and quantization
Project Structure:
- Baseline model: LeNet trained on MNIST to ~99% accuracy with ~60k parameters; serves as the reference point before compression (a minimal architecture sketch follows this list)
- One-shot global magnitude pruning: removes the smallest-magnitude weights across all layers in a single pass, with no fine-tuning → accuracy collapses sharply around 90% sparsity (sketched below)
- Iterative prune + fine-tune: prunes an additional 9% of the weights each round and fine-tunes after each round → preserves accuracy much better at high sparsity (the original method used in Deep Compression; sketched below)
- k-means quantization: for each layer, clusters the remaining non-zero weights into k centroids and replaces each weight with its nearest centroid; this reduces the number of distinct weight values and enables storage in fewer bits (sketched below)
  - k=64 → virtually no accuracy loss
  - k=16 or k=8 → slight degradation
  - the model remains surprisingly stable even under aggressive quantization
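
A minimal sketch of what the baseline could look like, assuming the project uses PyTorch. The exact layer sizes are an assumption (classic LeNet-5 sizing, which lands near the ~60k parameters quoted above):

```python
import torch
import torch.nn as nn

class LeNet(nn.Module):
    """LeNet-5-style baseline for 28x28 MNIST inputs (~61.7k parameters)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, 5, padding=2),   # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                 # -> 14x14
            nn.Conv2d(6, 16, 5),             # -> 10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                 # -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, 10),
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))
```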
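A sketch of one-shot global magnitude pruning. The function name and the choice to prune only weight tensors (not biases) are assumptions, not the repo's actual API:

```python
import torch

def global_magnitude_prune(model, sparsity):
    """Zero out the smallest-magnitude weights across all layers at once."""
    # Gather the magnitudes of every weight tensor into one flat vector.
    all_w = torch.cat([p.detach().abs().flatten()
                       for name, p in model.named_parameters()
                       if "weight" in name])
    k = max(1, int(sparsity * all_w.numel()))
    threshold = all_w.kthvalue(k).values  # global magnitude cutoff
    masks = {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            if "weight" in name:
                mask = (p.abs() > threshold).to(p.dtype)
                p.mul_(mask)          # one-shot: no fine-tuning afterwards
                masks[name] = mask    # keep masks so later steps can respect them
    return masks
```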
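A sketch of the iterative schedule, reusing `global_magnitude_prune` from above. The optimizer, learning rate, and round count are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def iterative_prune(model, train_loader, rounds=10, step=0.09,
                    epochs_per_round=1, lr=1e-4):
    """Prune an additional 9% of the weights each round, fine-tuning in between."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sparsity = 0.0
    for _ in range(rounds):
        sparsity = min(sparsity + step, 0.99)
        masks = global_magnitude_prune(model, sparsity)
        for _ in range(epochs_per_round):
            for x, y in train_loader:
                opt.zero_grad()
                F.cross_entropy(model(x), y).backward()
                opt.step()
                # Re-apply the masks so pruned weights stay at exactly zero.
                with torch.no_grad():
                    for name, p in model.named_parameters():
                        if name in masks:
                            p.mul_(masks[name])
    return model
```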
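A sketch of per-layer k-means quantization using scikit-learn's `KMeans`. Clustering only the non-zero weights (so pruning is preserved) follows the description above; the helper name is an assumption:

```python
import torch
from sklearn.cluster import KMeans

def kmeans_quantize(model, k=64):
    """Cluster each layer's surviving (non-zero) weights into k centroids."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if "weight" not in name:
                continue
            flat = p.view(-1)
            nz = flat != 0                      # pruned weights stay at zero
            vals = flat[nz].cpu().numpy().reshape(-1, 1)
            if len(vals) < k:                   # too few weights to cluster
                continue
            km = KMeans(n_clusters=k, n_init=10).fit(vals)
            # Replace each weight with its nearest centroid. Storing only the
            # k centroids plus a log2(k)-bit index per weight is where the
            # size saving comes from.
            quantized = km.cluster_centers_[km.labels_].reshape(-1)
            flat[nz] = torch.as_tensor(quantized, dtype=flat.dtype,
                                       device=flat.device)
    return model
```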
Even though MNIST is small, the experiment reproduces the qualitative behavior reported in the original Deep Compression paper.