
Simple reproduction of the core ideas from Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (Han et al., 2015)

Goal: use pruning and quantization to make a trained neural network considerably smaller without significantly hurting accuracy.

Why LeNet on MNIST?

  • trains in seconds
  • simple, well-known architecture (a LeNet-5 style sketch follows this list)
  • still contains significant redundancy, which makes it a good testbed for demonstrating the mechanics of pruning and quantization
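
For reference, a minimal LeNet-5 style network in PyTorch (the framework and the exact architecture used in this repository are assumptions here) lands at roughly 60k parameters:

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5 style CNN for 28x28 MNIST digits (~61k parameters)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # pad 28x28 up to the classic 32x32 field
            nn.ReLU(),
            nn.MaxPool2d(2),                            # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),            # 14x14 -> 10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                            # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```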

Project Structure:

  1. Baseline model (~99% test accuracy, ~60k parameters; the reference point before compression)

  2. One-shot global magnitude pruning (removes the smallest-magnitude weights across all layers in a single pass, with no fine-tuning) → sharp accuracy collapse around 90% sparsity; see the first sketch after this list

  3. Iterative prune + fine-tune (raises sparsity by ~9% each round, fine-tuning after each round) → preserves accuracy much better at high sparsity; this is the approach used in the original Deep Compression paper (see the second sketch after this list)

  4. k-means quantization (for each layer, clusters the remaining non-zero weights into k centroids and replaces each weight with its nearest centroid) → fewer distinct weight values, so the weights can be stored in fewer bits (see the last sketch after this list)

     k=64 → virtually no accuracy loss; k=16 or k=8 → slight degradation. The model remains surprisingly stable even under aggressive quantization.
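
A minimal sketch of the one-shot global pruning step, assuming a PyTorch model (the framework and the helper name `one_shot_global_prune` are illustrative, not taken from this repository):

```python
import torch

def one_shot_global_prune(model, sparsity):
    """Zero out the smallest-magnitude weights across all layers at once."""
    # Gather the magnitudes of every weight in the network into one vector.
    all_weights = torch.cat([p.detach().abs().flatten()
                             for name, p in model.named_parameters()
                             if "weight" in name])
    # Global threshold: the magnitude below which `sparsity` fraction of weights fall.
    threshold = torch.quantile(all_weights, sparsity)
    masks = {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            if "weight" in name:
                mask = (p.abs() > threshold).float()
                p.mul_(mask)        # set pruned weights to exactly zero
                masks[name] = mask  # keep the masks so pruned weights can stay frozen
    return masks
```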
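
The iterative variant interleaves small pruning steps with fine-tuning. A sketch building on `one_shot_global_prune` above; `fine_tune` is a hypothetical stand-in for a short training loop that re-applies the masks after every optimizer step, and the ~9% step per round follows the description above:

```python
def iterative_prune_and_finetune(model, fine_tune, rounds=10, step=0.09):
    """Raise sparsity a little each round, then fine-tune before pruning further."""
    sparsity = 0.0
    for _ in range(rounds):
        sparsity = min(sparsity + step, 0.99)
        masks = one_shot_global_prune(model, sparsity)
        # Fine-tune for a few epochs; pruned weights must remain zero,
        # e.g. by multiplying each weight by its mask after every update.
        fine_tune(model, masks)
    return model
```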
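
A sketch of the per-layer k-means weight sharing, using scikit-learn's `KMeans` for the clustering (an assumption; the repository may cluster the weights differently):

```python
import numpy as np
import torch
from sklearn.cluster import KMeans

def quantize_layer(weight, k=16):
    """Cluster a layer's non-zero weights into k centroids and snap each weight to its centroid."""
    w = weight.detach().cpu().numpy()
    nonzero = w[w != 0].reshape(-1, 1)   # pruned (zero) weights stay zero
    if nonzero.shape[0] <= k:
        return weight                    # nothing to gain from clustering
    km = KMeans(n_clusters=k, n_init=10).fit(nonzero)
    quantized = np.zeros_like(w)
    # Replace each surviving weight with the centroid of its cluster.
    quantized[w != 0] = km.cluster_centers_[km.labels_].ravel()
    return torch.from_numpy(quantized).to(weight.device, weight.dtype)
```

Once every surviving weight is one of at most k shared values, each weight can be stored as a log2(k)-bit index into a small per-layer codebook, which is where the size reduction comes from.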

Even though MNIST is small, the experiment reproduces the qualitative behavior reported in the original Deep Compression paper.
