Vision Transformers: ViT vs Differential ViT

A comparative implementation of Vision Transformer (ViT) and Differential Vision Transformer (Diff-ViT) on CIFAR-10, exploring how differential attention mechanisms compare to standard multi-head attention in the vision domain.
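
The central architectural change is the attention computation itself. Below is a minimal, single-head PyTorch sketch of differential attention in the spirit of the Differential Transformer: two softmax attention maps are computed and their difference, weighted by a learnable λ, is applied to the values. The class name, projection layout, and λ parameterisation are illustrative assumptions, not this repository's actual code.

# Minimal sketch (illustrative, not this repo's API): differential attention
# computes two softmax attention maps and subtracts them, scaled by a
# learnable lambda, before applying the values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffAttentionSketch(nn.Module):
    def __init__(self, dim: int, lambda_init: float = 0.8):
        super().__init__()
        # two independent query/key projections, one shared value projection
        self.q1 = nn.Linear(dim, dim)
        self.k1 = nn.Linear(dim, dim)
        self.q2 = nn.Linear(dim, dim)
        self.k2 = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.lam = nn.Parameter(torch.tensor(lambda_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) -- patch embeddings plus the CLS token
        scale = x.shape[-1] ** -0.5
        a1 = F.softmax(self.q1(x) @ self.k1(x).transpose(-2, -1) * scale, dim=-1)
        a2 = F.softmax(self.q2(x) @ self.k2(x).transpose(-2, -1) * scale, dim=-1)
        # the difference of the two maps cancels common-mode "noise" attention
        attn = a1 - self.lam * a2
        return attn @ self.v(x)

In the original Differential Transformer, λ is re-parameterised per head and the two maps come from splitting each head's query/key projections; the sketch above keeps only the core cancellation idea.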


Results

Vision Transformer (ViT) vs Differential Vision Transformer (Diff-ViT), side by side:

Attention Rollout
[ViT attention rollout | Diff-ViT attention rollout]
Differential attention cancels noise through its dual softmax maps, producing attention patterns that focus more tightly on the relevant regions of the image.

Attention Maps (CLS Token → Patches)
[ViT attention maps | Diff-ViT attention maps]
Noise cancellation amplifies attention to relevant context; some heads show stronger, sparser activation patterns than standard multi-head attention.

Position Embedding Test Accuracy
[ViT position embedding accuracy | Diff-ViT position embedding accuracy]
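
The rollout visualisations follow the standard attention-rollout recipe (Abnar & Zuidema, 2020): per-layer attention matrices are averaged over heads, mixed with the identity to account for residual connections, and multiplied through the network, after which the CLS-token row gives a per-patch relevance map. The sketch below is a minimal implementation under those assumptions, not the repository's actual visualisation code; the per-head CLS → patch maps can be read directly from a single layer's attention tensor.

# Minimal sketch of attention rollout (assumed recipe, not this repo's code).
# attentions: list of per-layer attention tensors, each (heads, tokens, tokens),
# assuming token 0 is the CLS token.
import torch

def attention_rollout(attentions: list[torch.Tensor]) -> torch.Tensor:
    tokens = attentions[0].shape[-1]
    rollout = torch.eye(tokens)
    for attn in attentions:
        a = attn.mean(dim=0)                     # average over heads
        a = 0.5 * a + 0.5 * torch.eye(tokens)    # add identity for the residual path
        a = a / a.sum(dim=-1, keepdim=True)      # re-normalise rows
        rollout = a @ rollout                    # accumulate across layers
    # CLS-token row, dropping the CLS column; reshape to the patch grid to plot
    return rollout[0, 1:]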

For more details, see the full reports: ViT Report | Diff-ViT Report


Setup

# Install dependencies using uv
uv sync

# Dataset (CIFAR-10) downloads automatically on first run to {vit,dvit}/data/

Usage

cd vit # or dvit
uv run python -m src.vit --mode train # or vis
