A pure PyTorch implementation of Low-Rank Adaptation (LoRA) applied to Vision Transformers (ViT).
Unlike standard implementations that use the peft library, this project manually implements the mathematical logic of low-rank matrix injection ($W = W_0 + BA$) into the Query and Value projections of the Self-Attention mechanism.
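Concretely, for an input $x$ the adapted projection computes the standard LoRA forward pass (the $\alpha/r$ scaling factor is the usual LoRA convention and is assumed here):

$$
h = W_0 x + \frac{\alpha}{r}\, B A x, \qquad A \in \mathbb{R}^{r \times d_{\text{model}}}, \quad B \in \mathbb{R}^{d_{\text{model}} \times r}, \quad r \ll d_{\text{model}}
$$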
By freezing the pre-trained ViT backbone and training only the rank-decomposition matrices (rank=8), we achieved:
| Metric | Full Fine-Tuning | LoRA (Ours) | Impact |
|---|---|---|---|
| Trainable Params | 86,567,656 | 221,184 | 99.75% reduction |
| Model Size (Weights) | ~330 MB | ~0.8 MB | Storage-efficient |
| Training Speed | Slow | Fast | Converges in <5 epochs |
| Performance | Baseline | High | Robust transfer to EuroSAT |
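A minimal sketch of how the parameter counts can be reproduced, assuming a timm ViT-B/16 backbone and the 10-class EuroSAT setup (the exact model loading code in this repo may differ):

```python
import timm  # assumption: the pre-trained ViT backbone is loaded via timm

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=10)

# Freeze every pre-trained parameter; only the LoRA matrices (injected later,
# see the sketches below) remain trainable.
for param in model.parameters():
    param.requires_grad = False

# After injecting the LoRA layers, count what actually trains.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```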
The core innovation is the custom `LoRA_QV_Linear` layer, which wraps frozen PyTorch layers:

- $W_0$: frozen pre-trained weights (d_model × d_model)
- $A$: trainable low-rank matrix (rank × d_model)
- $B$: trainable low-rank matrix (d_model × rank)

We specifically target the Query (Q) and Value (V) matrices in the Multi-Head Attention blocks, leaving the Key (K) and MLP layers frozen; a sketch of the wrapper follows below.
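A minimal sketch of such a wrapper (the constructor signature, the $\alpha/r$ scaling, and the Kaiming/zero initialization follow the standard LoRA recipe and are assumptions, not the exact code in `src/lora.py`):

```python
import math
import torch
import torch.nn as nn


class LoRA_QV_Linear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update B @ A."""

    def __init__(self, base_layer: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base_layer                       # W0: frozen pre-trained projection
        self.base.weight.requires_grad = False
        if self.base.bias is not None:
            self.base.bias.requires_grad = False

        d_out, d_in = base_layer.weight.shape
        self.lora_A = nn.Parameter(torch.empty(rank, d_in))    # A: (rank x d_model)
        self.lora_B = nn.Parameter(torch.zeros(d_out, rank))   # B: (d_model x rank), zero-init
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))  # standard LoRA init for A
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W0 x + (alpha / r) * B A x  -- only the low-rank path receives gradients
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```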
- `src/lora.py`: contains the custom `LoRA_QV_Linear` class with manual weight initialization.
- `src/utils.py`: logic to recursively traverse the ViT model and swap `nn.Linear` layers with LoRA layers (see the sketch after this list).
- `train.py`: training loop with validation and checkpointing on the EuroSAT dataset.
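A sketch of the traversal logic, assuming the attention blocks expose separate `q_proj` and `v_proj` linear modules (module names differ between ViT implementations, e.g. timm uses a fused `qkv` projection; `src/utils.py` handles the actual layout):

```python
import torch.nn as nn


def inject_lora(model: nn.Module, rank: int = 8, target_names=("q_proj", "v_proj")) -> nn.Module:
    """Recursively replace targeted nn.Linear layers with LoRA_QV_Linear wrappers."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear) and name in target_names:
            # Swap the frozen projection in place for its LoRA-wrapped version.
            setattr(model, name, LoRA_QV_Linear(child, rank=rank))
        else:
            inject_lora(child, rank=rank, target_names=target_names)
    return model
```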
```bash
pip install -r requirements.txt
```