
๐Ÿท๏ธ VectorTag: Automatic Image Tagging System

A deep learning system for automatic image tag generation, built on a ResNet-18 backbone.

🎯 Overview

VectorTag automatically tags images with semantic labels (e.g., "building", "food", "person", "nature"). The system offers:

  • Multi-label classification: Each image can have multiple tags simultaneously (see the example after this list).
  • Interpretable predictions: Uses Grad-CAM to visualize which image regions influenced each tag.
  • Interactive UI: Streamlit-based web interface for inference and exploration.
  • Production-ready: Docker containerization for easy deployment.
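
As a concrete illustration of the multi-label setup, each image maps to a multi-hot target vector over the tag vocabulary rather than to a single class index. A minimal sketch, using the example tags above:

import torch

tags = ["building", "food", "person", "nature"]   # example tag vocabulary
target = torch.tensor([1., 0., 0., 1.])           # one image tagged both "building" and "nature"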

Key Features

  • ResNet-18 backbone pretrained on ImageNet
  • BCEWithLogitsLoss with class weighting to handle imbalanced datasets
  • Grad-CAM visualization for model interpretability
  • Data augmentation (rotation, flips, color jitter)
  • LR Scheduler with early stopping to prevent overfitting
  • Streamlit UI for interactive inference
  • Docker deployment ready

📊 Project Structure

VectorTag/
├── src/
│   ├── core/
│   │   └── config.py                 # Central configuration (paths, hyperparams)
│   ├── data/
│   │   ├── tagged_dataset.py         # PyTorch Dataset class
│   │   ├── loaders.py                # DataLoader creation with transforms
│   │   └── taxonomy.py               # Tag synonyms and hierarchies
│   ├── models/
│   │   └── baseline.py               # ResNet-18 model definition
│   ├── scripts/
│   │   └── train.py                  # Training loop with auto-plotting
│   ├── ui/
│   │   ├── app.py                    # Main Streamlit app
│   │   ├── modes/
│   │   │   ├── base.py               # Abstract inference mode
│   │   │   └── standard.py           # Standard inference mode
│   │   └── components/               # Reusable UI components
│   └── utils/
│       ├── gradcam.py                # Grad-CAM heatmap generation
│       └── plotting.py               # Training visualization
├── models/
│   └── standard/
│       ├── weights/                  # Saved model weights (.pth)
│       └── classes/                  # Class definitions (.json)
├── assets/
│   ├── exp_00X_*.png                 # Training curves from experiments
│   └── comparison_*.png              # Grad-CAM comparisons
├── data/
│   └── raw/
│       └── various_tagged_images/    # Dataset images + metadata.csv
├── experiments.md                    # Detailed experiment logs
├── requirements.txt                  # Python dependencies
├── Dockerfile                        # Container definition
└── README.md                         # This file

🚀 Quick Start

1. Setup Environment

# Clone repository (choose one method below)

# HTTPS
git clone https://github.com/ZenbiteXYZ/VectorTag.git

# Or SSH
git clone git@github.com:ZenbiteXYZ/VectorTag.git

# Navigate to directory
cd VectorTag

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Download Dataset

Download the Kaggle Various Tagged Images dataset and extract to:

data/raw/various_tagged_images/
├── metadata.csv
├── image_1.jpg
├── image_2.jpg
└── ...

3. Launch UI

streamlit run src/ui/app.py

Then navigate to http://localhost:8501 in your browser.

Note: Pre-trained model weights are already included in models/standard/weights/. For retraining with custom settings, see the Training Your Own Model section below.


📈 Experimental Results

Experiment 005: BCEWithLogitsLoss + Class Weights ⭐ Current Best

Configuration:

  • Loss: BCEWithLogitsLoss with pos_weight (balanced classes)
  • Data: 200K samples, Top-150 tags, stratified split
  • Training: 12 epochs, LR Scheduler, Weight Decay=1e-5

Results:

Epoch | Train Loss | Val Loss | Notes
------|------------|----------|---------------------------
1     | 0.2569     | 0.2028   | High loss due to pos_weight
2     | 0.2107     | 0.1925   | Quick descent
7     | 0.1726     | 0.1835   | Best validation point
12    | 0.1372     | 0.1932   | Training continues

[Figure: learning curve, Experiment 005]

Key Insights:

  1. ✅ Sharp Grad-CAM: The model focuses on relevant image regions without noise.
  2. ✅ High confidence: Predictions reach 70%+ on confidently detected tags.
  3. ⚠️ Overfitting starts: After epoch 7, validation loss increases (expected with imbalanced data).
  4. ✅ Best generalization: Among the loss functions tested, weighted BCE generalizes best (Focal Loss performed worse).

Visual Analysis

Building Tag (Grad-CAM Comparison):

  • Exp 002 (BCE): Clear boundary, but low confidence (37%).
  • Exp 004 (Focal): Blurry boundary, more noise (43% confidence).
  • Exp 005 (Weighted BCE): ⭐ Best: Sharp boundary, high confidence (70%), captures building outline without sky.

[Figure: Grad-CAM comparison for the "building" tag]

Food Tag (Grad-CAM Comparison):

[Figure: Grad-CAM comparison for the "food" tag]


🔧 Configuration

Edit src/core/config.py to customize:

# Model
BATCH_SIZE = 16              # Reduce for high-res images
LEARNING_RATE = 1e-4         # Baseline learning rate
EPOCHS = 12                  # Total epochs
WEIGHT_DECAY = 1e-5          # L2 regularization

# Data
TOP_K = 150                  # Use top-150 most frequent tags
MAX_SAMPLES = 200_000        # Limit dataset size (for speed)

๐Ÿณ Docker Deployment

# Build image
docker build -t vectortag-ui .

# Run container
docker run --rm -p 8501:8501 \
  -v $(pwd)/models:/app/models \
  vectortag-ui

Access UI at http://localhost:8501


🎓 Training Your Own Model

To train or retrain the model with custom settings:

python src/scripts/train.py

What happens:

  • Loads data with augmentation (crop, flip, rotation, color jitter)
  • Computes class weights for imbalanced tags
  • Trains ResNet-18 for N epochs
  • Saves best model to models/standard/weights/
  • Auto-generates learning curve plot

All settings are configurable in src/core/config.py:

  • TOP_K: Number of tags (default: 150)
  • BATCH_SIZE: Batch size (default: 32)
  • EPOCHS: Training epochs (default: 12)
  • LEARNING_RATE: Base LR (default: 1e-4)
  • WEIGHT_DECAY: Regularization (default: 1e-5)

📚 Key Technical Components

Data Processing (src/data/)

  • TaggedImagesDataset: Multi-label dataset with tag synonyms and hierarchies.
  • Stratified split: Ensures rare classes are represented proportionally in both the train and validation splits.
  • Smart subsampling: Weights samples by class rarity for more balanced mini-batches (see the sketch below).
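
One common way to realize this kind of rarity weighting is PyTorch's WeightedRandomSampler. The sketch below is illustrative, not necessarily the exact logic in loaders.py; the toy labels and dataset are placeholders:

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Placeholder multi-hot labels (num_samples x num_classes); the real ones come from metadata.csv.
labels = torch.tensor([[1., 0., 0.],
                       [1., 1., 0.],
                       [0., 0., 1.]])
dataset = TensorDataset(torch.randn(3, 3, 224, 224), labels)   # stand-in for TaggedImagesDataset

class_freq = labels.mean(dim=0)                     # how often each tag occurs
rarity = 1.0 / class_freq.clamp(min=1e-6)           # rare tags get large weights
sample_weights = (labels * rarity).sum(dim=1)       # samples carrying rare tags are drawn more often

sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights), replacement=True)
loader = DataLoader(dataset, batch_size=2, sampler=sampler)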

Model Architecture (src/models/baseline.py)

ResNet-18 (ImageNet pretrained)
    ↓
Feature Extractor → 512D
    ↓
Linear(512 → 256)
    ↓
  ReLU()
    ↓
Dropout(0.4)
    ↓
Linear(256 → num_classes)
    ↓
BCEWithLogitsLoss (per-class)
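
A minimal PyTorch sketch of the head above, assuming torchvision >= 0.13; the repository's actual implementation lives in src/models/baseline.py and may differ in details:

import torch
import torch.nn as nn
from torchvision import models

class TagClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        backbone.fc = nn.Identity()              # expose the 512-D feature extractor
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(0.4),
            nn.Linear(256, num_classes),         # raw logits, one per tag
        )

    def forward(self, x):
        return self.head(self.backbone(x))

model = TagClassifier(num_classes=150)                        # 150 = TOP_K tags
probs = torch.sigmoid(model(torch.randn(1, 3, 224, 224)))     # per-tag probabilities (multi-label)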

Grad-CAM Visualization (src/utils/gradcam.py)

  • Computes class activation maps from the gradients of a chosen tag's logit with respect to the feature maps of the last convolutional block (see the sketch below).
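
A hedged sketch of the Grad-CAM computation (the function name and hook mechanics are illustrative; gradcam.py may be organized differently): pool the gradients of the chosen tag's logit over the spatial dimensions to get per-channel weights, take a weighted sum of the target layer's activations, and apply a ReLU.

import torch

def grad_cam(model, image, class_idx, target_layer):
    # Capture the target layer's activations and the gradient flowing back into them.
    activations, gradients = {}, {}
    fwd = target_layer.register_forward_hook(lambda m, inp, out: activations.update(a=out))
    bwd = target_layer.register_full_backward_hook(lambda m, gin, gout: gradients.update(g=gout[0]))

    logits = model(image.unsqueeze(0))                 # (1, num_classes)
    model.zero_grad()
    logits[0, class_idx].backward()                    # backprop only the chosen tag's logit
    fwd.remove(); bwd.remove()

    weights = gradients["g"].mean(dim=(2, 3), keepdim=True)     # average gradients per channel
    cam = torch.relu((weights * activations["a"]).sum(dim=1))   # weighted sum of feature maps
    return cam / cam.max().clamp(min=1e-8)             # normalize to [0, 1]; upsample before overlaying

# target_layer is typically the last convolutional block (e.g. layer4 of the ResNet-18 backbone).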

Training Pipeline (src/scripts/train.py)

  1. Class weight computation: pos_weight = (N_neg / N_pos), clamped to [1.0, 20.0] (see the sketch after this list)
  2. LR Scheduler: ReduceLROnPlateau reduces the learning rate when validation loss plateaus
  3. Early stopping: Saves only the best model based on validation loss
  4. Auto-plotting: Generates learning curve after training
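
A condensed, illustrative sketch of steps 1–3. The toy labels, the optimizer choice (Adam), and the checkpoint file name are assumptions, not the repository's actual code:

import torch
from torch import nn, optim

# Toy multi-hot label matrix (num_samples x num_classes); the real labels come from metadata.csv.
labels = torch.tensor([[1., 0., 0.],
                       [1., 1., 0.],
                       [0., 0., 1.]])
n_pos = labels.sum(dim=0)
n_neg = labels.shape[0] - n_pos
pos_weight = (n_neg / n_pos.clamp(min=1)).clamp(1.0, 20.0)     # N_neg / N_pos, clamped to [1.0, 20.0]

model = nn.Linear(512, labels.shape[1])                        # stand-in for the ResNet-18 model
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
optimizer = optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)   # Adam is an assumption
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", patience=2)

best_val = float("inf")
for epoch in range(12):
    # ... run the real train/validation passes here; a random value stands in for the val loss ...
    val_loss = torch.rand(1).item()
    scheduler.step(val_loss)                  # lower the LR when validation loss plateaus
    if val_loss < best_val:                   # keep only the best checkpoint
        best_val = val_loss
        torch.save(model.state_dict(), "models/standard/weights/best.pth")   # placeholder file name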

🔮 Future Improvements

  • Dynamic tag addition: Add new tags without full model retraining
  • Vision Transformer (ViT): Replace ResNet-18 with ViT for better accuracy

📖 Experiment Log

Detailed experiments are documented in experiments.md:

  • Exp 001: Baseline (Overfitting issue discovered)
  • Exp 002: Synonyms + Dropout + Augmentation (Overfitting solved)
  • Exp 003: LR Scheduler + Weight Decay (Better convergence)
  • Exp 004: FocalLoss (Poor Grad-CAM quality)
  • Exp 005: Weighted BCE ⭐ (Current best)

📦 Dependencies

  • torch, torchvision: Deep learning
  • pillow, pandas: Image & data processing
  • scikit-learn: Stratified split
  • streamlit: Web UI
  • pydantic: Configuration management

๐Ÿค Contributing

Contributions welcome! Areas of interest:

  • Better loss functions for imbalanced multi-label classification
  • Improved data augmentation strategies
  • Alternative backbone architectures
  • Performance optimizations

📄 License

See LICENSE file.