This project implements and compares different Generative AI architectures for conditional image synthesis. The goal is to generate realistic human faces based on the CelebA dataset, enabling control over specific semantic attributes (gender, expression, age).
The system is designed with modularity in mind to support the training and inference of three distinct classes of generative models:
- Conditional Variational Autoencoder (C-VAE)
- Conditional Denoising Diffusion Probabilistic Model (C-DDPM)
- Conditional Vision Mamba
- Multi-Model Architecture: Native support for VAE, Diffusion Models, and State Space Models (Mamba) via a Factory pattern (a minimal sketch follows this list).
- Conditional Generation: Models accept attribute vectors to guide generation (e.g., Young, Smiling, Male).
- Advanced Training Management:
  - Automatic checkpointing system to resume training in case of interruption.
  - Progress monitoring with a custom indicator.
  - Periodic generation of visual snapshots to qualitatively evaluate learning during epochs.
- Unified Inference Script: Flexible tool for generating single samples or comparative grids of all attribute combinations.
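The Factory wiring is not detailed in this README; the sketch below illustrates the idea, assuming a simple registry keyed by the `MODEL_TYPE` values (`vae`, `diff`, `mamba`). The class names are placeholders, not the project's actual classes.

```python
# Minimal sketch of the Factory pattern (illustrative only; the real classes
# live in models/ and may have different names and constructors).
import torch.nn as nn

class ConditionalVAE(nn.Module): ...          # placeholder for models/autoencoder
class ConditionalDDPM(nn.Module): ...         # placeholder for models/diffusion
class ConditionalVisionMamba(nn.Module): ...  # placeholder for models/mamba

MODEL_REGISTRY = {
    "vae": ConditionalVAE,
    "diff": ConditionalDDPM,
    "mamba": ConditionalVisionMamba,
}

def build_model(model_type: str, **kwargs) -> nn.Module:
    """Instantiate the architecture selected via MODEL_TYPE."""
    if model_type not in MODEL_REGISTRY:
        raise ValueError(f"Unknown model type: {model_type!r}")
    return MODEL_REGISTRY[model_type](**kwargs)
```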
```
├── config/          # Configuration and hyperparameter management
├── data/            # Dataset loader and preprocessing (CelebA)
├── models/          # Neural architecture implementations
│   ├── autoencoder/ # Classes for Conditional VAE
│   ├── diffusion/   # Classes for Conditional Diffusion (U-Net, Noise Schedule)
│   └── mamba/       # Classes for Vision Mamba (MambaBlock, PScan)
├── train/           # Training logic (Trainer loop)
├── utility/         # Support tools (Checkpoint, visualization, logging)
├── weights/         # Directory for saving model weights
├── generate.py      # Image generation script
├── train.py         # Training entry point
└── unzip.py         # Utility for dataset extraction
```
The project requires Python 3.x and standard scientific and deep learning libraries.
- Clone the repository.
- Ensure dependencies are installed (PyTorch, Torchvision, NumPy, Matplotlib, PIL).
The project uses the CelebA dataset. Images must be placed in the dataset/ folder. A utility script is provided to automatically decompress archives:
```bash
python unzip.py
```
Note: The script searches for .zip files in the ./dataset folder and extracts contents in place.
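For reference, the behavior described in the note (find every `.zip` archive under `./dataset` and extract it in place) corresponds to a small script along these lines; the actual `unzip.py` may differ in details:

```python
# Minimal sketch of the described extraction step: locate .zip archives in
# ./dataset and extract their contents into the same folder.
import zipfile
from pathlib import Path

dataset_dir = Path("./dataset")
for archive in dataset_dir.glob("*.zip"):
    print(f"Extracting {archive.name} ...")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dataset_dir)
```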
Training configuration is managed via Environment Variables or default values defined in config/config.py. Parameters can be modified without changing the code by setting variables before execution.
Main Parameters:
| Variable | Default | Description |
|---|---|---|
| `MODEL_TYPE` | `mamba` | Model type: `vae`, `diff`, or `mamba`. |
| `BATCH_SIZE` | `256` | Training batch size. |
| `EPOCHS` | `1000` | Total number of epochs. |
| `LEARNING_RATE` | `5e-4` | Learning rate for the Adam optimizer. |
| `CHECKPOINT_INTERVAL` | `300` | Interval (seconds) for saving checkpoints. |
| `DEVICE` | `cuda` | Computing device (`cuda` or `cpu`). |
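The exact contents of `config/config.py` are not shown here; a minimal sketch of the described behavior (environment variables overriding the defaults listed above) could look like this:

```python
# Hypothetical sketch of config/config.py: each parameter falls back to its
# default unless overridden by an environment variable of the same name.
import os

MODEL_TYPE = os.getenv("MODEL_TYPE", "mamba")
BATCH_SIZE = int(os.getenv("BATCH_SIZE", "256"))
EPOCHS = int(os.getenv("EPOCHS", "1000"))
LEARNING_RATE = float(os.getenv("LEARNING_RATE", "5e-4"))
CHECKPOINT_INTERVAL = int(os.getenv("CHECKPOINT_INTERVAL", "300"))  # seconds
DEVICE = os.getenv("DEVICE", "cuda")
```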
To start training, configure the desired model type and run the train.py script. The system will automatically load the last available checkpoint if one exists.
Example (Linux/Mac):
```bash
export MODEL_TYPE=vae
export BATCH_SIZE=64
export EPOCHS=50
python train.py
```
Example (Windows PowerShell):
```powershell
$env:MODEL_TYPE="diff"
$env:BATCH_SIZE="32"
python train.py
```
During training, weights will be saved in the temp/CHECKPOINT directory, and visual snapshots will be generated to monitor quality.
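The checkpointing logic lives in `utility/`; the resume-and-periodic-save behavior described above could be implemented roughly as in this sketch (file names and state-dict keys are assumptions, not the project's actual code):

```python
# Hypothetical sketch of time-based checkpointing with automatic resume.
import os, time
import torch

CKPT_PATH = "temp/CHECKPOINT/last.pt"  # assumed file name inside temp/CHECKPOINT

def try_resume(model, optimizer):
    """Load the last checkpoint if one exists; return the epoch to resume from."""
    if os.path.exists(CKPT_PATH):
        state = torch.load(CKPT_PATH, map_location="cpu")
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])
        return state["epoch"] + 1
    return 0

def maybe_save(model, optimizer, epoch, last_save, interval=300):
    """Save a checkpoint whenever `interval` seconds have elapsed."""
    now = time.time()
    if now - last_save >= interval:
        os.makedirs(os.path.dirname(CKPT_PATH), exist_ok=True)
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "epoch": epoch}, CKPT_PATH)
        return now
    return last_save
```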
The generate.py script allows image generation using trained models. It supports two main modes:
- Manual Mode: Generates a set of images with specific attributes.

  ```bash
  python generate.py --model vae --male 1 --smiling 1 --young 1 --num_samples 8 --show
  ```

- Grid Mode (All Combos): Generates a grid showing all 8 possible attribute combinations (Male/Female, Smile/NoSmile, Young/Old).

  ```bash
  python generate.py --model diff --all_combos --samples_per_combo 2
  ```
Additional examples:

```bash
# Woman, Smiling, Young - 16 images
python generate.py --model vae --male 0 --smiling 1 --young 1 --num_samples 16 --show

# All combinations (grid) - 4 images per combo
python generate.py --model vae --all_combos --samples_per_combo 4 --show

# All combinations (grid) - fast (500 DDIM steps, eta = 0.0) - 4 images per combo
python generate.py --model diff --all_combos --samples_per_combo 2 --steps 500 --eta 0.8 --show

# All combinations (grid) - high quality (DDPM) - 2 images per combo
python generate.py --model diff --all_combos --samples_per_combo 1 --steps 1000 --eta 1.0

# All combinations (grid) - low temperature (deterministic) - 4 images per combo
python generate.py --model mamba --all_combos --samples_per_combo 4 --temperature 0.0 --show

# All combinations (grid) - high temperature (diversity) - 4 images per combo
python generate.py --model mamba --all_combos --samples_per_combo 4 --temperature 0.01 --show
```
Arguments:

- `--model`: Required. Choice between `vae`, `diff`, `mamba`.
- `--output_dir`: Destination folder (default: `./generated_samples`).
- `--show`: Displays generated images on screen in addition to saving them.
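Internally, the attribute flags presumably end up as a binary conditioning vector passed to the selected model. The sketch below shows one plausible mapping; the attribute ordering (Male, Smiling, Young) and function name are assumptions for illustration.

```python
# Hypothetical sketch: turning CLI attribute flags into a conditioning tensor.
import torch

def make_condition(male: int, smiling: int, young: int,
                   num_samples: int, device: str = "cpu") -> torch.Tensor:
    attrs = torch.tensor([male, smiling, young], dtype=torch.float32, device=device)
    return attrs.unsqueeze(0).repeat(num_samples, 1)  # shape: (num_samples, 3)

# Grid mode would iterate over all 2**3 = 8 attribute combinations,
# generating samples_per_combo images for each one.
```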
A symmetric Encoder-Decoder architecture with convolutional layers and residual connections (Skip Connections).
- Encoder: Compresses the image and attributes into a latent space parameterized by mean ($\mu$) and variance ($\sigma^2$).
- Decoder: Reconstructs the image starting from the sampled latent vector and attributes.
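For reference, the sampling step implied by this description is the standard reparameterization trick; the sketch below shows it together with the attribute concatenation at the decoder input. Layer shapes and names are illustrative, not the project's actual code in `models/autoencoder/`.

```python
# Minimal sketch of conditional VAE latent sampling (reparameterization trick).
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    std = torch.exp(0.5 * logvar)   # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(std)     # noise ~ N(0, I)
    return mu + eps * std           # z = mu + sigma * eps

# Decoder input: sampled latent concatenated with the attribute vector, e.g.
# z_cond = torch.cat([z, attributes], dim=1)
```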
Based on a U-Net with sinusoidal Time Encoding and attribute conditioning via linear projections.
- Uses a pre-calculated noise schedule (Beta/Alpha schedule).
- Supports guided sampling via the `LAMBDA` ($\lambda$) parameter (Classifier-Free Guidance style).
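The README does not spell out how `LAMBDA` enters the sampler; in classifier-free-guidance-style samplers the guidance weight usually blends conditional and unconditional noise predictions, roughly as in this sketch (the model call signature is an assumption):

```python
# Hypothetical sketch of classifier-free-guidance-style noise prediction,
# where `lam` corresponds to the LAMBDA parameter mentioned above.
import torch

def guided_noise(model, x_t, t, attrs, lam: float) -> torch.Tensor:
    eps_cond = model(x_t, t, attrs)                       # conditioned on attributes
    eps_uncond = model(x_t, t, torch.zeros_like(attrs))   # null / dropped condition
    return eps_uncond + lam * (eps_cond - eps_uncond)     # blend the two predictions
```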
Innovative implementation based on State Space Models (SSM).
- Utilizes `ResidualMambaLayer` blocks integrating parallel scan operations (`pscan`) for computational efficiency.
- Treats images as sequences of flattened patches, combined with positional and attribute embeddings.
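As a rough picture of the patch-sequence view described above, the sketch below builds the token sequence that would feed the `ResidualMambaLayer` stack; image size, patch size, and embedding dimensions are assumptions for illustration.

```python
# Hypothetical sketch: image batch -> patch tokens + positional + attribute embeddings.
import torch
import torch.nn as nn

B, C, H, W, P, D = 8, 3, 64, 64, 8, 256        # batch, channels, image size, patch size, embed dim
num_patches = (H // P) * (W // P)               # 64 patches for a 64x64 image with 8x8 patches

images = torch.randn(B, C, H, W)
attrs = torch.randint(0, 2, (B, 3)).float()     # e.g. Male / Smiling / Young

patchify = nn.Conv2d(C, D, kernel_size=P, stride=P)     # flatten patches via strided conv
pos_emb = nn.Parameter(torch.zeros(1, num_patches, D))  # learned positional embedding
attr_proj = nn.Linear(3, D)                             # attribute embedding

tokens = patchify(images).flatten(2).transpose(1, 2)    # (B, num_patches, D)
tokens = tokens + pos_emb + attr_proj(attrs).unsqueeze(1)
# `tokens` would then be processed by the stack of ResidualMambaLayer blocks.
```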
Authors: Generative AI - Project Group 03 - UNISA, Academic Year 2025/26