
This project was developed by me and Soham T. Umbare.

Re-Implementation of Stable Diffusion

This project focuses on re-implementing Stable Diffusion using a custom architecture with a Variational Autoencoder (VAE), CLIP Encoder, and a U-Net for step-by-step denoising of images based on prompts. A scheduler guides the denoising process.
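The generation loop described above can be sketched in a few lines. The component interfaces below (`clip`, `unet`, `vae`, `scheduler`) are hypothetical placeholders for illustration, not this repository's actual API:

```python
import torch

def generate(prompt, clip, unet, vae, scheduler, latent_shape=(1, 4, 64, 64)):
    # Hypothetical sketch: component names are placeholders, not this repo's code.
    context = clip(prompt)                         # prompt -> text embeddings (CLIP)
    latents = torch.randn(latent_shape)            # start from pure noise
    for t in scheduler.timesteps:                  # step-by-step denoising
        eps = unet(latents, t, context)            # U-Net predicts the noise
        latents = scheduler.step(eps, t, latents)  # scheduler removes a bit of it
    return vae.decode(latents)                     # decode latent -> image (VAE)
```

The scheduler owns the timestep sequence and the update rule, so the same loop works with different sampling strategies.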


Architecture and Workflow

(Architecture and workflow diagram)

The results shown in the image above were generated using pre-trained weights available at this link. These are not the results of our custom-trained models.


Training Process and Loss Functions

Our training pipeline follows the diffusion process described in the original paper. We use two primary losses, which capture the core denoising process; the additional losses described in the paper are variations of these two.

1. Forward Trajectory (Adding Noise)

We progressively add noise to the input image $x_0$ through time steps $x_1, x_2, \dots, x_t$ until reaching pure noise $x_t$. The process is defined as:

$q(x_t | x_{t-1}) = \mathcal{N}(x_t ; \sqrt{1 - \beta_t} \cdot x_{t-1}, \beta_t \cdot I)$

Key Terms:

  • $q$: Distribution of the noisy state.
  • $x_t$: Image at step $t$ with added noise.
  • $x_{t-1}$: Image at step $t-1$ with less noise.
  • $\mathcal{N}$: Gaussian distribution.
  • $\beta_t$: Noise schedule (variance) at step $t$. A higher $\beta_t$ adds more noise.
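One forward-noising step can be sampled directly from the definition of $q(x_t \mid x_{t-1})$. The sketch below is illustrative; `forward_step` and the scalar `beta_t` argument are assumed names, not this repository's code:

```python
import torch

def forward_step(x_prev, beta_t):
    # Sample x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I).
    noise = torch.randn_like(x_prev)              # epsilon ~ N(0, I)
    return (1.0 - beta_t) ** 0.5 * x_prev + beta_t ** 0.5 * noise
```

Applying this step repeatedly drives any input toward a standard Gaussian, which is what lets sampling start from pure noise.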

2. Reverse Trajectory (Removing Noise)

The denoising process reconstructs the clean image $x_0$ from pure noise $x_t$, moving backward through $x_t, x_{t-1}, \dots, x_1$:

$x_{t-1} \sim \mathcal{N}(U_t(x_t), \beta_t \cdot I)$

Key Terms:

  • $U_t(x_t)$: Model's estimate of the mean of $x_{t-1}$ given the noisy input $x_t$.
  • $\beta_t \cdot I$: Variance for randomness, ensuring diversity in reconstructions.

Where:

$U_t(x_t) = \frac{1}{\sqrt{1 - \beta_t}} \cdot \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \cdot \epsilon_\theta(x_t, t) \right)$

  • $\epsilon_\theta(x_t, t)$: Noise predicted by the model.
  • $\bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)$: Cumulative product of the per-step signal-retention factors.
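A single reverse step could look like the following sketch. The names `eps_model`, `beta`, and `alpha_bar` (for $\bar{\alpha}_t = \prod_{s=1}^{t}(1 - \beta_s)$) are assumptions for illustration, not this repository's API:

```python
import torch

def reverse_step(x_t, t, eps_model, beta, alpha_bar):
    # Hypothetical names: beta is the noise schedule tensor, alpha_bar its
    # cumulative product of (1 - beta), eps_model the trained noise predictor.
    beta_t = beta[t]
    # U_t(x_t): subtract the scaled predicted noise, then rescale.
    mean = (x_t - beta_t / torch.sqrt(1.0 - alpha_bar[t]) * eps_model(x_t, t)) \
           / torch.sqrt(1.0 - beta_t)
    if t == 0:
        return mean                                           # final step is deterministic
    return mean + torch.sqrt(beta_t) * torch.randn_like(x_t)  # add variance beta_t * I
```

Iterating this from $t = T$ down to $t = 0$ yields the full reverse trajectory; the added noise at intermediate steps is what gives diverse reconstructions.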

Current Progress

  1. Components Implemented:
    • Forward and reverse trajectories for noise addition and removal.
    • Loss functions based on diffusion model principles.
  2. Trained Models Used:
    • Pre-trained weights (linked above) to generate the sample results.
  3. Diffusion Model Training:
    • Currently in progress, with the aim of achieving competitive results.

Future Goals

  • Complete the training of our diffusion model.
  • Experiment with novel loss functions and scheduler improvements.
  • Benchmark our results against the pre-trained model's performance.

Project Report

For more details, check out our project report.
