Welcome to the FFA-Net Project repository. This project contains the implementation of the FFA-Net model with enhancements including Efficient Attention (EA) and Progressive Attention Refinement (PAR). The repository is designed to provide a comprehensive understanding of the FFA-Net architecture and the novel modifications made. Please note that, due to computational constraints, the model was trained from scratch on Kaggle's GPU resources, so this repository is not set up to be cloned and trained locally as-is.
This repository is structured to offer a clear understanding of the project components. However, if you wish to run the model, you should download and execute the FFA_NET_with_EA_and_PR.ipynb file available in this repository. For the trained model's weights and parameters, you can download the FFA_model.pth file from the repository. For further details and to run the notebook, please refer to my Kaggle page.
The FFA-Net model consists of the following key components:
- FFA-Net Backbone: The core architecture that includes attention mechanisms and progressive refinement modules.
- Efficient Attention (EA): EA aims to improve computational efficiency while maintaining performance. It modifies the standard attention mechanism to reduce computational overhead.
- Progressive Attention Refinement (PAR): PAR refines features progressively through multiple stages, enhancing the model's ability to focus on important features and details.
The loss function used in this project combines pixel-wise loss (L1/L2 loss) and perceptual loss. This hybrid approach ensures that both pixel-level accuracy and high-level feature representations are preserved during the training process. Here’s a detailed explanation of each component and how they are combined:
Pixel-wise loss measures the difference between the predicted and ground truth images at the pixel level. Two common types of pixel-wise loss are L1 loss and L2 loss (a short PyTorch sketch follows the definitions below):
- L1 Loss (Mean Absolute Error): L1 loss calculates the absolute difference between the predicted pixel values and the actual pixel values. It is defined as:

$$L_{1} = \frac{1}{N} \sum_{i=1}^{N} \left| \varphi_i - x_i \right|$$
Where:
- φi is the predicted pixel value at position i,
- xi is the ground truth pixel value at position i,
- N is the total number of pixels.
- L2 Loss (Mean Squared Error): L2 loss computes the squared difference between the predicted and ground truth pixel values. It is defined as:

$$L_{2} = \frac{1}{N} \sum_{i=1}^{N} \left( \varphi_i - x_i \right)^2$$

Where:
- φi is the predicted pixel value at position i,
- xi is the ground truth pixel value at position i,
- N is the total number of pixels.
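As a quick illustration (not code taken from the notebook), both pixel-wise losses map directly onto PyTorch's built-in criteria; the tensors below are placeholders:

```python
import torch
import torch.nn as nn

# Placeholder tensors standing in for a dehazed prediction and its clear ground truth.
pred = torch.rand(1, 3, 64, 64)
target = torch.rand(1, 3, 64, 64)

l1 = nn.L1Loss()(pred, target)   # L1: mean absolute error over all N pixels
l2 = nn.MSELoss()(pred, target)  # L2: mean squared error over all N pixels
print(l1.item(), l2.item())
```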
Perceptual loss evaluates the quality of the generated image based on high-level feature representations rather than raw pixel differences. This loss is based on the idea that the perceptual quality of an image can be better captured by comparing features extracted from pre-trained deep neural networks (such as VGG) rather than comparing pixel values directly.
The perceptual loss is computed as follows:
Feature Extraction: Pass both the generated image ψI and the ground truth image I through a pre-trained feature extractor (e.g., VGG-19) to obtain feature maps. Let F(ψI) and F(I) denote the feature maps of ψI and I, respectively, at different layers of the network.
Feature Loss Calculation: Compute the loss based on the difference between feature maps (a single-layer PyTorch sketch follows this list). The perceptual loss can be defined as:

$$L_{\text{perceptual}} = \sum_{l} \frac{1}{N_l} \left\lVert F_l(\psi I) - F_l(I) \right\rVert_2^2$$
Where:
- Fl(ψI) and Fl(I) are the feature maps at layer l for the predicted and ground truth images,
- Nl is the number of elements (e.g., pixels or activations) in the feature map at layer l,
- ∥·∥2 denotes the squared L2 norm.
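A minimal sketch of this idea in PyTorch, assuming a frozen VGG-19 extractor and a single feature layer (the layer index and the weight handling below are illustrative choices, not necessarily what the notebook uses):

```python
import torch.nn as nn
from torchvision import models

class PerceptualLoss(nn.Module):
    """Compares VGG-19 feature maps of the prediction and the ground truth."""
    def __init__(self, layer_index=16):  # features[:16] ends at relu3_3; illustrative choice
        super().__init__()
        vgg = models.vgg19(weights="IMAGENET1K_V1")  # older torchvision: vgg19(pretrained=True)
        self.features = vgg.features[:layer_index].eval()
        for p in self.features.parameters():
            p.requires_grad = False  # the extractor stays frozen during training

    def forward(self, pred, target):
        # In practice the inputs are usually normalised with ImageNet statistics first.
        f_pred = self.features(pred)
        f_target = self.features(target)
        # Mean squared difference of the feature maps; the 1/N_l normalisation
        # comes from the default 'mean' reduction.
        return nn.functional.mse_loss(f_pred, f_target)
```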
To balance pixel-wise accuracy and perceptual quality, the combined loss function is formulated as a weighted sum of the pixel-wise loss and the perceptual loss:

$$L_{\text{total}} = \alpha \, L_{\text{pixel}} + \beta \, L_{\text{perceptual}}$$

Where:
- α and β are weights that control the importance of pixel-wise loss and perceptual loss, respectively.
Traditional loss functions like L1 and L2 focus solely on pixel-wise differences. While effective for training, they may not capture perceptual quality or high-level features effectively. Combining these with perceptual loss enables the model to achieve better image quality by preserving textures and details, which might be missed by pixel-wise loss alone.
In this project, the combination allows for both fine-grained pixel accuracy and high-level perceptual quality, leading to improved results in tasks like image dehazing.
Feel free to adjust the weight values α and β based on the specific needs of your model and dataset.
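A minimal sketch of the weighted combination, reusing the PerceptualLoss class from the sketch above (the alpha and beta values are placeholders, not the settings used in the notebook):

```python
import torch.nn as nn

alpha, beta = 1.0, 0.04                  # placeholder weights; tune for your model and dataset
pixel_criterion = nn.L1Loss()            # or nn.MSELoss() for the L2 variant
perceptual_criterion = PerceptualLoss()  # sketch class defined in the previous block

def combined_loss(pred, target):
    """L_total = alpha * L_pixel + beta * L_perceptual"""
    return alpha * pixel_criterion(pred, target) + beta * perceptual_criterion(pred, target)
```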
Channel Attention focuses on enhancing or suppressing the significance of different channels in feature maps. This mechanism is designed to emphasize the most informative channels while reducing the less relevant ones.
Formula and Computation:
Global Average Pooling: Compute the global average pooling across spatial dimensions for each channel to obtain a channel-wise descriptor:

$$z_c = \frac{1}{H \times W} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{c,h,w}$$
Where:
- H and W are the height and width of the feature map,
- xc,h,w is the feature value at channel c and spatial position (h,w),
- zc is the channel descriptor for channel c.
Fully Connected Layers: Pass the channel-wise descriptors through a small neural network (often consisting of fully connected layers) to compute channel-wise attention weights:

$$s = \sigma\big(W_2 \, \mathrm{ReLU}(W_1 z + b_1) + b_2\big)$$
Where:
- W1 and W2 are weights,
- b1 and b2 are biases, and
- σ is the sigmoid activation function.
Reweighting: Multiply the original feature map by the computed attention weights:

$$\tilde{x}_{c,h,w} = s_c \cdot x_{c,h,w}$$
Where:
- x̃c,h,w is the reweighted feature map,
- sc is the attention weight for channel c.
Comparison with Traditional Methods: In traditional convolutional neural networks (CNNs) without channel attention, all channels are treated equally. Channel Attention introduces a mechanism to dynamically adjust the importance of each channel, thereby improving the network's ability to focus on relevant features. Traditional methods do not have this adaptive reweighting mechanism, which can limit their ability to capture complex feature dependencies.
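The three steps above correspond to a squeeze-and-excitation style module. A minimal PyTorch sketch (the reduction ratio of 8 and the 1×1-convolution implementation of the fully connected layers are illustrative choices, not necessarily the notebook's exact code):

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Global pooling -> two 1x1 convs -> sigmoid -> channel-wise reweighting."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # z_c: one descriptor per channel
        self.fc = nn.Sequential(              # W1, b1, ReLU, W2, b2, sigmoid
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        s = self.fc(self.pool(x))   # per-channel attention weights s_c, shape (B, C, 1, 1)
        return x * s                # reweighting: x~_{c,h,w} = s_c * x_{c,h,w}
```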
Progressive Attention involves adjusting attention weights based on the importance of features at various stages in the network. This mechanism is designed to adaptively focus on different regions or features over multiple stages of processing.
Formula and Computation:
Attention Calculation: Compute the attention weights for each stage based on the feature maps at that stage:

$$\alpha_{i,j} = \frac{\exp(e_{i,j})}{\sum_{j'} \exp(e_{i,j'})}$$
Where:
- ei,j is the attention score for the feature at position (i,j),
- αi,j is the attention weight.
Feature Aggregation: Aggregate features based on attention weights:

$$\tilde{f}_i = \sum_{j} \alpha_{i,j} \, f_j$$
Where:
- f̃i is the aggregated feature for position i,
- fj is the feature at position j.
Comparison with Traditional Methods: Traditional attention mechanisms typically compute attention weights once and apply them throughout the network. Progressive Attention refines these weights iteratively, allowing for dynamic adjustment and better feature focusing. This progressive refinement can lead to improved performance in tasks requiring fine-grained attention.
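A hedged sketch of one way to realise this in PyTorch: a spatial attention stage that computes scores e_ij, normalises them with a softmax, and aggregates features, applied repeatedly so the weights are re-estimated on progressively refined features (the 1×1-convolution projections, residual composition, and number of stages are my assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttentionStage(nn.Module):
    """One stage: scores e_ij -> softmax weights a_ij -> aggregated features."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.key(x).flatten(2)                     # (B, C, HW)
        scores = torch.bmm(q, k) / c ** 0.5            # e_ij for every pair of positions
        weights = F.softmax(scores, dim=-1)            # a_ij
        v = x.flatten(2).transpose(1, 2)               # (B, HW, C)
        out = torch.bmm(weights, v)                    # f~_i = sum_j a_ij f_j
        return out.transpose(1, 2).reshape(b, c, h, w)

class ProgressiveAttention(nn.Module):
    """Re-applies the attention stage so the weights are refined over multiple stages."""
    def __init__(self, channels, stages=3):
        super().__init__()
        self.stages = nn.ModuleList([SpatialAttentionStage(channels) for _ in range(stages)])

    def forward(self, x):
        for stage in self.stages:
            x = x + stage(x)   # residual refinement at each stage
        return x
```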
Efficient Attention aims to reduce the computational complexity of the attention mechanism while maintaining performance. It introduces optimizations to make attention calculations more efficient.
Formula and Computation:
Approximate Attention Calculation: Instead of computing exact attention weights, EA uses approximations to reduce computational cost. The standard attention

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V$$

can be approximated by:

$$\mathrm{Attention}(Q, \tilde{K}, V) = \mathrm{softmax}\!\left(\frac{Q \tilde{K}^{\top}}{\sqrt{d}}\right) V$$
Where:
- Q, K, and V are the query, key, and value matrices, and d is the key dimension,
- K̃ is an approximation of K.
Sparse Attention: Use sparse matrices to approximate the full attention matrix, reducing the number of computations required:

$$\mathrm{Attention}(Q, K_s, V) = \mathrm{softmax}\!\left(\frac{Q K_s^{\top}}{\sqrt{d}}\right) V$$
Where:
- K_s is a sparse approximation of K.
Comparison with Traditional Methods: Traditional attention mechanisms require computing the full attention matrix, leading to high computational complexity. Efficient Attention reduces this complexity by using approximations or sparse representations, making it more scalable to large datasets and models.
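One common way to realise such an approximation is to build K̃ (and the corresponding values) from a spatially downsampled feature map, so attention is computed against far fewer positions. The sketch below follows that route; the average-pooling approximation and the downsample factor are my assumptions, not necessarily how the notebook implements EA:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EfficientAttention(nn.Module):
    """Attention against downsampled keys/values, standing in for the approximation K~."""
    def __init__(self, channels, downsample=4):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.pool = nn.AvgPool2d(downsample)  # builds K~ / V~ with (H*W)/downsample^2 positions

    def forward(self, x):
        # Assumes H and W are divisible by the downsample factor.
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)             # (B, HW, C)
        k = self.pool(self.key(x)).flatten(2)                    # (B, C, M) with M << HW
        v = self.pool(self.value(x)).flatten(2).transpose(1, 2)  # (B, M, C)
        weights = F.softmax(torch.bmm(q, k) / c ** 0.5, dim=-1)  # (B, HW, M) instead of (B, HW, HW)
        out = torch.bmm(weights, v)                              # (B, HW, C)
        return out.transpose(1, 2).reshape(b, c, h, w)
```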
Progressive Attention Refinement improves feature extraction by refining attention maps at multiple levels or stages.
Formula and Computation:
Initial Attention: Compute initial attention maps similar to traditional attention mechanisms:

$$A^{(0)} = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right)$$

Refinement Stages: Iteratively refine attention maps using additional attention mechanisms or layers:

$$A^{(t+1)} = \mathrm{Refine}\big(A^{(t)}\big), \qquad t = 0, 1, \dots, T-1$$
Where:
- A(t) is the attention map at stage t,
- Refine represents the refinement process.
Feature Integration: Combine features using refined attention maps:

$$\tilde{x}_i = \sum_{j} A^{(T)}_{i,j} \, x_j$$
Where:
- Ai,j(T) is the refined attention weight,
- xj is the feature at position j.
Comparison with Traditional Methods: Traditional attention mechanisms do not adaptively refine attention weights over multiple stages. Progressive Attention Refinement iteratively adjusts these weights, potentially leading to better feature extraction and model performance.
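A hedged sketch of the refinement loop. To keep it short, it uses a simplified per-position attention map rather than the full pairwise matrix A_{i,j}, and re-estimates the map from the features and the current map at every stage; the convolutional Refine step and the number of stages are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ProgressiveAttentionRefinement(nn.Module):
    """A(0) from the features, then A(t+1) = Refine(A(t)) for T stages, then reweighting."""
    def __init__(self, channels, stages=3):
        super().__init__()
        self.initial = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        self.refiners = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels + 1, 1, kernel_size=3, padding=1), nn.Sigmoid())
            for _ in range(stages)
        ])

    def forward(self, x):
        attn = self.initial(x)                        # A(0): one spatial map, shape (B, 1, H, W)
        for refine in self.refiners:                  # A(t+1) = Refine(A(t)), conditioned on x
            attn = refine(torch.cat([x, attn], dim=1))
        return x * attn                               # feature integration with the refined map
```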
| Attention Mechanism | Advantages | Limitations | Comparison with Traditional Methods |
|---|---|---|---|
| Channel Attention | Enhances relevant channels, improves feature discrimination. | May require additional computations for channel-wise descriptors. | Traditional methods treat all channels equally, lacking adaptive focus. |
| Progressive Attention | Refines attention weights over multiple stages, improves feature focus. | More complex to implement, potentially higher computational cost. | Traditional methods use fixed attention weights throughout the network. |
| Efficient Attention | Reduces computational complexity, scalable to larger models. | Approximation might affect performance in some cases. | Traditional methods require full attention matrix computation, which is more computationally intensive. |
| Progressive Attention Refinement | Iteratively refines attention maps, potentially better feature extraction. | Complex implementation, may involve higher computation in refinement stages. | Traditional attention mechanisms do not refine weights iteratively, potentially limiting performance. |
The original FFA-Net model includes 12 blocks per group. In this modified version, the architecture uses 5 blocks per group to reduce computational requirements. Despite this reduction, the modified model achieves comparable SSIM values and close PSNR values to the original model.
Training progress (loss, SSIM, PSNR):
- Step 5000: Loss = 0.14956 | SSIM = 0.8106 | PSNR = 17.9430
- Step 10000: Loss = 0.02472 | SSIM = 0.8500 | PSNR = 20.0667
- Step 15000: Loss = 0.07915 | SSIM = 0.8584 | PSNR = 20.3824
- Step 20000: Loss = 0.09964 | SSIM = 0.8501 | PSNR = 20.7447
The results may appear lower than those reported in the original paper due to the reduced number of blocks and computational constraints. However, the modified model still achieves competitive performance, particularly in SSIM, and maintains close PSNR values.
- Download the Notebook: Obtain the FFA_NET_with_EA_and_PR.ipynb from this repository.
- Download Trained Weights: Download FFA_model.pth for pretrained model parameters.
- Run the Notebook: Follow the instructions within the notebook to run the model and test its performance (a minimal weight-loading sketch is shown below).
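A hedged loading sketch, assuming FFA_model.pth stores a state_dict and that the model class is defined in the notebook; the class name and constructor arguments below are assumptions to adjust against the notebook's actual code:

```python
import torch

state = torch.load("FFA_model.pth", map_location="cpu")  # assumes the file holds a state_dict
# model = FFA(gps=3, blocks=5)   # hypothetical constructor mirroring the 5-blocks-per-group setup
# model.load_state_dict(state)
# model.eval()
```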
For additional details or to run the notebook, please visit my Kaggle page.
- Kaggle for providing the computational resources.
- The original authors of FFA-Net for their foundational work.
For any questions or issues, please contact Pranav Balaji R S at pranavbalajirs@gmail.com


