Welcome to the FFA-Net Project repository. This project contains the implementation of the FFA-Net model with enhancements including Efficient Attention (EA) and Progressive Attention Refinement (PAR). The repository is designed to provide a comprehensive understanding of the FFA-Net architecture and the novel modifications made. Please note that, due to computational constraints, the model was trained from scratch on Kaggle's GPU resources, so this repository is not set up to be cloned and trained locally as-is.
This repository is structured to offer a clear understanding of the project components. However, if you wish to run the model, you should download and execute the FFA_NET_with_EA_and_PR.ipynb file available in this repository. For the trained model's weights and parameters, you can download the FFA_model.pth file from the repository. For further details and to run the notebook, please refer to my Kaggle page.
The FFA-Net model consists of the following key components:
- FFA-Net Backbone: The core architecture that includes attention mechanisms and progressive refinement modules.
- Efficient Attention (EA): EA aims to improve computational efficiency while maintaining performance. It modifies the standard attention mechanism to reduce computational overhead.
- Progressive Attention Refinement (PAR): PAR refines features progressively through multiple stages, enhancing the model's ability to focus on important features and details.
The loss function used in this project combines pixel-wise loss (L1/L2 loss) and perceptual loss. This hybrid approach ensures that both pixel-level accuracy and high-level feature representations are preserved during the training process. Here’s a detailed explanation of each component and how they are combined:
Pixel-wise loss measures the difference between the predicted and ground truth images at the pixel level. Two common types of pixel-wise loss are L1 loss and L2 loss (a short PyTorch sketch follows the definitions below):
- L1 Loss (Mean Absolute Error): L1 loss calculates the absolute difference between the predicted pixel values and the actual pixel values. It is defined as:

$$L_{1} = \frac{1}{N} \sum_{i=1}^{N} \left| \varphi_i - x_i \right|$$
Where:
- φi is the predicted pixel value at position i,
- xi is the ground truth pixel value at position i,
- N is the total number of pixels.
- L2 Loss (Mean Squared Error): L2 loss computes the squared difference between the predicted and ground truth pixel values. It is defined as:

$$L_{2} = \frac{1}{N} \sum_{i=1}^{N} \left( \varphi_i - x_i \right)^2$$

Where:
- φi is the predicted pixel value at position i,
- xi is the ground truth pixel value at position i,
- N is the total number of pixels.
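As a quick illustration (not code taken from the notebook), both pixel-wise losses map directly onto PyTorch's built-in criteria; the tensors below are placeholders:

```python
import torch
import torch.nn as nn

# Placeholder tensors standing in for a dehazed prediction and its clear ground truth.
pred = torch.rand(1, 3, 64, 64)
target = torch.rand(1, 3, 64, 64)

l1 = nn.L1Loss()(pred, target)   # L1: mean absolute error over all N pixels
l2 = nn.MSELoss()(pred, target)  # L2: mean squared error over all N pixels
print(l1.item(), l2.item())
```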
Perceptual loss evaluates the quality of the generated image based on high-level feature representations rather than raw pixel differences. This loss is based on the idea that the perceptual quality of an image can be better captured by comparing features extracted from pre-trained deep neural networks (such as VGG) rather than comparing pixel values directly.
The perceptual loss is computed as follows:
Feature Extraction: Pass both the generated image ψI and the ground truth image I through a pre-trained feature extractor (e.g., VGG-19) to obtain feature maps. Let F(ψI) and F(I) denote the feature maps of ψI and I, respectively, at different layers of the network.
Feature Loss Calculation: Compute the loss based on the difference between feature maps (a single-layer PyTorch sketch follows this list). The perceptual loss can be defined as:

$$L_{\text{perceptual}} = \sum_{l} \frac{1}{N_l} \left\lVert F_l(\psi I) - F_l(I) \right\rVert_2^2$$
Where:
- Fl(ψI) and Fl(I) are the feature maps at layer l for the predicted and ground truth images,
- Nl is the number of elements (e.g., pixels or activations) in the feature map at layer l,
- ∥·∥2 denotes the squared L2 norm.
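A minimal sketch of this idea in PyTorch, assuming a frozen VGG-19 extractor and a single feature layer (the layer index and the weight handling below are illustrative choices, not necessarily what the notebook uses):

```python
import torch.nn as nn
from torchvision import models

class PerceptualLoss(nn.Module):
    """Compares VGG-19 feature maps of the prediction and the ground truth."""
    def __init__(self, layer_index=16):  # features[:16] ends at relu3_3; illustrative choice
        super().__init__()
        vgg = models.vgg19(weights="IMAGENET1K_V1")  # older torchvision: vgg19(pretrained=True)
        self.features = vgg.features[:layer_index].eval()
        for p in self.features.parameters():
            p.requires_grad = False  # the extractor stays frozen during training

    def forward(self, pred, target):
        # In practice the inputs are usually normalised with ImageNet statistics first.
        f_pred = self.features(pred)
        f_target = self.features(target)
        # Mean squared difference of the feature maps; the 1/N_l normalisation
        # comes from the default 'mean' reduction.
        return nn.functional.mse_loss(f_pred, f_target)
```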
To balance pixel-wise accuracy and perceptual quality, the combined loss function is formulated as a weighted sum of the pixel-wise loss and the perceptual loss:

$$L_{\text{total}} = \alpha \, L_{\text{pixel}} + \beta \, L_{\text{perceptual}}$$

Where:
- α and β are weights that control the importance of pixel-wise loss and perceptual loss, respectively.
Traditional loss functions like L1 and L2 focus solely on pixel-wise differences. While effective for training, they may not capture perceptual quality or high-level features effectively. Combining these with perceptual loss enables the model to achieve better image quality by preserving textures and details, which might be missed by pixel-wise loss alone.
In this project, the combination allows for both fine-grained pixel accuracy and high-level perceptual quality, leading to improved results in tasks like image dehazing.
Feel free to adjust the weight values α and β based on the specific needs of your model and dataset.
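A minimal sketch of the weighted combination, reusing the PerceptualLoss class from the sketch above (the alpha and beta values are placeholders, not the settings used in the notebook):

```python
import torch.nn as nn

alpha, beta = 1.0, 0.04                  # placeholder weights; tune for your model and dataset
pixel_criterion = nn.L1Loss()            # or nn.MSELoss() for the L2 variant
perceptual_criterion = PerceptualLoss()  # sketch class defined in the previous block

def combined_loss(pred, target):
    """L_total = alpha * L_pixel + beta * L_perceptual"""
    return alpha * pixel_criterion(pred, target) + beta * perceptual_criterion(pred, target)
```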
Channel Attention focuses on enhancing or suppressing the significance of different channels in feature maps. This mechanism is designed to emphasize the most informative channels while reducing the less relevant ones.
Formula and Computation:
Global Average Pooling: Compute the global average pooling across spatial dimensions for each channel to obtain a channel-wise descriptor:

$$z_c = \frac{1}{H \times W} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{c,h,w}$$
Where:
- H and W are the height and width of the feature map,
- xc,h,w is the feature value at channel c and spatial position (h,w),
- zc is the channel descriptor for channel c.
Fully Connected Layers: Pass the channel-wise descriptors through a small neural network (often consisting of fully connected layers) to compute channel-wise attention weights:

$$s = \sigma\big(W_2 \, \mathrm{ReLU}(W_1 z + b_1) + b_2\big)$$
Where:
- W1 and W2 are weights,
- b1 and b2 are biases, and
- σ is the sigmoid activation function.
Reweighting: Multiply the original feature map by the computed attention weights:

$$\tilde{x}_{c,h,w} = s_c \cdot x_{c,h,w}$$
Where:
- x̃c,h,w is the reweighted feature map,
- sc is the attention weight for channel c.
Comparison with Traditional Methods: In traditional convolutional neural networks (CNNs) without channel attention, all channels are treated equally. Channel Attention introduces a mechanism to dynamically adjust the importance of each channel, thereby improving the network's ability to focus on relevant features. Traditional methods do not have this adaptive reweighting mechanism, which can limit their ability to capture complex feature dependencies.
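The three steps above correspond to a squeeze-and-excitation style module. A minimal PyTorch sketch (the reduction ratio of 8 and the 1×1-convolution implementation of the fully connected layers are illustrative choices, not necessarily the notebook's exact code):

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Global pooling -> two 1x1 convs -> sigmoid -> channel-wise reweighting."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # z_c: one descriptor per channel
        self.fc = nn.Sequential(              # W1, b1, ReLU, W2, b2, sigmoid
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        s = self.fc(self.pool(x))   # per-channel attention weights s_c, shape (B, C, 1, 1)
        return x * s                # reweighting: x~_{c,h,w} = s_c * x_{c,h,w}
```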
Progressive Attention involves adjusting attention weights based on the importance of features at various stages in the network. This mechanism is designed to adaptively focus on different regions or features over multiple stages of processing.
Formula and Computation:
Attention Calculation: Compute the attention weights for each stage based on the feature maps at that stage:

$$\alpha_{i,j} = \frac{\exp(e_{i,j})}{\sum_{j'} \exp(e_{i,j'})}$$
Where:
- ei,j is the attention score for the feature at position (i,j),
- αi,j is the attention weight.
Feature Aggregation: Aggregate features based on attention weights:

$$\tilde{f}_i = \sum_{j} \alpha_{i,j} \, f_j$$
Where:
- f̃i is the aggregated feature for position i,
- fj is the feature at position j.
Comparison with Traditional Methods: Traditional attention mechanisms typically compute attention weights once and apply them throughout the network. Progressive Attention refines these weights iteratively, allowing for dynamic adjustment and better feature focusing. This progressive refinement can lead to improved performance in tasks requiring fine-grained attention.
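A hedged sketch of one way to realise this in PyTorch: a spatial attention stage that computes scores e_ij, normalises them with a softmax, and aggregates features, applied repeatedly so the weights are re-estimated on progressively refined features (the 1×1-convolution projections, residual composition, and number of stages are my assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttentionStage(nn.Module):
    """One stage: scores e_ij -> softmax weights a_ij -> aggregated features."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.key(x).flatten(2)                     # (B, C, HW)
        scores = torch.bmm(q, k) / c ** 0.5            # e_ij for every pair of positions
        weights = F.softmax(scores, dim=-1)            # a_ij
        v = x.flatten(2).transpose(1, 2)               # (B, HW, C)
        out = torch.bmm(weights, v)                    # f~_i = sum_j a_ij f_j
        return out.transpose(1, 2).reshape(b, c, h, w)

class ProgressiveAttention(nn.Module):
    """Re-applies the attention stage so the weights are refined over multiple stages."""
    def __init__(self, channels, stages=3):
        super().__init__()
        self.stages = nn.ModuleList([SpatialAttentionStage(channels) for _ in range(stages)])

    def forward(self, x):
        for stage in self.stages:
            x = x + stage(x)   # residual refinement at each stage
        return x
```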
Efficient Attention aims to reduce the computational complexity of the attention mechanism while maintaining performance. It introduces optimizations to make attention calculations more efficient.
Formula and Computation:
Approximate Attention Calculation: Instead of computing exact attention weights, EA uses approximations to reduce computational cost. The standard attention

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V$$

can be approximated by:

$$\mathrm{Attention}(Q, \tilde{K}, V) = \mathrm{softmax}\!\left(\frac{Q \tilde{K}^{\top}}{\sqrt{d}}\right) V$$
Where:
- Q, K, and V are the query, key, and value matrices, and d is the key dimension,
- K̃ is an approximation of K.
Sparse Attention: Use sparse matrices to approximate the full attention matrix, reducing the number of computations required:

$$\mathrm{Attention}(Q, K_s, V) = \mathrm{softmax}\!\left(\frac{Q K_s^{\top}}{\sqrt{d}}\right) V$$
Where:
- K_s is a sparse approximation of K.
Comparison with Traditional Methods: Traditional attention mechanisms require computing the full attention matrix, leading to high computational complexity. Efficient Attention reduces this complexity by using approximations or sparse representations, making it more scalable to large datasets and models.
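One common way to realise such an approximation is to build K̃ (and the corresponding values) from a spatially downsampled feature map, so attention is computed against far fewer positions. The sketch below follows that route; the average-pooling approximation and the downsample factor are my assumptions, not necessarily how the notebook implements EA:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EfficientAttention(nn.Module):
    """Attention against downsampled keys/values, standing in for the approximation K~."""
    def __init__(self, channels, downsample=4):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.pool = nn.AvgPool2d(downsample)  # builds K~ / V~ with (H*W)/downsample^2 positions

    def forward(self, x):
        # Assumes H and W are divisible by the downsample factor.
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)             # (B, HW, C)
        k = self.pool(self.key(x)).flatten(2)                    # (B, C, M) with M << HW
        v = self.pool(self.value(x)).flatten(2).transpose(1, 2)  # (B, M, C)
        weights = F.softmax(torch.bmm(q, k) / c ** 0.5, dim=-1)  # (B, HW, M) instead of (B, HW, HW)
        out = torch.bmm(weights, v)                              # (B, HW, C)
        return out.transpose(1, 2).reshape(b, c, h, w)
```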
Progressive Attention Refinement improves feature extraction by refining attention maps at multiple levels or stages.
Formula and Computation:
Initial Attention: Compute initial attention maps similar to traditional attention mechanisms:

$$A^{(0)} = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right)$$

Refinement Stages: Iteratively refine attention maps using additional attention mechanisms or layers:

$$A^{(t+1)} = \mathrm{Refine}\big(A^{(t)}\big), \qquad t = 0, 1, \dots, T-1$$
Where:
- A(t) is the attention map at stage t,
- Refine represents the refinement process.
Feature Integration: Combine features using refined attention maps:

$$\tilde{x}_i = \sum_{j} A^{(T)}_{i,j} \, x_j$$
Where:
- Ai,j(T) is the refined attention weight,
- xj is the feature at position j.
Comparison with Traditional Methods: Traditional attention mechanisms do not adaptively refine attention weights over multiple stages. Progressive Attention Refinement iteratively adjusts these weights, potentially leading to better feature extraction and model performance.
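A hedged sketch of the refinement loop. To keep it short, it uses a simplified per-position attention map rather than the full pairwise matrix A_{i,j}, and re-estimates the map from the features and the current map at every stage; the convolutional Refine step and the number of stages are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ProgressiveAttentionRefinement(nn.Module):
    """A(0) from the features, then A(t+1) = Refine(A(t)) for T stages, then reweighting."""
    def __init__(self, channels, stages=3):
        super().__init__()
        self.initial = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        self.refiners = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels + 1, 1, kernel_size=3, padding=1), nn.Sigmoid())
            for _ in range(stages)
        ])

    def forward(self, x):
        attn = self.initial(x)                        # A(0): one spatial map, shape (B, 1, H, W)
        for refine in self.refiners:                  # A(t+1) = Refine(A(t)), conditioned on x
            attn = refine(torch.cat([x, attn], dim=1))
        return x * attn                               # feature integration with the refined map
```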
| Attention Mechanism | Advantages | Limitations | Comparison with Traditional Methods |
|---|---|---|---|
| Channel Attention | Enhances relevant channels, improves feature discrimination. | May require additional computations for channel-wise descriptors. | Traditional methods treat all channels equally, lacking adaptive focus. |
| Progressive Attention | Refines attention weights over multiple stages, improves feature focus. | More complex to implement, potentially higher computational cost. | Traditional methods use fixed attention weights throughout the network. |
| Efficient Attention | Reduces computational complexity, scalable to larger models. | Approximation might affect performance in some cases. | Traditional methods require full attention matrix computation, which is more computationally intensive. |
| Progressive Attention Refinement | Iteratively refines attention maps, potentially better feature extraction. | Complex implementation, may involve higher computation in refinement stages. | Traditional attention mechanisms do not refine weights iteratively, potentially limiting performance. |
The original FFA-Net model includes 12 blocks per group. In this modified version, the architecture uses 5 blocks per group to reduce computational requirements. Despite this reduction, the modified model achieves comparable SSIM values and close PSNR values to the original model.
Training progress (loss, SSIM, PSNR):
- Step 5000: Loss = 0.14956 | SSIM = 0.8106 | PSNR = 17.9430
- Step 10000: Loss = 0.02472 | SSIM = 0.8500 | PSNR = 20.0667
- Step 15000: Loss = 0.07915 | SSIM = 0.8584 | PSNR = 20.3824
- Step 20000: Loss = 0.09964 | SSIM = 0.8501 | PSNR = 20.7447
The results may appear lower than those reported in the original paper due to the reduced number of blocks and computational constraints. However, the modified model still achieves competitive performance, particularly in SSIM, and maintains close PSNR values.
- Download the Notebook: Obtain the FFA_NET_with_EA_and_PR.ipynb from this repository.
- Download Trained Weights: Download FFA_model.pth for pretrained model parameters.
- Run the Notebook: Follow the instructions within the notebook to run the model and test its performance (a minimal weight-loading sketch is shown below).
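A hedged loading sketch, assuming FFA_model.pth stores a state_dict and that the model class is defined in the notebook; the class name and constructor arguments below are assumptions to adjust against the notebook's actual code:

```python
import torch

state = torch.load("FFA_model.pth", map_location="cpu")  # assumes the file holds a state_dict
# model = FFA(gps=3, blocks=5)   # hypothetical constructor mirroring the 5-blocks-per-group setup
# model.load_state_dict(state)
# model.eval()
```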
For additional details or to run the notebook, please visit my Kaggle page.
- Kaggle for providing the computational resources.
- The original authors of FFA-Net for their foundational work.
For any questions or issues, please contact Pranav Balaji R S at pranavbalajirs@gmail.com


