Tiphainell/CelebA-classification

Imbalanced Face Attribute Classification

This project was originally developed as part of a technical assessment for a Computer Vision / Machine Learning Engineer position and was later refined as a portfolio project.

It addresses a binary face classification task with highly imbalanced classes using the CelebA dataset. The performance of ResNet18 and a custom CNN is evaluated using 5-fold stratified cross-validation, with HTER as the main evaluation metric.

The positive (minority) class corresponds to samples exhibiting at least one of the following attributes: Hat, Eyeglasses, or Mustache.
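As an illustration of the labelling rule above, here is a minimal sketch that derives the binary label from the contents of list_attr_celeba.txt. It assumes the standard CelebA layout (first line: image count, second line: attribute names, then one row per image with values 1 or -1) and that "Hat" refers to the `Wearing_Hat` column. The function name `parse_labels` and the tiny sample are hypothetical and not taken from the repository.

```python
# Attribute columns assumed to define the positive class (CelebA column names).
POSITIVE_ATTRS = ("Wearing_Hat", "Eyeglasses", "Mustache")


def parse_labels(text):
    """Return {image_name: 0 or 1} from the contents of list_attr_celeba.txt."""
    lines = text.strip().splitlines()
    names = lines[1].split()  # line 0 is the image count, line 1 the header
    idx = [names.index(attr) for attr in POSITIVE_ATTRS]
    labels = {}
    for row in lines[2:]:
        parts = row.split()
        image, values = parts[0], parts[1:]
        # Positive if at least one of the selected attributes is present (== 1).
        labels[image] = int(any(values[i] == "1" for i in idx))
    return labels


# Hand-made sample in the same layout (only 4 of the 40 columns, for brevity).
sample = """3
Eyeglasses Mustache Smiling Wearing_Hat
000001.jpg -1 -1  1 -1
000002.jpg  1 -1 -1 -1
000003.jpg -1 -1  1  1
"""
labels = parse_labels(sample)
```

In this sample, only the first image ends up in the negative class, since it has none of the three attributes.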


Dataset

This project uses the CelebA dataset, a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images (img_align_celeba.zip) and the attribute file (list_attr_celeba.txt) can be downloaded from the following Google Drive folders:

img_align_celeba.zip : https://drive.google.com/drive/folders/0B7EVK8r0v71pTUZsaXdaSnZBZzg?resourcekey=0-rJlzl934LzC-Xp28GeIBzQ

list_attr_celeba.txt: https://drive.google.com/drive/folders/0B7EVK8r0v71pOC0wOVZlQnFfaGs?resourcekey=0-pEjrQoTrlbjZJO2UL8K_WQ

How to run

  1. Download the CelebA dataset
  2. Install the project:
       git clone git@github.com:Tiphainell/CelebA-classification.git
       cd CelebA-classification
       python3 -m venv .venv
       source .venv/bin/activate
       pip install -r requirements.txt
  3. View cross-validation results in cross_val_result.ipynb
  4. If you want to launch the cross-validation, run cross_validation.py

Project Structure

Scripts

  • separation_train_val.py
    Creates labels and generates the dataset for train and test splits.

  • cross_validation.py
    Trains ResNet18 and the custom CNN with early stopping callbacks and saves the best model for each fold.
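The early stopping mentioned for cross_validation.py can be pictured with the sketch below. It is not the repository's callback: the class name `EarlyStopping`, the `patience` parameter, and the choice of validation HTER as the monitored quantity are all assumptions made for illustration, and the history values are invented.

```python
class EarlyStopping:
    """Stop training when the monitored value has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_hter):
        """Record one epoch; return True when training should stop."""
        if val_hter < self.best:
            self.best, self.best_epoch, self.bad_epochs = val_hter, epoch, 0
            # A real training loop would save the model checkpoint here.
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
history = [0.20, 0.15, 0.16, 0.17, 0.12]  # made-up validation HTER per epoch
stopped_at = None
for epoch, value in enumerate(history):
    if stopper.step(epoch, value):
        stopped_at = epoch
        break
```

With a patience of 2, this toy run stops before seeing the last value, which shows the usual trade-off: a small patience saves compute but can miss a late improvement.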

Notebooks

  • cross_val_result.ipynb
    Analyzes the learning curves of each model across the 5 folds.

Design Choices & Insights

  • HTER (Half Total Error Rate) was chosen over accuracy to properly evaluate performance on imbalanced data. ROC-AUC and balanced accuracy are also analysed.
  • Stratified 5-fold cross-validation keeps the class ratio the same in every fold, so scores are comparable across folds.
  • ResNet18 benefits from transfer learning, while the custom CNN serves as a lightweight baseline.
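To make the metric choice above concrete: HTER is commonly defined as the mean of the false acceptance rate (FAR, negatives predicted positive) and the false rejection rate (FRR, positives predicted negative), which makes it equal to one minus balanced accuracy. The sketch below is a plain-Python illustration with invented labels, not the repository's evaluation code, and it assumes both classes are present in `y_true`.

```python
def hter(y_true, y_pred):
    """Half Total Error Rate: (FAR + FRR) / 2 for binary labels in {0, 1}."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    far = fp / (fp + tn)  # share of negatives wrongly accepted
    frr = fn / (fn + tp)  # share of positives wrongly rejected
    return (far + frr) / 2


# Invented example: 2 positives out of 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
score = hter(y_true, always_negative)
```

A classifier that always predicts the majority class gets 80% accuracy on this example but an HTER of 0.5, the same as random guessing, which is why accuracy alone is misleading on imbalanced data.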

Results & Conclusions

Based on the cross-validation results:

  • ResNet18 outperforms the custom CNN for this specific classification task.
  • ResNet18 can reach an HTER of about 0.10 after only one epoch of training, whereas the custom CNN needs more epochs to converge.
  • The dataset’s class imbalance (13% minority class) is preserved across splits using stratified sampling.

Model        Mean HTER   Std
Custom CNN   0.14        0.006
ResNet18     0.10        0.005

Figure: Fold 1 cross-validation results (img.png)
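The stratified sampling referred to in the results can be illustrated with a small standard-library sketch. In practice a library splitter such as scikit-learn's `StratifiedKFold` is the usual tool. The helper `stratified_folds` below is a hypothetical simplification, and the 1000-sample label list is synthetic, chosen only to mirror the 13% minority ratio.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with near-identical class ratios."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)  # deal each class out round-robin
    return folds


# Synthetic labels: 130 positives and 870 negatives (13% minority class).
labels = [1] * 130 + [0] * 870
folds = stratified_folds(labels)
ratios = [sum(labels[i] for i in fold) / len(fold) for fold in folds]
```

Because each class is dealt out separately, every fold here holds 200 samples with the same positive ratio as the full set, so each validation fold sees the minority class in the proportion the model will face overall.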


Highlights

  • Binary classification with imbalanced data.
  • Comparison of a standard CNN architecture vs. ResNet18.
  • Stratified 5-fold cross-validation.
  • Learning curves and HTER evaluation.

Dataset & Citation

This project uses the CelebA dataset.

If you use this dataset, please cite:

Liu, Z., Luo, P., Wang, X., & Tang, X.
Deep Learning Face Attributes in the Wild.
Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.