Imbalanced Face Attribute Classification

This project was originally developed as part of a technical assessment for a Computer Vision / Machine Learning Engineer position and was later refined as a portfolio project.

It addresses a binary face classification task with highly imbalanced classes, using the CelebA dataset. ResNet18 and a custom CNN are compared using stratified 5-fold cross-validation, with HTER (Half Total Error Rate, the average of the false acceptance and false rejection rates) as the main evaluation metric.
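Stratified k-fold cross-validation assigns samples to folds so that each fold keeps roughly the same class proportions as the full dataset. The stdlib sketch below illustrates the idea. It is not the project's code, which may use scikit-learn's StratifiedKFold instead. The function names stratified_kfold and train_val_pairs are hypothetical. It groups indices by binary label, shuffles each group, and deals each group round-robin into five folds.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_kfold(labels: Sequence[int], n_splits: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into n_splits folds that keep the class ratio of `labels`."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds: List[List[int]] = [[] for _ in range(n_splits)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class round-robin so each fold gets about 1/n_splits of it.
        for i, idx in enumerate(indices):
            folds[(i + offset) % n_splits].append(idx)
        # Continue the rotation where this class stopped, keeping fold sizes balanced.
        offset += len(indices)
    return folds


def train_val_pairs(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, val_indices), using each fold once for validation."""
    for k, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield train, val
```

For example, with 100 labels of which 13 are positive, each of the five folds receives 20 samples and either 2 or 3 positives.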

The positive (minority) class corresponds to samples exhibiting at least one of the following attributes: Hat, Eyeglasses, or Mustache.
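The label rule above could be derived from list_attr_celeba.txt roughly as follows. This is a minimal sketch and not the logic of separation_train_val.py. It assumes the standard CelebA layout: line 1 holds the image count, line 2 holds the attribute names, and each further line holds an image name followed by +1/-1 values. It also assumes the three attributes are named Wearing_Hat, Eyeglasses and Mustache. The helper names parse_attributes and make_label are hypothetical.

```python
from typing import Dict, Iterable

# Assumed CelebA column names for the three target attributes.
POSITIVE_ATTRIBUTES = ("Wearing_Hat", "Eyeglasses", "Mustache")


def parse_attributes(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse list_attr_celeba.txt into {image_name: {attribute: +1 or -1}}."""
    it = iter(lines)
    next(it)                  # line 1: number of images
    names = next(it).split()  # line 2: attribute names
    table: Dict[str, Dict[str, int]] = {}
    for line in it:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        image, values = fields[0], fields[1:]
        table[image] = dict(zip(names, map(int, values)))
    return table


def make_label(attributes: Dict[str, int]) -> int:
    """Return 1 (positive, minority) if any target attribute is present (+1), else 0."""
    return int(any(attributes[name] == 1 for name in POSITIVE_ATTRIBUTES))
```

Typical use would open the annotation file and build a mapping from image name to make_label(attrs) for every entry returned by parse_attributes.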

Dataset

This project uses the CelebA dataset, a large-scale face attributes dataset with more than 200K celebrity images, each annotated with 40 binary attributes. Download the aligned images (img_align_celeba.zip) and the attribute annotations (list_attr_celeba.txt) from these Google Drive folders:

img_align_celeba.zip: https://drive.google.com/drive/folders/0B7EVK8r0v71pTUZsaXdaSnZBZzg?resourcekey=0-rJlzl934LzC-Xp28GeIBzQ

list_attr_celeba.txt: https://drive.google.com/drive/folders/0B7EVK8r0v71pOC0wOVZlQnFfaGs?resourcekey=0-pEjrQoTrlbjZJO2UL8K_WQ

How to run

Download the CelebA dataset (see Dataset above).
Install the project:

git clone git@github.com:Tiphainell/CelebA-classification.git
cd CelebA-classification
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

View the cross-validation results in cross_val_result.ipynb.
To rerun the cross-validation, run cross_validation.py.

Project Structure

Scripts

separation_train_val.py
Creates labels and generates the dataset for train and test splits.
cross_validation.py
Trains ResNet18 and the custom CNN with early stopping callbacks and saves the best model for each fold.
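The training framework and the metric monitored by the early-stopping callbacks are not stated here. The sketch below is a framework-agnostic illustration and not the project's implementation. It assumes a lower-is-better validation metric such as HTER and uses a hypothetical EarlyStopping class. It records the best epoch, signals when to save a checkpoint, and stops after patience epochs without improvement.

```python
import math
from typing import Optional


class EarlyStopping:
    """Track a lower-is-better validation metric (assumed: HTER) and signal stop/save."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.best_epoch: Optional[int] = None
        self.bad_epochs = 0

    def step(self, epoch: int, value: float) -> bool:
        """Record one epoch; return True if it is a new best (i.e. save a checkpoint)."""
        if value < self.best - self.min_delta:
            self.best = value
            self.best_epoch = epoch
            self.bad_epochs = 0
            return True
        self.bad_epochs += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.bad_epochs >= self.patience


# Per-fold usage, with hypothetical train_one_epoch / evaluate_hter / save_checkpoint helpers:
# stopper = EarlyStopping(patience=3)
# for epoch in range(max_epochs):
#     train_one_epoch(model, train_loader)
#     if stopper.step(epoch, evaluate_hter(model, val_loader)):
#         save_checkpoint(model, fold)  # keep only the best model of this fold
#     if stopper.should_stop:
#         break
```

Keras and PyTorch Lightning both ship equivalent callbacks. Which one, if either, this project uses is not confirmed.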

Notebooks

cross_val_result.ipynb
Analyzes the learning curves of each model across the 5 folds.

Design Choices & Insights

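HTER was chosen over accuracy because accuracy is dominated by the majority class on imbalanced data. With a 13% minority class, a model that always predicts the negative class already reaches 87% accuracy, while its HTER is 0.5. We also analyse ROC-AUC and balanced accuracy.
Stratified 5-fold cross-validation keeps the minority-class proportion the same in every fold, so results are comparable across folds.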
ResNet18 benefits from transfer learning, while the custom CNN serves as a lightweight baseline.
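The three metrics can be sketched in plain Python as follows, treating label 1 as the positive (minority) class. HTER averages the false acceptance rate (false positive rate) and the false rejection rate (false negative rate). Balanced accuracy, the mean of TPR and TNR, is therefore exactly 1 - HTER at the same decision threshold. The function names are illustrative and not taken from the project. The pairwise roc_auc is quadratic and only meant to show the definition. At CelebA scale, a library routine such as scikit-learn's roc_auc_score would be the practical choice.

```python
from typing import Sequence, Tuple


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (FAR, FRR) for binary labels where 1 is the positive class."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    far = fp / negatives  # false acceptance rate = false positive rate
    frr = fn / positives  # false rejection rate = false negative rate
    return far, frr


def hter(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    far, frr = error_rates(y_true, y_pred)
    return (far + frr) / 2


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # (TPR + TNR) / 2 = ((1 - FRR) + (1 - FAR)) / 2 = 1 - HTER
    return 1.0 - hter(y_true, y_pred)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For instance, with 13 positives among 100 samples, a classifier that always predicts 0 gets an accuracy of 0.87 but an HTER of 0.5, which exposes it as no better than chance.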

Results & Conclusions

Based on the cross-validation results:

ResNet18 outperforms the custom CNN for this specific classification task.
ResNet18 reaches an HTER of about 0.10 after a single epoch of training, whereas the custom CNN needs more epochs to converge.
The dataset’s class imbalance (13% minority class) is preserved across splits using stratified sampling.

Model	Mean HTER (5 folds)	Std. dev.
Custom CNN	0.14	0.006
ResNet18	0.10	0.005

Lower HTER is better.
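The table above summarises one HTER value per fold and model as a mean and a standard deviation. A minimal aggregation sketch follows. The helper names are hypothetical, and whether the reported Std is the sample (n - 1) or population standard deviation is not stated. This sketch uses statistics.stdev (sample); statistics.pstdev would give the population version.

```python
import statistics
from typing import Dict, Sequence, Tuple


def summarize_folds(fold_hters: Dict[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each model name to (mean HTER, sample standard deviation) over its folds."""
    return {
        model: (statistics.mean(scores), statistics.stdev(scores))
        for model, scores in fold_hters.items()
    }


def format_table(summary: Dict[str, Tuple[float, float]]) -> str:
    """Render the summary as a tab-separated table like the one in this README."""
    lines = ["Model\tMean HTER\tStd"]
    for model, (mean, std) in summary.items():
        lines.append(f"{model}\t{mean:.2f}\t{std:.3f}")
    return "\n".join(lines)
```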

Figure: Fold 1 cross-validation results.

Highlights

Binary classification with imbalanced data.
Comparison of a custom CNN architecture with ResNet18.
Stratified 5-fold cross-validation.
Learning curves and HTER evaluation.

Dataset & Citation

This project uses the CelebA dataset.

If you use this dataset, please cite:

Liu, Z., Luo, P., Wang, X., & Tang, X.
Deep Learning Face Attributes in the Wild.
Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
