Benchmarking multiple models for a simple sentiment analysis task on the IMDB dataset
This project benchmarks Logistic Regression, a CNN, DistilBERT, and BERT for sentiment analysis on the IMDB dataset (51.2% negative, 48.8% positive). It evaluates predictive performance (accuracy, F1 score) and compute efficiency (training/inference time, CPU/GPU memory, CO2 emissions) using Hugging Face, scikit-learn, PyTorch, and Weights & Biases (W&B). A Gradio app provides interactive predictions alongside training metrics (accuracy, training time) and evaluation metrics (F1 score, inference time).
- Models: Logistic Regression, multi-kernel CNN, DistilBERT, BERT.
- Metrics: Accuracy, F1 score, training/inference time, CPU/GPU memory, CO2 emissions.
- Tools: W&B for logging, CodeCarbon for emissions, Gradio for demo.
- Dataset: IMDB (a 1,000-sample subset for efficiency; see the end-to-end sketch after this list).
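
To make the setup concrete, here is a minimal sketch of one benchmark run end to end: a shuffled 1,000-sample IMDB subset, the Logistic Regression baseline, CodeCarbon emissions tracking, and W&B logging. The TF-IDF features, the shuffle seed, and the metric key names are illustrative assumptions; the notebook is the source of truth.

```python
import time

import wandb
from codecarbon import EmissionsTracker
from datasets import load_dataset
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# 1,000-sample IMDB subset, shuffled so both classes are represented.
train = load_dataset("imdb", split="train").shuffle(seed=42).select(range(1000))
test = load_dataset("imdb", split="test").shuffle(seed=42).select(range(1000))

run = wandb.init(project="sentiment_benchmarking", name="Logistic_Regression")
tracker = EmissionsTracker(project_name="Logistic_Regression",
                           output_file="emissions_Logistic_Regression.csv")
tracker.start()

# TF-IDF features (an assumption here) keep the baseline fast and CPU-only.
vectorizer = TfidfVectorizer(max_features=20_000)
X_train = vectorizer.fit_transform(train["text"])
X_test = vectorizer.transform(test["text"])

start = time.perf_counter()
clf = LogisticRegression(max_iter=1000).fit(X_train, train["label"])
train_time = time.perf_counter() - start

start = time.perf_counter()
preds = clf.predict(X_test)
inference_time = time.perf_counter() - start

emissions = tracker.stop()  # kg CO2-eq, also written to the CSV above
wandb.log({
    "accuracy": accuracy_score(test["label"], preds),
    "f1": f1_score(test["label"], preds),
    "train_time_s": train_time,
    "inference_time_s": inference_time,
    "co2_kg": emissions,
})
run.finish()
```

The transformer runs follow the same pattern, swapping the scikit-learn fit for a Hugging Face training loop.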
- Run Sentiment_Analysis_Benchmarking.ipynb in a Jupyter environment (e.g., Colab).
- View metrics and plots in the W&B dashboard (sentiment_benchmarking project).
- Test predictions via the Gradio demo with sample inputs (e.g., "This movie was fantastic!"); a minimal demo sketch follows this list.
- Check emissions in model-specific CSV files (e.g., emissions_Logistic_Regression.csv).
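
The Gradio demo can be reproduced in miniature as follows. The checkpoint below is a stand-in (a generic fine-tuned DistilBERT); the notebook wires in its own trained models and attaches the training/evaluation metrics to the output.

```python
import gradio as gr
from transformers import pipeline

# Stand-in checkpoint; replace with a model trained by the notebook.
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

def predict(review: str) -> str:
    result = classifier(review)[0]
    return f"{result['label']} (confidence {result['score']:.3f})"

demo = gr.Interface(
    fn=predict,
    inputs=gr.Textbox(label="Movie review"),
    outputs=gr.Textbox(label="Prediction"),
    examples=[["This movie was fantastic!"]],
)
demo.launch()
```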
- Python 3.10+
- Libraries: transformers, datasets, scikit-learn, torch, psutil, pynvml, codecarbon, pandas, matplotlib, seaborn, gradio, wandb
- Optional: GPU for faster training (a quick hardware check is sketched below)
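
Since the GPU is optional, a quick sanity check with the listed libraries shows what hardware is available and how CPU/GPU memory can be snapshotted; the exact measurement points used in the notebook may differ.

```python
import psutil
import torch

# CPU memory of the current process (psutil is also what the benchmark lists).
rss_mb = psutil.Process().memory_info().rss / 1024**2
print(f"CPU RSS: {rss_mb:.1f} MB")

if torch.cuda.is_available():
    # pynvml reports device-level memory for the first GPU.
    import pynvml
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU memory used: {info.used / 1024**2:.1f} MB")
    pynvml.nvmlShutdown()
else:
    print("No GPU detected; training falls back to CPU.")
```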
