Lyric Sentiment Analysis with Fine-Tuned BERT

In this project, I explore the emotional undertones of song lyrics from my musical project flowrwatr using natural language processing (NLP). A fine-tuned BERT model (bert-tiny) classifies the lyrics into sentiment categories. The project covers preprocessing, sentiment labeling, training, and validation, and it also extracts word frequencies from the lyrics.

Authors

@jeimcg

Overview

Goal

Build a pipeline that processes song lyrics, assigns sentiment labels, and fine-tunes a pre-trained BERT model for sentiment classification. Word frequency analysis complements the classifier by showing which words dominate the lyrics.

Workflow

Data Cleaning and Preprocessing:
- Remove noise (special characters, punctuation).
- Normalize text (lowercasing, tokenization).
Sentiment Labeling:
- Assign sentiment labels using TextBlob and save labeled datasets.
Model Training:
- Fine-tune a bert-tiny model on labeled data using training, validation, and test splits.
Performance Evaluation:
- Evaluate the model using training, validation, and test loss.
Word Frequency Analysis:
- Analyze frequently used words to uncover linguistic trends.
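
The cleaning and word frequency steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the function names `clean_lyrics` and `word_frequencies` are hypothetical, the sample lines are made-up text rather than real lyrics, and whitespace splitting stands in for whatever tokenizer the notebook uses.

```python
import re
from collections import Counter


def clean_lyrics(text: str) -> list[str]:
    """Lowercase a lyric line, strip punctuation and special characters, and tokenize."""
    text = text.lower()
    # Keep ASCII letters, digits, apostrophes, and whitespace; replace the rest with a space.
    # Assumption: lyrics are English, so accented letters are not preserved here.
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    return text.split()


def word_frequencies(lines: list[str], top_n: int = 5) -> list[tuple[str, int]]:
    """Count tokens across all lines and return the top_n most common."""
    counts = Counter()
    for line in lines:
        counts.update(clean_lyrics(line))
    return counts.most_common(top_n)


# Made-up sample text for illustration only.
sample_lyrics = [
    "Water, water... running over me!",
    "Flowers in the water; flowers in the rain.",
]
print(word_frequencies(sample_lyrics, top_n=3))
```

In practice a stop-word filter would likely be applied before counting, since words such as "the" and "in" otherwise dominate the results.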

Features

Sentiment Analysis: Identifies emotional tones in lyrics using a fine-tuned BERT model.
Data Preparation: Automates text cleaning and sentiment labeling for large datasets.
Word Frequency Analysis: Highlights common words in lyrics for further insights.
Training and Validation: Includes performance metrics to evaluate the model.
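
The sentiment labeling step might work along the following lines. TextBlob reports a polarity score between -1.0 and 1.0 for a piece of text, and a labeling rule turns that score into a class. The label names, the `threshold` value, and the function name `polarity_to_label` are assumptions for illustration; this README does not state which rule or label set the notebook uses.

```python
def polarity_to_label(polarity: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    The threshold and the three label names are hypothetical choices.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# With TextBlob installed, the polarity could be obtained like this:
#   from textblob import TextBlob
#   polarity = TextBlob("some lyric line").sentiment.polarity
# Hard-coded scores are used here so the sketch runs without extra dependencies.
for score in (0.6, 0.0, -0.4):
    print(score, polarity_to_label(score))
```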

Findings

Training and validation losses stabilized over 3 epochs:
- Final Training Loss: 0.0857
- Final Validation Loss: 0.0810
- Test Loss: 0.1074
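
The three losses above come from separate training, validation, and test splits. A minimal sketch of such a split is shown below; the 80/10/10 ratio, the seed, and the function name `split_dataset` are assumptions, since the README does not give the proportions actually used.

```python
import random


def split_dataset(rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle rows reproducibly and cut them into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test split
    return train, val, test


# Integers stand in for labeled lyric rows.
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

Keeping the test split untouched until the end is what makes the test loss a fair estimate; here it is slightly higher than the validation loss, which is common for a small dataset.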

Sentiment Predictions:

[{'label': 'LABEL_0', 'score': 0.6137}, {'label': 'LABEL_0', 'score': 0.5820}]
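
Output in this shape, a list of dictionaries with `label` and `score` keys, can be post-processed into something easier to read. The sketch below is illustrative only: the README does not say which sentiment `LABEL_0` denotes, so the names in `ID2LABEL` are placeholders, and the `low_confidence_below` cutoff and the function name `readable_predictions` are likewise assumptions.

```python
# Placeholder names: replace with the real class names used during labeling.
ID2LABEL = {"LABEL_0": "class_0", "LABEL_1": "class_1"}


def readable_predictions(predictions, id2label, low_confidence_below=0.7):
    """Attach readable names and a low-confidence flag to raw predictions."""
    results = []
    for pred in predictions:
        results.append({
            # Fall back to the raw label if it is missing from the mapping.
            "label": id2label.get(pred["label"], pred["label"]),
            "score": pred["score"],
            "low_confidence": pred["score"] < low_confidence_below,
        })
    return results


# The two predictions reported in this README.
predictions = [
    {"label": "LABEL_0", "score": 0.6137},
    {"label": "LABEL_0", "score": 0.5820},
]
for row in readable_predictions(predictions, ID2LABEL):
    print(row)
```

Both reported scores fall below the assumed cutoff, so both rows are flagged; if the model has only two classes, scores this close to 0.5 suggest the predictions are not strongly separated.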

Setup

Clone the Repository

git clone https://github.com/jeimcg/sentiment-analysis-bert
cd sentiment-analysis-bert

Install Dependencies
```
pip install -r requirements.txt
```
Run the Colab Notebook
- Open bert_end_to_end_sentiment_pipeline.ipynb in Google Colab.
- Upload the lyrics_with_labels.xlsx dataset to the Colab environment.
- Follow the notebook steps to clean and preprocess the data, train and validate the model, and analyze the results.

Acknowledgements

Thanks to:

Hugging Face Transformers for pre-trained models.
TextBlob for sentiment labeling.
Open-source communities for their tools and documentation.
Special appreciation for self-learning, persistence, and curiosity in data science and machine learning at age 24! 😊

Future Improvements

Expand dataset diversity for improved generalization.
Use a larger pre-trained BERT model for enhanced accuracy.
Add data augmentation for better handling of rare labels.
Deploy the model as an API or integrate into a web application.
