In this project, I explored the emotional undertones in song lyrics for my musical project flowrwatr using natural language processing (NLP) techniques. I fine-tuned a BERT model (bert-tiny) to classify lyrics into sentiment categories. The analysis combines preprocessing, sentiment labeling, training, and validation pipelines, and it also extracts insights such as word frequencies from the lyrics.
The objective is to build a comprehensive pipeline that processes song lyrics, assigns sentiment labels, and fine-tunes a pre-trained BERT model for sentiment classification. Word frequency analysis is also conducted to deepen understanding of the lyrical content.
- Data Cleaning and Preprocessing:
- Remove noise (special characters, punctuation).
- Normalize text (lowercasing, tokenization).
- Sentiment Labeling:
- Assign sentiment labels using TextBlob and save labeled datasets.
- Model Training:
- Fine-tune a bert-tiny model on labeled data using training, validation, and test splits.
- Performance Evaluation:
- Validate the model with training, validation, and test metrics.
- Word Frequency Analysis:
- Analyze frequently used words to uncover linguistic trends.
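
The cleaning, labeling, and splitting steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names, the three label names, the 0.05 polarity threshold, and the 80/10/10 split ratios are all assumptions chosen for the example. Only `TextBlob(...).sentiment.polarity` is a real library call, and it needs `pip install textblob`.

```python
import random
import re


def clean_lyric(text):
    """Lowercase the text, drop punctuation/special characters, and split into tokens."""
    text = text.lower()
    # Keep ASCII letters, digits, apostrophes, and whitespace; everything else becomes a space.
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    return text.split()


def polarity_to_label(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a label (label names and threshold are assumptions)."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def label_with_textblob(text):
    """Label raw text from TextBlob's polarity score."""
    # Imported lazily so the helpers above can be used without TextBlob installed.
    from textblob import TextBlob
    return polarity_to_label(TextBlob(text).sentiment.polarity)


def split_dataset(rows, train=0.8, val=0.1, seed=42):
    """Shuffle rows and split them into train/validation/test lists (ratios are assumptions)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]
```

A fixed seed keeps the split reproducible between runs, which makes the validation and test losses comparable across experiments.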
- Sentiment Analysis: Identifies emotional tones in lyrics using a fine-tuned BERT model.
- Data Preparation: Automates text cleaning and sentiment labeling for large datasets.
- Word Frequency Analysis: Highlights common words in lyrics for further insights.
- Training and Validation: Includes performance metrics to evaluate the model.
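
As a rough idea of what the word frequency feature involves, here is a small sketch using only the Python standard library. The stop-word set, the `word_frequencies` name, and the sample lines are made up for illustration and are not taken from the lyrics dataset or the notebook.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a fuller list (for example from NLTK) may work better.
STOP_WORDS = {"the", "a", "an", "and", "i", "you", "in", "of", "to", "it", "on"}


def word_frequencies(lyrics, top_n=10):
    """Return the top_n most common non-stop-words across an iterable of lyric strings."""
    counts = Counter()
    for text in lyrics:
        # Lowercase, then keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


# Invented sample lines, only to show the call shape.
sample = ["water water everywhere", "water and rain", "rain on the flower"]
print(word_frequencies(sample, top_n=2))  # [('water', 3), ('rain', 2)]
```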
- Training and validation losses stabilized over 3 epochs:
- Final Training Loss: 0.0857
- Final Validation Loss: 0.0810
- Test Loss: 0.1074
- Sentiment Predictions:
[{'label': 'LABEL_0', 'score': 0.6137}, {'label': 'LABEL_0', 'score': 0.5820}]
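
Names like LABEL_0 are the defaults Hugging Face Transformers uses when a model config has no custom id2label mapping, so the number after the underscore is the class index from training. The sketch below shows one possible way to read such predictions. The `summarize` helper and the 0.75 confidence cut-off are assumptions for illustration, not part of the original notebook.

```python
# Predictions copied from the results above.
predictions = [
    {"label": "LABEL_0", "score": 0.6137},
    {"label": "LABEL_0", "score": 0.5820},
]

# Hypothetical cut-off for flagging uncertain predictions; tune it for your own data.
MIN_CONFIDENCE = 0.75


def summarize(prediction, min_confidence=MIN_CONFIDENCE):
    """Extract the numeric class id and flag whether the score clears the cut-off."""
    class_id = int(prediction["label"].rsplit("_", 1)[-1])
    return {
        "class_id": class_id,
        "score": prediction["score"],
        "confident": prediction["score"] >= min_confidence,
    }


for p in predictions:
    print(summarize(p))
```

With this cut-off, both sample predictions would be flagged as not confident, which may be worth reviewing by hand before drawing conclusions from them.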
- Clone the Repository
git clone https://github.com/jeimcg/sentiment-analysis-bert
cd sentiment-analysis-bert
- Install Dependencies
pip install -r requirements.txt
- Run the Colab Notebook
- Open bert_end_to_end_sentiment_pipeline.ipynb in Google Colab
- Upload the lyrics_with_labels.xlsx dataset into the Colab environment
- Follow the notebook steps to clean and preprocess the data, train and validate the model, and analyze the results
Thanks to:
- Hugging Face Transformers for pre-trained models.
- TextBlob for sentiment labeling.
- Open-source communities for their tools and documentation.
- Special appreciation for self-learning, persistence, and curiosity in data science and machine learning at age 24! 😊
- Expand dataset diversity for improved generalization.
- Use a larger pre-trained BERT model for enhanced accuracy.
- Add data augmentation for better handling of rare labels.
- Deploy the model as an API or integrate it into a web application.