a bit (a lot) of lexicon + content aware sentiment analysis on song lyrics for one of my musical projects, flowrwatr!

jeimcg/sentiment-analysis-bert

Lyric Sentiment Analysis with Fine-Tuned BERT

In this project, I explore the emotional undertones in song lyrics for my musical project flowrwatr using natural language processing (NLP). A fine-tuned BERT model (bert-tiny) classifies lyrics into sentiment categories. The pipeline covers preprocessing, sentiment labeling, training, and validation, and it also extracts word frequencies from the lyrics.


Overview

Goal

Build an end-to-end pipeline that processes song lyrics, assigns sentiment labels, and fine-tunes a pre-trained BERT model for sentiment classification. Word frequency analysis complements the classifier by showing which words dominate the lyrics.

Workflow

  1. Data Cleaning and Preprocessing:
    • Remove noise (special characters, punctuation).
    • Normalize text (lowercasing, tokenization).
  2. Sentiment Labeling:
    • Assign sentiment labels using TextBlob and save labeled datasets.
  3. Model Training:
    • Fine-tune a bert-tiny model on labeled data using training, validation, and test splits.
  4. Performance Evaluation:
    • Evaluate the model on the training, validation, and test splits.
  5. Word Frequency Analysis:
    • Analyze frequently used words to uncover linguistic trends.
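
Steps 1 and 5 of the workflow can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the function names `clean_lyrics` and `word_frequencies`, the regular expression, and the small `STOPWORDS` set are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; the notebook may use
# a different list, or none at all.
STOPWORDS = {"the", "a", "and", "i", "you", "to", "of", "in", "it", "my"}


def clean_lyrics(text):
    """Lowercase a lyric line, drop punctuation, and split it into tokens."""
    text = text.lower()
    # Replace anything that is not a word character, whitespace, or an
    # apostrophe with a space, so "falls," becomes "falls".
    text = re.sub(r"[^\w\s']", " ", text)
    return text.split()


def word_frequencies(lines, top_n=5):
    """Count non-stopword tokens across all lines and return the top_n."""
    counts = Counter()
    for line in lines:
        counts.update(tok for tok in clean_lyrics(line) if tok not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up lines, not real flowrwatr lyrics.
sample = ["Water falls, water FLOWS!", "Flowers drink the water..."]
print(word_frequencies(sample, top_n=3))
```

Keeping apostrophes means contractions such as "don't" stay as single tokens, which usually reads better in frequency tables for lyrics.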

Features

  • Sentiment Analysis: Identifies emotional tones in lyrics using a fine-tuned BERT model.
  • Data Preparation: Automates text cleaning and sentiment labeling for large datasets.
  • Word Frequency Analysis: Highlights common words in lyrics for further insights.
  • Training and Validation: Includes performance metrics to evaluate the model.
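
To make the sentiment labeling step more concrete, here is a hedged sketch of how a polarity score could be turned into a discrete label. TextBlob exposes a polarity in the range [-1.0, 1.0] via `TextBlob(text).sentiment.polarity`; the label names, the `neutral_band` threshold, and the function name `label_from_polarity` are assumptions for illustration, since the README does not state how the notebook maps scores to labels.

```python
def label_from_polarity(polarity, neutral_band=0.05):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    The three label names and the neutral_band cutoff are hypothetical;
    the notebook may use different classes or thresholds.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity must be in [-1.0, 1.0], got {polarity}")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# With TextBlob installed, the score would come from:
#   from textblob import TextBlob
#   polarity = TextBlob("some lyric line").sentiment.polarity
# The values below are hard-coded so the example runs without TextBlob.
for polarity in (0.6, 0.0, -0.4):
    print(polarity, "->", label_from_polarity(polarity))
```

A neutral band around zero avoids labeling near-zero scores as positive or negative, which matters for short lines where a lexicon-based scorer finds few sentiment words.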

Findings

  • Training and validation losses stabilized over 3 epochs:
    • Final Training Loss: 0.0857
    • Final Validation Loss: 0.0810
    • Test Loss: 0.1074
  • Sentiment Predictions:
    {'label': 'LABEL_0', 'score': 0.6137}, {'label': 'LABEL_0', 'score': 0.5820}
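
The predictions above have the list-of-dicts shape that a Hugging Face text-classification pipeline returns. The sketch below shows one way to summarize such output. The helper name `summarize_predictions` and the `low_confidence` cutoff of 0.7 are assumptions for the example, and the README does not say which sentiment `LABEL_0` stands for, so no label names are filled in.

```python
from collections import Counter

# The two predictions reported in the findings.
predictions = [
    {"label": "LABEL_0", "score": 0.6137},
    {"label": "LABEL_0", "score": 0.5820},
]


def summarize_predictions(preds, low_confidence=0.7):
    """Return label counts, the mean score, and the low-confidence predictions.

    low_confidence is a hypothetical cutoff chosen for illustration.
    """
    if not preds:
        raise ValueError("preds must contain at least one prediction")
    counts = Counter(p["label"] for p in preds)
    mean_score = sum(p["score"] for p in preds) / len(preds)
    uncertain = [p for p in preds if p["score"] < low_confidence]
    return {"counts": dict(counts), "mean_score": mean_score, "uncertain": uncertain}


summary = summarize_predictions(predictions)
print(summary["counts"])
print(f"mean score: {summary['mean_score']:.3f}")
print(f"predictions below cutoff: {len(summary['uncertain'])}")
```

Both reported scores fall below the illustrative cutoff, so a summary like this would flag them for a closer look rather than treating them as confident classifications.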

Setup

  • Clone the Repository
    git clone https://github.com/jeimcg/sentiment-analysis-bert
    cd sentiment-analysis-bert
  • Install Dependencies
    pip install -r requirements.txt
  • Run the Colab Notebook
    • Open bert_end_to_end_sentiment_pipeline.ipynb in Google Colab.
    • Upload the lyrics_with_labels.xlsx dataset to the Colab environment.
    • Follow the notebook steps to clean and preprocess the lyrics, train and validate the model, and analyze the results.

Acknowledgements

Thanks to:

  • Hugging Face Transformers for pre-trained models.
  • TextBlob for sentiment labeling.
  • Open-source communities for their tools and documentation.
  • Special appreciation for self-learning, persistence, and curiosity in data science and machine learning at age 24! 😊

Future Improvements

  • Expand dataset diversity for improved generalization.
  • Use a larger pre-trained BERT model for enhanced accuracy.
  • Add data augmentation for better handling of rare labels.
  • Deploy the model as an API or integrate into a web application.
