Sentiment Analysis with Logistic Regression

Project Overview

This project involves building a logistic regression model from scratch to classify movie reviews as either positive (1) or negative (0) using word-level sentiment scores from the VADER Sentiment Analysis tool.

Goal

The objective is to implement logistic regression with gradient descent and use it to predict the binary sentiment label of each review in the provided dataset as accurately as possible.
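For reference, the standard formulation usually meant by "logistic regression with gradient descent" is below. It assumes the mean binary cross-entropy loss and a learning rate α. The README names neither, so treat both as assumptions. Here x_i is the padded vector of VADER word scores for review i, and y_i ∈ {0, 1} is its label.

```latex
\hat{y}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

J(\mathbf{w}, b) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]

\frac{\partial J}{\partial \mathbf{w}} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)\, \mathbf{x}_i, \qquad
\frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)

\mathbf{w} \leftarrow \mathbf{w} - \alpha \frac{\partial J}{\partial \mathbf{w}}, \qquad
b \leftarrow b - \alpha \frac{\partial J}{\partial b}
```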

Dataset & Preprocessing

Dataset: A CSV file (train.csv) containing movie reviews and their corresponding sentiment labels.
Data Cleaning:
- Drop rows whose sentiment label is neutral.
- Convert each review into numerical features by replacing every word with its sentiment score from the VADER lexicon.
- If a word is not found in the VADER lexicon, assign it a score of 0.
- Pad shorter sequences with zeros so that every feature vector has the same length.
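The cleaning steps above can be sketched with pandas as follows. This is a minimal sketch, not the project's notebook code. The column names `review` and `sentiment`, the string label "neutral", and the tokenization rule (split on whitespace, lowercase, strip surrounding punctuation) are all assumptions. Adjust them to match the real train.csv.

```python
import string

import pandas as pd

# Hypothetical column names; adjust to match the real headers in train.csv.
TEXT_COL = "review"
LABEL_COL = "sentiment"


def drop_neutral(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows whose label is 'neutral' (case-insensitive, assumed string labels)."""
    mask = df[LABEL_COL].astype(str).str.lower() != "neutral"
    return df[mask].reset_index(drop=True)


def text_to_scores(text: str, lexicon: dict) -> list:
    """Map each whitespace-separated token to its lexicon score.

    Tokens are lowercased and stripped of surrounding punctuation (an
    assumption). Words missing from the lexicon score 0.
    """
    scores = []
    for token in text.split():
        word = token.lower().strip(string.punctuation)
        scores.append(float(lexicon.get(word, 0.0)))
    return scores
```

In practice the `lexicon` argument would be `SentimentIntensityAnalyzer().lexicon`, a dict that maps lowercase tokens to scores. Stripping punctuation means emoticon entries such as ":)" are never matched. Keep the raw tokens if those entries matter.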

Example Transformation

Input Sentence:

"I really love this amazing product!"

Numerical Representation:

[0, 0.1779, 3.2, 0, 2.9, 0]

After Padding:

[0, 0.1779, 3.2, 0, 2.9, 0, 0, 0]
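Padding can be done with NumPy. In this sketch, the helper name `pad_sequences` is hypothetical. Each review's score list is right-padded with zeros to `max_len`. Sequences longer than `max_len` are truncated, an extra assumption that the README does not specify. When `max_len` is omitted, the longest review sets the length.

```python
import numpy as np


def pad_sequences(seqs, max_len=None):
    """Right-pad (or truncate) each score list with zeros to a common length."""
    if max_len is None:
        # Default: pad everything to the length of the longest sequence.
        max_len = max((len(s) for s in seqs), default=0)
    out = np.zeros((len(seqs), max_len), dtype=float)
    for i, s in enumerate(seqs):
        trimmed = s[:max_len]  # truncation is an assumption, see above
        out[i, : len(trimmed)] = trimmed
    return out
```

For example, `pad_sequences([[0, 0.1779, 3.2, 0, 2.9, 0]], max_len=8)` returns a 1×8 array that matches the padded example above.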

Model Implementation

Implement logistic regression from scratch.
Use gradient descent for optimization.
Do not use pre-built deep learning models.
Use pandas, NumPy, and matplotlib for data processing and visualization.
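A minimal from-scratch sketch that uses only NumPy follows. The class name `LogisticRegressionGD` and the default hyperparameters (`lr=0.1`, `n_iters=1000`) are illustrative assumptions, not values from the project. The class runs full-batch gradient descent on the mean binary cross-entropy loss. It also records the loss at every iteration so the training curve can be plotted with matplotlib.

```python
import numpy as np


def sigmoid(z):
    # Clip to keep np.exp from overflowing for very large |z|.
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))


class LogisticRegressionGD:
    """Binary logistic regression trained with full-batch gradient descent."""

    def __init__(self, lr=0.1, n_iters=1000):
        self.lr = lr            # learning rate (hypothetical default)
        self.n_iters = n_iters  # number of gradient steps (hypothetical default)
        self.w = None
        self.b = 0.0
        self.losses = []

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.losses = []
        eps = 1e-12  # keeps log() finite when p hits 0 or 1
        for _ in range(self.n_iters):
            p = sigmoid(X @ self.w + self.b)
            # Gradient of the mean binary cross-entropy w.r.t. w and b.
            error = p - y
            grad_w = X.T @ error / n_samples
            grad_b = error.mean()
            # Loss of the weights used in this iteration (before the update).
            loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
            self.losses.append(loss)
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
        return self

    def predict_proba(self, X):
        return sigmoid(np.asarray(X, dtype=float) @ self.w + self.b)

    def predict(self, X, threshold=0.5):
        # Probability >= threshold -> 1 (positive), otherwise 0 (negative).
        return (self.predict_proba(X) >= threshold).astype(int)
```

Typical use is `model = LogisticRegressionGD().fit(X_train, y_train)` followed by `model.predict(X_val)`. Passing `model.losses` to `matplotlib.pyplot.plot` is one way to check that the loss decreases during training.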

Training Strategy

Split train.csv into training (80%) and validation (20%) sets.
Use a classification threshold of 0.5:
- Predicted Probability ≥ 0.5 → Classify as 1 (Positive).
- Predicted Probability < 0.5 → Classify as 0 (Negative).
Use gradient descent to optimize model weights.
Evaluate model performance using accuracy score.
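The split and the threshold rule each take a few lines of NumPy. This sketch shuffles rows with a fixed seed and then takes the first 20% as the validation set. The choice `seed=42` is arbitrary and hypothetical. If you prefer scikit-learn, `sklearn.model_selection.train_test_split(X, y, test_size=0.2, random_state=42)` does the same job.

```python
import numpy as np


def train_val_split(X, y, val_frac=0.2, seed=42):
    """Shuffle rows, then split into training (1 - val_frac) and validation sets."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)  # fixed seed for a reproducible split
    idx = rng.permutation(len(X))
    n_val = int(round(len(X) * val_frac))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]


def to_labels(probs, threshold=0.5):
    """Probability >= threshold -> 1 (positive); otherwise 0 (negative)."""
    return (np.asarray(probs) >= threshold).astype(int)
```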

Model Evaluation

Metric Used: Accuracy Score

Formula:

Accuracy = (Number of Correct Predictions) / (Total Predictions)
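The formula translates directly into code, and `sklearn.metrics.accuracy_score` computes the same value. For example, with true labels [1, 0, 1, 1] and predictions [1, 0, 0, 1], three of the four predictions are correct, so accuracy = 3/4 = 0.75.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true == y_pred))


# accuracy([1, 0, 1, 1], [1, 0, 0, 1]) -> 0.75
```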

Example Code for VADER Sentiment Analysis

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# Download VADER lexicon
nltk.download('vader_lexicon')

# Initialize VADER
analyzer = SentimentIntensityAnalyzer()

# Access the VADER lexicon
vaderLex = analyzer.lexicon

# Example
print(vaderLex.get("happy"))  # Output: 2.7

Submission Format

The final output must be saved as a CSV file with two columns:

id	Vader_Binary_Sentiment
0	0
1	1
2	1
3	0

Ensure your submission file matches this format to be correctly evaluated.
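This sketch writes the file with pandas. It assumes `ids` comes from an id column in test.csv (check the actual header) and that `preds` holds the 0/1 predictions. Setting `index=False` stops pandas from writing its row index as an extra unnamed first column. The table above is displayed with tabs, but `to_csv` writes comma-separated values by default.

```python
import numpy as np
import pandas as pd


def write_submission(ids, preds, path="submission.csv"):
    """Save predictions in the two-column format: id, Vader_Binary_Sentiment."""
    sub = pd.DataFrame({
        "id": np.asarray(ids),
        "Vader_Binary_Sentiment": np.asarray(preds).astype(int),
    })
    sub.to_csv(path, index=False)  # no extra index column
    return sub
```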

How to Run the Project

Install required dependencies:

pip install numpy pandas nltk matplotlib scikit-learn

Run the preprocessing script to clean the dataset.
Train the logistic regression model using gradient descent.
Evaluate the model using accuracy score.
Generate predictions on the test dataset.
Save predictions as submission.csv.

Sou7ai1/Sentiment-Analysis
