CSE151A-Project --- Beer Review Prediction

Requirements

pip install -r requirement.txt

Introduction

In this project, we predict the overall score of a beer from its Alcohol By Volume (ABV), appearance, aroma, palate, and taste ratings using several machine-learning models. We begin with a simple linear regression model to establish a reasonable first-pass performance baseline. We then use Deep Neural Networks (DNNs) extensively to predict both the overall rating and the style of a beer. Beyond the standard analysis we learned, we also explore PyTorch for deep learning and XGBoost for gradient-boosted trees. This repository documents our methodology, findings, and insights, providing a foundation for future research and improvements in predictive modeling for beer ratings. Such models could be commercialized to recommend beers to people who are likely to enjoy them and to offer manufacturers suggestions for improving beer quality.

Data Exploration

Data source: rate_beer
Click the second (RateBeer) link to download ratebeer.json.gz.

The dataset contains the following columns:

beer/beerId: Unique identifier for each beer.
beer/ABV: Alcohol by volume of the beer.
beer/style: Style of the beer. (string)
review/appearance: Appearance review score (out of 5).
review/aroma: Aroma review score (out of 10).
review/palate: Palate review score (out of 5).
review/taste: Taste review score (out of 10).
review/overall: Overall review score (out of 20).

Data Description

Number of observations: 2785525
Data distributions: left-skewed
Missing data: there is no missing data in the provided dataset
Column descriptions: as shown above

Data Visualization

Our dataset is tabular, so we use scatter plots to visualize the relationships between the individual features and the target variable (the average overall review score).
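As an illustration of the scatter plots described above, here is a minimal sketch that draws one panel per feature against the target. The function name `scatter_features_vs_target` and the synthetic data are assumptions made for this example; the notebooks may plot the data differently.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

FEATURES = ["beer/ABV", "review/appearance", "review/aroma",
            "review/palate", "review/taste"]

def scatter_features_vs_target(features, target, names=FEATURES):
    """Draw one scatter panel per feature column against the target (hypothetical helper)."""
    fig, axes = plt.subplots(1, len(names), figsize=(4 * len(names), 3), squeeze=False)
    for ax, column, name in zip(axes[0], features.T, names):
        ax.scatter(column, target, s=4, alpha=0.3)
        ax.set_xlabel(name)
        ax.set_ylabel("review/overall")
    fig.tight_layout()
    return fig

# Synthetic stand-in data, NOT the RateBeer reviews.
rng = np.random.default_rng(0)
demo_features = rng.uniform(0, 10, size=(100, len(FEATURES)))
demo_target = rng.uniform(0, 20, size=100)
fig = scatter_features_vs_target(demo_features, demo_target)
```

With roughly 2.8 million reviews, plotting a random sample rather than every row is likely to be more readable.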

General workflow

Read data from downloaded ratebeer.json.gz
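A minimal sketch of this reading step follows. It assumes that ratebeer.json.gz holds one JSON object per line; the real file may use a different layout and need a different parser. The helper name `iter_reviews` is hypothetical and is not taken from the notebooks.

```python
import gzip
import json
from typing import Iterator

# Columns listed in the dataset description above.
COLUMNS = [
    "beer/beerId", "beer/ABV", "beer/style",
    "review/appearance", "review/aroma", "review/palate",
    "review/taste", "review/overall",
]

def iter_reviews(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line, keeping only the columns of interest."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)  # assumption: each line is valid JSON
            # Missing keys become None instead of raising KeyError.
            yield {col: record.get(col) for col in COLUMNS}
```

Streaming the file line by line keeps memory use low, which matters for a dataset with millions of reviews.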

Preprocessing Steps

Scaling numeric features.
Encoding categorical features.
Splitting the dataset into training and testing sets.
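The three steps above can be sketched with scikit-learn as follows. This is an illustration under assumptions, not necessarily the pipeline used in MainModel.ipynb: the function name `preprocess`, the 80/20 split, and the choice of `LabelEncoder` for the style column are all hypothetical. The sketch splits before scaling so that the scaler is fitted on training rows only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

def preprocess(numeric, styles, target, test_size=0.2, seed=0):
    """Encode styles, split into train/test, then scale numeric features."""
    # Encode the categorical style strings as integer class ids.
    encoder = LabelEncoder()
    style_ids = encoder.fit_transform(styles)

    # Split all three arrays with the same shuffled row indices.
    X_train, X_test, s_train, s_test, y_train, y_test = train_test_split(
        numeric, style_ids, target, test_size=test_size, random_state=seed
    )

    # Fit the scaler on training rows only, then apply it to both splits.
    scaler = StandardScaler().fit(X_train)
    return (scaler.transform(X_train), scaler.transform(X_test),
            s_train, s_test, y_train, y_test, encoder)
```

Fitting the scaler after the split avoids leaking test-set statistics into training; scaling the full dataset first is simpler but slightly optimistic.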

Raw data can be found here: raw_data.csv

Cleaned data set can be found here: cleaned_numeric_data.csv

Linear Regression

Use sklearn.linear_model.LinearRegression to fit a simple linear regression model that predicts the overall rating.
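A minimal sketch of such a baseline is shown below. The helper `fit_baseline` is hypothetical, and the data and weights are synthetic stand-ins chosen only to make the example runnable; they are not results from the RateBeer dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def fit_baseline(X_train, y_train, X_test, y_test):
    """Fit ordinary least squares and report test mean squared error."""
    model = LinearRegression().fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    return model, mse

# Synthetic demo: 5 features standing in for ABV, appearance, aroma, palate, taste.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 5))
true_weights = np.array([0.1, 0.5, 1.0, 0.5, 1.0])  # made-up weights
y = X @ true_weights + 2.0                           # noise-free linear target
model, mse = fit_baseline(X[:160], y[:160], X[160:], y[160:])
```

The fitted `model.coef_` gives one weight per feature, which makes this baseline easy to interpret before moving to the neural networks.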

DNN prediction of overall rating

Use keras.layers.Dense() layers to build a DNN model that predicts the overall rating from all other numeric features (ratings).
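For illustration, a small regression network built from Dense layers might look like the sketch below. The layer sizes, the Adam optimizer, and the function name `build_rating_regressor` are assumptions made for this example, not the architecture used in the notebooks.

```python
import keras
import numpy as np
from keras import layers

def build_rating_regressor(n_features: int = 5) -> keras.Model:
    """Stack Dense layers with a single linear output for regression (hypothetical sizes)."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(32, activation="relu"),
        layers.Dense(16, activation="relu"),
        layers.Dense(1),  # linear output: the predicted overall rating
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

model = build_rating_regressor()
# Untrained model: this call only checks that shapes line up.
preds = model.predict(np.zeros((3, 5)), verbose=0)
```

Training would then call `model.fit` on the scaled training features and the overall ratings.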

DNN prediction of beer style

Use keras.layers.Dense() layers to build a DNN model that predicts the style of a beer from all numeric data.
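Style prediction is a multi-class classification problem, so a sketch of the network differs from the regression case mainly in its output layer and loss. The hidden size, the function name `build_style_classifier`, and the feature and class counts in the demo are assumptions; the styles are assumed to be encoded as integer class ids.

```python
import keras
import numpy as np
from keras import layers

def build_style_classifier(n_features: int, n_styles: int) -> keras.Model:
    """Dense network ending in a softmax over beer styles (hypothetical sizes)."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_styles, activation="softmax"),  # one probability per style
    ])
    # Sparse loss expects integer class ids rather than one-hot vectors.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

classifier = build_style_classifier(n_features=6, n_styles=10)
# Untrained model: each output row is a probability distribution over styles.
probs = classifier.predict(np.zeros((2, 6)), verbose=0)
```

The predicted style for a row is the index of its largest probability, for example via `probs.argmax(axis=1)`.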

Running the Project

Clone the repository:

git clone https://github.com/jil258/CSE151A-Project.git

Download the data from Google Drive (this step is included in the script).
Preprocess and explore the data.
Run Linear Regression to predict the overall score using the 5 other numeric features.

Files

raw_data.csv Uncleaned raw data loaded from ratebeer.json.gz (downloaded from the source website), with some uninteresting columns dropped.
cleaned_numeric_data.csv Cleaned data set.
MainModel.ipynb The main file that incorporates our core workflow and model.
DNN.ipynb Deep Neural Network prediction of overall rating using PyTorch (for extra exploration).
NN_StylePrediction.ipynb Deep Neural Network to predict the style of beer using the numeric features (multi-class classification).
XGboost.ipynb Further enhancement of DNN.ipynb (further exploration).
Conclusion_section.md Conclusion section from Milestone 3.

Written Report

See the Final Report for detailed Methods, Results, Discussion, and Conclusion.

Contribution

Yang Han: Did data preprocessing and exploration, wrote parts of the report, improved the code, and proofread.
Leonard Shi: Helped find the dataset and write the abstract. Wrote the README and the written report.
Jian-Peng Li: Completed the XGBoost training notebook and implemented grid search. Completed the DNN notebook and the corresponding section of the LaTeX file.
Wen Hsin Chang: Trained our first model and worked on the written report.
Zhaoyu Dou: Trained our first and second models.
