VideoGameSalesPredictor

A machine learning model that attempts to predict video game sales from descriptors and ratings.

In this notebook, I aim to build a machine learning model that predicts video game sales with reasonable accuracy, using a few descriptors along with critic and user data.

The data used in this notebook was found on Kaggle, at: https://www.kaggle.com/kendallgillies/video-game-sales-and-ratings

Critic and user data exist for only about half of the games in the dataset, but this still leaves a lot to work with.

The models did best when attempting to predict global sales. There are a lot of outliers in the sales data, and the distributions are very non-normal. Dropping outliers was tested by removing values beyond percentile cutoffs ranging from the 90th to the 99.9th. This may have improved the models from a linear regression standpoint, but I think it does more harm to the categorical contributions. The sales data is log-transformed to pull it into a manageable range, and from there the correlations with the critic and user data are a little more visible. Multicollinearity is addressed using the variance inflation factor (VIF). The numeric data is scaled prior to fitting the models. The categorical data that I found to be the most useful was reduced to 50 unique values by re-bagging, that is, renaming the least frequent elements to "other".
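The preparation steps described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the column names, the helper names `rebag_rare` and `vif`, and the choice of `log1p` (rather than a plain log) are all assumptions made for the example. The VIF is computed here from the diagonal of the inverse correlation matrix, which is equivalent to 1 / (1 - R²) for each feature regressed on the others.

```python
import numpy as np
import pandas as pd


def rebag_rare(series: pd.Series, max_unique: int = 50, other: str = "other") -> pd.Series:
    """Keep the most frequent labels and rename everything else to `other`."""
    # Reserve one of the max_unique slots for the "other" bucket itself.
    keep = series.value_counts().index[: max_unique - 1]
    return series.where(series.isin(keep), other)


def vif(frame: pd.DataFrame) -> pd.Series:
    """Variance inflation factor per column, from the inverse correlation matrix."""
    corr = np.corrcoef(frame.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=frame.columns)


# Synthetic stand-in for the Kaggle data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
critic = rng.uniform(20, 100, n)
df = pd.DataFrame({
    "Critic_Score": critic,
    # Deliberately correlated with Critic_Score so the VIF has something to flag.
    "User_Score": critic / 10 + rng.normal(0, 0.5, n),
    "Critic_Count": rng.integers(1, 100, n).astype(float),
    "Publisher": rng.choice([f"pub_{i}" for i in range(80)], n),
    "Global_Sales": rng.lognormal(mean=-1.0, sigma=1.5, size=n),
})

# Log-transform the skewed target; log1p is an assumption that keeps zeros finite.
df["log_sales"] = np.log1p(df["Global_Sales"])

# Collapse the long tail of publishers into a single "other" label.
df["Publisher_bagged"] = rebag_rare(df["Publisher"], max_unique=50)

# Inspect multicollinearity among the numeric features.
vifs = vif(df[["Critic_Score", "User_Score", "Critic_Count"]])
print(vifs)
```

A feature with a high VIF relative to the others is a candidate for dropping or combining; the threshold to act on is a judgment call and is not specified here.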

Tools Used:

From scikit-learn I used Lasso and ElasticNet Regressors and a KNN Regressor, alongside the XGBoost Regressor from the separate xgboost package.
- Preprocessing and organization are handled with StandardScaler, ColumnTransformer, and Pipeline.
- GridSearchCV is used for iteration and parameter tuning.
Seaborn is the main library used for visuals.
Pandas and NumPy handle my data frames and computation.
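As a rough sketch of how these tools fit together, the example below wires StandardScaler and a ColumnTransformer into a Pipeline and tunes a Lasso regressor with GridSearchCV. The synthetic data, the column names, the parameter grid, and the use of OneHotEncoder for the categorical column are assumptions for illustration; the notebook's actual setup may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 120
X = pd.DataFrame({
    "Critic_Score": rng.uniform(20, 100, n),
    "User_Score": rng.uniform(1, 10, n),
    "Genre": rng.choice(["Action", "Sports", "Puzzle"], n),
})
# Pretend target: log-transformed sales with some noise.
y = 0.02 * X["Critic_Score"] + 0.1 * X["User_Score"] + rng.normal(0, 0.3, n)

numeric_cols = ["Critic_Score", "User_Score"]
categorical_cols = ["Genre"]

# Scale numeric columns and one-hot encode categorical ones in a single step.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Keeping preprocessing inside the pipeline means it is refit on each CV fold.
model = Pipeline([("prep", preprocess), ("reg", Lasso(max_iter=10_000))])

search = GridSearchCV(
    model,
    param_grid={"reg__alpha": [0.001, 0.01, 0.1, 1.0]},
    cv=5,
    scoring="r2",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Swapping in another regressor only requires changing the `"reg"` step and its `reg__`-prefixed grid entries, since the preprocessing stays the same.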

Future Goals:

I believe that experimenting further with SVM, or maybe neural nets, might be fruitful.
Perhaps these models would work better on subsets of the data (e.g. just "PlayStation" or "Xbox").
Maybe there is a subset of the critic and user data that makes for better predictors.
Other dimensionality reduction methods and principal component analysis could be implemented.
Seek other data sources to merge with this set.
Find other ways to prepare the data, further tune the models, implement other models, or gather other insights.

