gmatevos/Machine_Learning

Table of Contents

  1. About the Project
  2. Repository Contents
  3. Getting Started
  4. References

About the Project

During my Petroleum Engineering days I was always dealing with data analytics, but at some point I ran out of tools in my toolbox. I decided to learn data science on my own time and started by reading the books referenced at the end. The biggest challenge for me was the lack of structured examples that capture the modeling process from start to finish 🤨.

I started organizing the material and putting the pieces of the puzzle together during the learning process 🤔. I'm hoping people who are facing the same challenges can use this repository to answer some of their own questions. Don't be afraid to start; the biggest hurdle is usually self-doubt.

Topics covered within this repo:

  1. EDA
  2. Preprocessing
  3. Feature selection / extraction
  4. Modeling
  5. Evaluations and improvements
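
To make the five topics above concrete, here is a minimal sketch of how they might chain together in scikit-learn. It is not taken from the notebooks: the synthetic dataset, the choice of `SelectKBest` and `LogisticRegression`, and every parameter value are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the real projects load their own datasets).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# 1. EDA: look at the shape and the class balance before modeling.
print("shape:", X.shape, "class counts:", np.bincount(y))

# Hold out a test set so the final evaluation uses unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2-4. Preprocessing, feature selection and modeling chained in one pipeline,
# so every step is fitted only on the training folds during cross-validation.
pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif, k=5)),
        ("model", LogisticRegression(max_iter=1000)),
    ]
)

# 5. Evaluation: 5-fold cross-validation on the training set, then a final
# accuracy check on the held-out test set.
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)

print("mean CV accuracy:", cv_scores.mean())
print("test accuracy:", test_accuracy)
```

Keeping the steps inside a single `Pipeline` is one common way to avoid leaking information from the validation folds into the scaler or the feature selector; the individual projects may organize these steps differently.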

P.S. Some sections contain repetitive code that could easily be wrapped in a function. This is intentional: repetition is the best path to learning (cross-validation would concur).

Repository Contents

Below is a brief description of what's in each folder. More detailed information can be found within each project.

Folders

  1. Craigslist Car Pricing - regression model that predicts car posting price.
  2. Credit Score Classification - classification of people with good/bad credit.
  3. Maunaloa Volcano CO2 Levels - time series forecasting. Sourced from A. Muller's lectures.
  4. US Population Income - classification model that predicts whether a person makes more than $50k.
  5. Wine Ratings - predicting wine ratings from free text reviews.

Files

  1. ML Cheat Sheet - summary of models, theory and assumptions.

Getting Started

Datasets are not attached, but they can be downloaded by following the links mentioned within each project. Below are the libraries and IDE used in each project.

Prerequisites

IDE

Libraries

  • Sklearn
  • Numpy
  • Matplotlib
  • Pandas
  • Scipy
  • Statsmodels
  • Category_encoders
  • Seaborn
  • Math
  • Skopt

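Libraries can be installed with pip from the terminal/command line. Note that two of them are published under a different name than the one used to import them (Sklearn installs as scikit-learn and Skopt as scikit-optimize), and Math ships with Python, so it needs no installation. For example: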

pip install numpy

Once all of the packages above are installed, you don't need anything else. Time to download the notebooks and get going! 🚀
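
Before opening the notebooks, it can help to confirm that everything is importable. The snippet below is a small sketch rather than part of the repository: the `REQUIRED_MODULES` list and the `find_missing` helper are hypothetical names, and the module names are assumed to be the import names of the libraries listed above.

```python
import importlib.util

# Import names assumed for the libraries listed in this README.
REQUIRED_MODULES = [
    "sklearn",
    "numpy",
    "matplotlib",
    "pandas",
    "scipy",
    "statsmodels",
    "category_encoders",
    "seaborn",
    "math",
    "skopt",
]


def find_missing(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

`importlib.util.find_spec` only locates a module without importing it, so the check stays fast even for heavy packages.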

References

I used the following books and lectures to learn statistics, Python and ML algorithms. They should suffice to get started with ML, with only the occasional supplemental Google search.

  1. Probability & Statistics for Engineers & Scientists
  2. Learning Python
  3. Introduction to Computation and Programming Using Python
  4. Applied Predictive Modeling
  5. Introduction to Machine Learning with Python, A Guide for Data Scientists
  6. Andreas Muller Columbia Data Science Institute Lecture Series

About

This repository contains classification and regression ML examples
