DiabesityDetect

This project applies machine learning to predict diabetes from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) dataset. After exploratory analysis, Logistic Regression and Support Vector Machines (SVM) proved the most effective of the models tested, which underlines the value of careful data analysis in healthcare. A Tableau dashboard summarizes the results.

Data

The dataset originates from the National Institute of Diabetes and Digestive and Kidney Diseases. It contains 768 observations and 9 columns (eight predictor features and one target variable):

Pregnancies - Number of times pregnant
Glucose - Plasma glucose concentration
BloodPressure - Diastolic blood pressure
SkinThickness - Triceps skinfold thickness
Insulin - Two-hour serum insulin
BMI - Body mass index
DiabetesPedigreeFunction - Diabetes pedigree function
Age - Age in years
Outcome - Class variable (0 = no diabetes, 1 = diabetes)

The dataset is included in this repository as healthcare.csv.
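
A minimal sketch of loading the file and checking it against the column list above. The function name `load_diabetes_csv` is hypothetical, and the CSV header is assumed to use exactly the column names listed above.

```python
import pandas as pd

# Column names taken from the feature list above; assumed to match the CSV header.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_diabetes_csv(path_or_buffer):
    """Read the CSV and verify that the nine expected columns are present."""
    df = pd.read_csv(path_or_buffer)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Return the columns in the documented order.
    return df[EXPECTED_COLUMNS]
```

For example, `load_diabetes_csv("healthcare.csv").shape` should be `(768, 9)` if the file matches the description above.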

Methods

The analysis involved the following steps:

Exploratory data analysis with Pandas, Matplotlib, and Seaborn to understand the features.
Handling missing values by imputing the median of each feature.
Splitting the data into training and test sets.
Scaling the features.
Building classification models (Logistic Regression, SVM, KNN, Random Forest, and Decision Tree) with scikit-learn.
Evaluating the models with accuracy, precision, recall, F1-score, and AUC-ROC.
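
The steps above can be sketched roughly as follows. This is not the notebook's code: it is a minimal illustration that makes several assumptions, namely that zeros in the listed measurement columns stand for missing values, that scaling means standardization, and that an 80/20 stratified split is used. The function name `evaluate_models` is hypothetical, and only two of the five models are shown.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: a zero in these columns means the measurement is missing.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def evaluate_models(df, target="Outcome", test_size=0.2, seed=42):
    """Fit two of the listed classifiers and return test-set metrics per model."""
    X = df.drop(columns=[target]).astype(float)
    y = df[target]
    # Mark assumed-missing zeros as NaN so the imputer can fill them.
    X[ZERO_AS_MISSING] = X[ZERO_AS_MISSING].replace(0, np.nan)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "SVM": SVC(probability=True, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        # Imputer and scaler are fit on the training split only, which avoids leakage.
        pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), model)
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        proba = pipe.predict_proba(X_test)[:, 1]
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "f1": f1_score(y_test, pred),
            "auc_roc": roc_auc_score(y_test, proba),
        }
    return results
```

Wrapping imputation and scaling in a pipeline keeps the test split out of the preprocessing statistics, so the reported metrics are not inflated by information from the test data.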

Results

Logistic Regression achieved the best AUC-ROC (0.87), followed by SVM.
The most important features were Glucose, BMI, and Age.
The models reached 77-78% accuracy on the test data.
Random Forest overfit the training data more than the other models.
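
One common way to check for overfitting like this is to compare accuracy on the training split with accuracy on the test split. The sketch below is an illustration rather than the notebook's code; the helper names `train_test_gap` and `compare_gaps` are hypothetical, and default hyperparameters are assumed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def train_test_gap(model, X_train, y_train, X_test, y_test):
    """Fit the model and return (train accuracy, test accuracy, gap)."""
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    return train_acc, test_acc, train_acc - test_acc


def compare_gaps(X_train, y_train, X_test, y_test, seed=0):
    """Compute the train/test accuracy gap for two of the models."""
    models = {
        "RandomForest": RandomForestClassifier(random_state=seed),
        "LogisticRegression": LogisticRegression(max_iter=1000),
    }
    return {
        name: train_test_gap(model, X_train, y_train, X_test, y_test)
        for name, model in models.items()
    }
```

A gap that is much larger for one model than for the others suggests that the model has memorized the training data instead of generalizing.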

Key visualizations and metrics are compiled in the Jupyter notebook.

Usage

The main analysis is contained in diabetes_prediction.ipynb. Other notebooks contain exploratory analysis.

The trained models can be loaded and used to make predictions on new data.
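
The README does not say how the models are persisted. As one possible approach, the sketch below uses joblib, which is installed alongside scikit-learn. The helper names and any file name such as `model.joblib` are hypothetical.

```python
import joblib


def save_model(model, path):
    """Serialize a fitted scikit-learn estimator or pipeline to disk."""
    joblib.dump(model, path)


def load_and_predict(path, X_new):
    """Load a saved estimator and return its class predictions for X_new."""
    model = joblib.load(path)
    return model.predict(X_new)
```

New data must go through the same imputation and scaling as the training data, so saving a full pipeline (preprocessing plus classifier) is usually safer than saving the classifier alone.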

Requirements

Python 3.x
Pandas, NumPy
Matplotlib, Seaborn
scikit-learn

Contact

For any questions, please reach out to me at 419vive@gmail.com.
