English | 中文
This repository contains implementations of various machine learning algorithms from scratch, along with comparisons to their sklearn counterparts.
This section covers the basic libraries: NumPy for numerical computation, Pandas for data manipulation, and Matplotlib for data visualization. All the details can be found in the basic folder.
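As a quick, illustrative taste of the three libraries working together (the data below is made up for the example, not taken from the basic folder):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: vectorized numerical computation
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Pandas: tabular data manipulation
df = pd.DataFrame({"x": x, "sin_x": y})
print(df.describe())

# Matplotlib: visualization
plt.plot(df["x"], df["sin_x"])
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.show()
```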
The LinearRegression class implements the ordinary least squares method for linear regression. It includes methods for fitting the model to data and making predictions.
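A minimal sketch of what such an OLS implementation can look like is shown below; the `fit`/`predict` method names match the description above, but the normal-equation details are an illustrative assumption rather than the repository's exact code:

```python
import numpy as np

class LinearRegression:
    """Ordinary least squares via the normal equation (illustrative sketch)."""

    def fit(self, X, y):
        # X is expected to be (n_samples, n_features), y is (n_samples,)
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Prepend a column of ones so the intercept is learned jointly
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        # theta = (X^T X)^(-1) X^T y, using pinv for numerical stability
        theta = np.linalg.pinv(Xb.T @ Xb) @ Xb.T @ y
        self.intercept_, self.coef_ = theta[0], theta[1:]
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_
```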
Using the dataset from Kaggle Dataset Link, we preprocess the data and train both our custom linear regression model and sklearn's implementation for comparison.
| Method | Accuracy |
|---|---|
| Custom Linear Regression | 98.8801% |
| Sklearn Linear Regression | 98.8801% |
The CustomLogisticRegression class implements logistic regression using gradient descent. It supports fitting to data, making predictions, and evaluating accuracy.
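Below is a minimal sketch of binary logistic regression trained with batch gradient descent; the hyperparameters and the `fit`/`predict`/`score` signatures are illustrative assumptions, not the repository's exact interface:

```python
import numpy as np

class CustomLogisticRegression:
    """Binary logistic regression trained with batch gradient descent (illustrative sketch)."""

    def __init__(self, lr=0.1, n_iters=1000):
        self.lr = lr
        self.n_iters = n_iters

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0
        for _ in range(self.n_iters):
            # Gradient of the log-loss with respect to weights and bias
            p = self._sigmoid(X @ self.w + self.b)
            error = p - y
            self.w -= self.lr * (X.T @ error) / n_samples
            self.b -= self.lr * error.mean()
        return self

    def predict(self, X):
        p = self._sigmoid(np.asarray(X, dtype=float) @ self.w + self.b)
        return (p >= 0.5).astype(int)

    def score(self, X, y):
        # Plain classification accuracy
        return float(np.mean(self.predict(X) == np.asarray(y)))
```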
Using the dataset from Kaggle Dataset Link, we preprocess the data and train both our custom logistic regression model and sklearn's implementation for comparison.
| Method | Accuracy |
|---|---|
| Custom Logistic Regression | 85.85% |
| Sklearn Logistic Regression | 86.08% |
The DecisionTreeCART class implements the CART algorithm for decision trees. It supports fitting to data, making predictions, and evaluating accuracy.
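A compact sketch of a Gini-based CART classifier is shown below; the stopping criteria and the node representation are illustrative choices, not necessarily those used in the repository:

```python
import numpy as np

class DecisionTreeCART:
    """Classification tree with Gini-impurity splits (illustrative sketch)."""

    def __init__(self, max_depth=10, min_samples_split=2):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split

    @staticmethod
    def _gini(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def _best_split(self, X, y):
        best = (None, None, np.inf)  # (feature index, threshold, weighted impurity)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left = X[:, j] <= t
                if left.all() or not left.any():
                    continue
                imp = (left.sum() * self._gini(y[left]) +
                       (~left).sum() * self._gini(y[~left])) / len(y)
                if imp < best[2]:
                    best = (j, t, imp)
        return best[0], best[1]

    def _majority(self, y):
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]

    def _build(self, X, y, depth):
        # Emit a leaf when the node is pure or the growth limits are reached
        if depth >= self.max_depth or len(y) < self.min_samples_split or len(np.unique(y)) == 1:
            return self._majority(y)
        j, t = self._best_split(X, y)
        if j is None:
            return self._majority(y)
        left = X[:, j] <= t
        return (j, t,
                self._build(X[left], y[left], depth + 1),
                self._build(X[~left], y[~left], depth + 1))

    def fit(self, X, y):
        self.tree_ = self._build(np.asarray(X, dtype=float), np.asarray(y), 0)
        return self

    def _predict_one(self, x, node):
        # Internal nodes are tuples, leaves are class labels
        while isinstance(node, tuple):
            j, t, left, right = node
            node = left if x[j] <= t else right
        return node

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.array([self._predict_one(x, self.tree_) for x in X])
```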
Using the dataset from Kaggle Dataset Link, we preprocess the data and train both our custom decision tree and sklearn's implementation for comparison.
| Method | Accuracy |
|---|---|
| Custom CART | 34.07% |
| Sklearn CART | 34.11% |
The CustomRandomForest class implements the Random Forest algorithm using multiple decision trees. It supports fitting to data, making predictions, and evaluating accuracy. The base decision tree used is our custom DecisionTreeCART.
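The sketch below shows one common way to build such an ensemble, with bootstrap sampling and majority voting on top of the DecisionTreeCART sketch above; the hyperparameters and the feature-subsampling scheme are illustrative assumptions:

```python
import numpy as np
# Assumes the DecisionTreeCART class from the sketch above is in scope

class CustomRandomForest:
    """Bagging ensemble of CART trees with random feature subsets (illustrative sketch)."""

    def __init__(self, n_trees=100, max_depth=10, random_state=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.rng = np.random.default_rng(random_state)

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        n_samples, n_features = X.shape
        k = max(1, int(np.sqrt(n_features)))  # sqrt(d) features per tree
        self.trees_ = []
        for _ in range(self.n_trees):
            # Bootstrap sample of the rows, random subset of the columns
            rows = self.rng.integers(0, n_samples, n_samples)
            cols = self.rng.choice(n_features, size=k, replace=False)
            tree = DecisionTreeCART(max_depth=self.max_depth).fit(X[rows][:, cols], y[rows])
            self.trees_.append((tree, cols))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        votes = np.array([tree.predict(X[:, cols]) for tree, cols in self.trees_])

        def majority(col):
            values, counts = np.unique(col, return_counts=True)
            return values[np.argmax(counts)]

        # Majority vote across trees for each sample
        return np.array([majority(col) for col in votes.T])
```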
Using the dataset from Kaggle Dataset Link, we preprocess the data and train both our custom random forest and sklearn's implementation for comparison.
| Method | Accuracy |
|---|---|
| Custom Random Forest | 86.80% |
| Sklearn Random Forest | 86.10% |
Somewhat surprisingly, the custom implementation slightly outperforms sklearn's in this case.
The KMeans class implements the K-Means clustering algorithm. It includes methods for fitting the model to data and predicting cluster assignments.
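A minimal sketch of Lloyd's algorithm, the standard K-Means procedure, is shown below; the initialization and convergence check are illustrative choices rather than the repository's exact code:

```python
import numpy as np

class KMeans:
    """Lloyd's algorithm for K-Means clustering (illustrative sketch)."""

    def __init__(self, n_clusters=3, n_iters=300, random_state=0):
        self.n_clusters = n_clusters
        self.n_iters = n_iters
        self.rng = np.random.default_rng(random_state)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Initialize centroids from randomly chosen distinct points
        idx = self.rng.choice(len(X), size=self.n_clusters, replace=False)
        self.centroids_ = X[idx].copy()
        for _ in range(self.n_iters):
            labels = self.predict(X)
            new_centroids = np.array([
                X[labels == k].mean(axis=0) if np.any(labels == k) else self.centroids_[k]
                for k in range(self.n_clusters)
            ])
            if np.allclose(new_centroids, self.centroids_):
                break  # converged: assignments no longer move the centroids
            self.centroids_ = new_centroids
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Assign each point to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return dists.argmin(axis=1)
```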
Using the dataset from Kaggle Dataset Link, we preprocess the data and train both our custom K-Means model and sklearn's implementation for comparison.
Scatter plots are generated to visualize the clustering results.
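A typical way to produce such a plot, using the KMeans sketch above on made-up 2-D data (the actual notebooks would use the preprocessed Kaggle features instead):

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative 2-D data with three loose groups; stands in for the real features
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) + np.repeat([[0, 0], [4, 4], [0, 5]], 100, axis=0)

model = KMeans(n_clusters=3).fit(X)
labels = model.predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels, s=15)
plt.scatter(model.centroids_[:, 0], model.centroids_[:, 1], marker="x", c="red", label="centroids")
plt.legend()
plt.title("K-Means clustering result")
plt.show()
```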

