English | 中文
This repository contains implementations of various machine learning algorithms from scratch, along with comparisons to their sklearn counterparts.
This section covers the basic libraries: NumPy for numerical computation, Pandas for data manipulation, and Matplotlib for data visualization. All the details can be found in the basic folder.
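As a quick, illustrative taste of the three libraries working together (the data below is made up for the example, not taken from the basic folder):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: vectorized numerical computation
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Pandas: tabular data manipulation
df = pd.DataFrame({"x": x, "sin_x": y})
print(df.describe())

# Matplotlib: visualization
plt.plot(df["x"], df["sin_x"])
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.show()
```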
The LinearRegression class implements the ordinary least squares method for linear regression. It includes methods for fitting the model to data and making predictions.
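A minimal sketch of what such an OLS implementation can look like is shown below; the `fit`/`predict` method names match the description above, but the normal-equation details are an illustrative assumption rather than the repository's exact code:

```python
import numpy as np

class LinearRegression:
    """Ordinary least squares via the normal equation (illustrative sketch)."""

    def fit(self, X, y):
        # X is expected to be (n_samples, n_features), y is (n_samples,)
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Prepend a column of ones so the intercept is learned jointly
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        # theta = (X^T X)^(-1) X^T y, using pinv for numerical stability
        theta = np.linalg.pinv(Xb.T @ Xb) @ Xb.T @ y
        self.intercept_, self.coef_ = theta[0], theta[1:]
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_
```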
Using the dataset from Kaggle Dataset Link, we preprocess the data and train both our custom linear regression model and sklearn's implementation for comparison.
| Method | Accuracy |
|---|---|
| Custom Linear Regression | 98.8801% |
| Sklearn Linear Regression | 98.8801% |
The CustomLogisticRegression class implements logistic regression using gradient descent. It supports fitting to data, making predictions, and evaluating accuracy.
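Below is a minimal sketch of binary logistic regression trained with batch gradient descent; the hyperparameters and the `fit`/`predict`/`score` signatures are illustrative assumptions, not the repository's exact interface:

```python
import numpy as np

class CustomLogisticRegression:
    """Binary logistic regression trained with batch gradient descent (illustrative sketch)."""

    def __init__(self, lr=0.1, n_iters=1000):
        self.lr = lr
        self.n_iters = n_iters

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0
        for _ in range(self.n_iters):
            # Gradient of the log-loss with respect to weights and bias
            p = self._sigmoid(X @ self.w + self.b)
            error = p - y
            self.w -= self.lr * (X.T @ error) / n_samples
            self.b -= self.lr * error.mean()
        return self

    def predict(self, X):
        p = self._sigmoid(np.asarray(X, dtype=float) @ self.w + self.b)
        return (p >= 0.5).astype(int)

    def score(self, X, y):
        # Plain classification accuracy
        return float(np.mean(self.predict(X) == np.asarray(y)))
```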
Using the dataset from Kaggle Dataset Link, we preprocess the data and train both our custom logistic regression model and sklearn's implementation for comparison.
| Method | Accuracy |
|---|---|
| Custom Logistic Regression | 85.85% |
| Sklearn Logistic Regression | 86.08% |
The DecisionTreeCART class implements the CART algorithm for decision trees. It supports fitting to data, making predictions, and evaluating accuracy.
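A compact sketch of a Gini-based CART classifier is shown below; the stopping criteria and the node representation are illustrative choices, not necessarily those used in the repository:

```python
import numpy as np

class DecisionTreeCART:
    """Classification tree with Gini-impurity splits (illustrative sketch)."""

    def __init__(self, max_depth=10, min_samples_split=2):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split

    @staticmethod
    def _gini(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def _best_split(self, X, y):
        best = (None, None, np.inf)  # (feature index, threshold, weighted impurity)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left = X[:, j] <= t
                if left.all() or not left.any():
                    continue
                imp = (left.sum() * self._gini(y[left]) +
                       (~left).sum() * self._gini(y[~left])) / len(y)
                if imp < best[2]:
                    best = (j, t, imp)
        return best[0], best[1]

    def _majority(self, y):
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]

    def _build(self, X, y, depth):
        # Emit a leaf when the node is pure or the growth limits are reached
        if depth >= self.max_depth or len(y) < self.min_samples_split or len(np.unique(y)) == 1:
            return self._majority(y)
        j, t = self._best_split(X, y)
        if j is None:
            return self._majority(y)
        left = X[:, j] <= t
        return (j, t,
                self._build(X[left], y[left], depth + 1),
                self._build(X[~left], y[~left], depth + 1))

    def fit(self, X, y):
        self.tree_ = self._build(np.asarray(X, dtype=float), np.asarray(y), 0)
        return self

    def _predict_one(self, x, node):
        # Internal nodes are tuples, leaves are class labels
        while isinstance(node, tuple):
            j, t, left, right = node
            node = left if x[j] <= t else right
        return node

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.array([self._predict_one(x, self.tree_) for x in X])
```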
Using the dataset from Kaggle Dataset Link, we preprocess the data and train both our custom decision tree and sklearn's implementation for comparison.
| Method | Accuracy |
|---|---|
| Custom CART | 34.07% |
| Sklearn CART | 34.11% |
The CustomRandomForest class implements the Random Forest algorithm using multiple decision trees. It supports fitting to data, making predictions, and evaluating accuracy. The base decision tree used is our custom DecisionTreeCART.
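The sketch below shows one common way to build such an ensemble, with bootstrap sampling and majority voting on top of the DecisionTreeCART sketch above; the hyperparameters and the feature-subsampling scheme are illustrative assumptions:

```python
import numpy as np
# Assumes the DecisionTreeCART class from the sketch above is in scope

class CustomRandomForest:
    """Bagging ensemble of CART trees with random feature subsets (illustrative sketch)."""

    def __init__(self, n_trees=100, max_depth=10, random_state=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.rng = np.random.default_rng(random_state)

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        n_samples, n_features = X.shape
        k = max(1, int(np.sqrt(n_features)))  # sqrt(d) features per tree
        self.trees_ = []
        for _ in range(self.n_trees):
            # Bootstrap sample of the rows, random subset of the columns
            rows = self.rng.integers(0, n_samples, n_samples)
            cols = self.rng.choice(n_features, size=k, replace=False)
            tree = DecisionTreeCART(max_depth=self.max_depth).fit(X[rows][:, cols], y[rows])
            self.trees_.append((tree, cols))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        votes = np.array([tree.predict(X[:, cols]) for tree, cols in self.trees_])

        def majority(col):
            values, counts = np.unique(col, return_counts=True)
            return values[np.argmax(counts)]

        # Majority vote across trees for each sample
        return np.array([majority(col) for col in votes.T])
```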
Using the dataset from Kaggle Dataset Link, we preprocess the data and train both our custom random forest and sklearn's implementation for comparison.
| Method | Accuracy |
|---|---|
| Custom Random Forest | 86.80% |
| Sklearn Random Forest | 86.10% |
Somewhat surprisingly, the custom implementation slightly outperforms sklearn's in this case.
The KMeans class implements the K-Means clustering algorithm. It includes methods for fitting the model to data and predicting cluster assignments.
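A minimal sketch of Lloyd's algorithm, the standard K-Means procedure, is shown below; the initialization and convergence check are illustrative choices rather than the repository's exact code:

```python
import numpy as np

class KMeans:
    """Lloyd's algorithm for K-Means clustering (illustrative sketch)."""

    def __init__(self, n_clusters=3, n_iters=300, random_state=0):
        self.n_clusters = n_clusters
        self.n_iters = n_iters
        self.rng = np.random.default_rng(random_state)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Initialize centroids from randomly chosen distinct points
        idx = self.rng.choice(len(X), size=self.n_clusters, replace=False)
        self.centroids_ = X[idx].copy()
        for _ in range(self.n_iters):
            labels = self.predict(X)
            new_centroids = np.array([
                X[labels == k].mean(axis=0) if np.any(labels == k) else self.centroids_[k]
                for k in range(self.n_clusters)
            ])
            if np.allclose(new_centroids, self.centroids_):
                break  # converged: assignments no longer move the centroids
            self.centroids_ = new_centroids
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Assign each point to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return dists.argmin(axis=1)
```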
Using the dataset from Kaggle Dataset Link, we preprocess the data and train both our custom K-Means model and sklearn's implementation for comparison.
Scatter plots are generated to visualize the clustering results.
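A typical way to produce such a plot, using the KMeans sketch above on made-up 2-D data (the actual notebooks would use the preprocessed Kaggle features instead):

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative 2-D data with three loose groups; stands in for the real features
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) + np.repeat([[0, 0], [4, 4], [0, 5]], 100, axis=0)

model = KMeans(n_clusters=3).fit(X)
labels = model.predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels, s=15)
plt.scatter(model.centroids_[:, 0], model.centroids_[:, 1], marker="x", c="red", label="centroids")
plt.legend()
plt.title("K-Means clustering result")
plt.show()
```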

