This repository is a machine learning portfolio focused on implementing core ML algorithms from scratch and benchmarking them against their scikit-learn counterparts.
The goal is to build a solid understanding of model internals, optimization techniques, and model evaluation rather than relying solely on high-level APIs.
Project objectives:
- Implement fundamental machine learning algorithms from first principles
- Understand optimization, loss functions, and convergence behavior
- Compare custom implementations with scikit-learn models
- Evaluate models using appropriate metrics
- Build clean, reproducible, and well-documented ML code
Algorithms implemented from scratch:
- Linear Regression (gradient descent on the mean squared error loss)
- Logistic Regression (sigmoid function)
- K-Means Clustering (distance-based clustering with iterative centroid updates)
- Evaluation Metrics (Accuracy, Precision, Recall, RMSE)
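A minimal sketch of what a from-scratch linear regression can look like, assuming batch gradient descent on the MSE loss with NumPy. The class name `LinearRegressionGD`, its parameters, and the synthetic data are hypothetical and not taken from the repository's notebooks.

```python
import numpy as np


class LinearRegressionGD:
    """Linear regression trained with batch gradient descent on MSE (illustrative sketch)."""

    def __init__(self, learning_rate=0.1, n_iters=500):
        self.learning_rate = learning_rate
        self.n_iters = n_iters

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        self.loss_history = []
        for _ in range(self.n_iters):
            error = X @ self.weights + self.bias - y
            # Record the MSE of the current parameters (before this update)
            self.loss_history.append(float(np.mean(error ** 2)))
            # Gradients of MSE = mean(error**2) w.r.t. weights and bias
            grad_w = (2.0 / n_samples) * (X.T @ error)
            grad_b = (2.0 / n_samples) * error.sum()
            self.weights -= self.learning_rate * grad_w
            self.bias -= self.learning_rate * grad_b
        return self

    def predict(self, X):
        return X @ self.weights + self.bias


# Usage on synthetic data generated from y = 3x + 2 plus small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.normal(size=200)
model = LinearRegressionGD(learning_rate=0.1, n_iters=500).fit(X, y)
print(model.weights, model.bias)  # should be close to [3.0] and 2.0
```

Keeping `loss_history` makes it straightforward to plot a loss curve and check convergence, and the fitted parameters can be compared with those of scikit-learn's `LinearRegression` on the same data.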
Library-based implementations used for comparison:
- Linear Regression – scikit-learn
- Logistic Regression – scikit-learn
- K-Means – scikit-learn
- Model evaluation and comparison
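The evaluation metrics can be written directly from their definitions. This is a minimal sketch, assuming binary labels with `1` as the positive class; the function names are hypothetical. The results can be cross-checked against `sklearn.metrics.accuracy_score`, `precision_score`, `recall_score`, and the square root of `mean_squared_error`.

```python
import numpy as np


def accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def precision(y_true, y_pred, positive=1):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    # Assumed convention: 0.0 when nothing is predicted as positive
    return float(tp / (tp + fp)) if tp + fp > 0 else 0.0


def recall(y_true, y_pred, positive=1):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    # Assumed convention: 0.0 when there are no positive samples
    return float(tp / (tp + fn)) if tp + fn > 0 else 0.0


def rmse(y_true, y_pred):
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Usage: 3 true positives, 1 false positive, 1 false negative, 1 true negative
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
print(accuracy(y_true, y_pred))   # 4 of 6 correct
print(precision(y_true, y_pred))  # 3 / (3 + 1) = 0.75
print(recall(y_true, y_pred))     # 3 / (3 + 1) = 0.75
print(rmse([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))
```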
Repository structure:

```
ML-Algorithms/
│
├── Linear-Regression/
│ ├── linear-regression-from-scratch.ipynb
│ └── linear-regression.ipynb
│
├── classification/
│ ├── logistic_regression_from_scratch.py
│ └── logistic_regression_sklearn.py
│
├── clustering/
│ ├── kmeans_from_scratch.py
│ └── kmeans_sklearn.py
│
├── data/
│ └── sample_datasets.csv
│
├── README.md
├── requirements.txt
└── .gitignore
```
---
Each algorithm follows a consistent ML workflow:
- Data preprocessing and feature scaling
- Algorithm implementation from scratch
- Model training using iterative optimization
- Performance evaluation using suitable metrics
- Comparison with library-based implementations
- Visualization and error analysis
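The first steps of this workflow can be sketched for logistic regression: standardize features, train with gradient descent, and evaluate on held-out data. This is a minimal sketch under stated assumptions: the loss is the mean binary cross-entropy, the decision threshold is 0.5, and the helper names (`standardize`, `train_logistic_regression`, `predict`) and the synthetic data are hypothetical rather than the repository's own code.

```python
import numpy as np


def standardize(X_train, X_test):
    """Scale features using mean/std computed on the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by 0
    return (X_train - mean) / std, (X_test - mean) / std


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.1, n_iters=1000):
    """Batch gradient descent on the mean binary cross-entropy loss (assumed loss)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)  # predicted probability of class 1
        # With a sigmoid output, the BCE gradient w.r.t. the logits is (p - y)
        w -= learning_rate * (X.T @ (p - y)) / n_samples
        b -= learning_rate * np.mean(p - y)
    return w, b


def predict(X, w, b, threshold=0.5):
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Usage on two synthetic Gaussian blobs (illustrative data, not the repo's dataset)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)), rng.normal(2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:150], idx[150:]
X_train, X_test = standardize(X[train_idx], X[test_idx])
w, b = train_logistic_regression(X_train, y[train_idx])
acc = np.mean(predict(X_test, w, b) == y[test_idx])
print(f"test accuracy: {acc:.2f}")
```

Fitting the scaling statistics on the training split only keeps test-set information out of preprocessing. When comparing against scikit-learn's `LogisticRegression`, note that it applies L2 regularization by default (`C=1.0`), so its coefficients will generally differ from an unregularized implementation even when the accuracies are close.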
Results:
- Custom implementations achieve performance comparable to scikit-learn baselines on benchmark datasets
- Loss curves and visualizations are used to analyze convergence behavior
- Error analysis highlights the strengths and limitations of each model
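For K-Means, the analogue of a loss curve is the inertia (sum of squared distances from each point to its nearest centroid) recorded at every iteration. A minimal sketch, assuming Lloyd's algorithm with random initialization from the data; the function names and data are hypothetical, and scikit-learn's `KMeans` uses k-means++ initialization by default, so its results can differ from this sketch.

```python
import numpy as np


def _assign(X, centroids):
    """Label each point with its nearest centroid and return the total inertia."""
    # Squared Euclidean distances, shape (n_samples, k)
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = float(dists[np.arange(len(X)), labels].sum())
    return labels, inertia


def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm; returns centroids, labels and per-iteration inertia."""
    rng = np.random.default_rng(seed)
    # Random initialization from k distinct samples (no k-means++)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    inertia_history = []
    for _ in range(n_iters):
        labels, inertia = _assign(X, centroids)
        inertia_history.append(inertia)
        # Move each centroid to the mean of its points; keep it if its cluster is empty
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    labels, _ = _assign(X, centroids)
    return centroids, labels, inertia_history


# Usage on three synthetic blobs (illustrative data only)
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centers])
centroids, labels, inertia_history = kmeans(X, k=3)
print(inertia_history)  # should be non-increasing, one value per iteration
```

Because each assignment step and each centroid update can only lower (or keep) the inertia, the recorded history should never increase, which makes it a simple convergence check; it can be plotted with Matplotlib via `plt.plot(inertia_history)`.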
Tech stack:
- Language: Python
- Libraries: NumPy, Pandas, Matplotlib, scikit-learn
- Tools: Jupyter Notebook, Git, GitHub
This project emphasizes a fundamental understanding of how machine learning algorithms work.
Libraries are used deliberately, only after the underlying concept has been validated with a from-scratch implementation.