Machine Learning Course Projects

A comprehensive collection of machine learning implementations covering fundamental to advanced topics in data analysis, visualization, classical ML, deep learning, and generative models.

Requirements

All required Python dependencies are listed in requirements.txt.

1️⃣ Create and activate a virtual environment (recommended)

python -m venv venv source venv/bin/activate # Linux / macOS venv\Scripts\activate # Windows

(If using Conda, activate your environment instead.)

2️⃣ Install dependencies (CPU version)

pip install -r requirements.txt This installs all libraries including PyTorch (CPU-only).

3️⃣ (Optional) GPU support for PyTorch (CUDA)

If you have an NVIDIA GPU, install PyTorch with CUDA support before installing the remaining dependencies.

Visit the official PyTorch selector: https://pytorch.org/get-started/locally/

Example (CUDA 12.1):

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121 pip install -r requirements.txt

Make sure your CUDA version matches your installed NVIDIA drivers.

01 - Dataset Analysis Visualization

Generate and analyze 10,000 synthetic student records with attributes (gender, major, program, GPA)
Create comprehensive visualizations including distributions, conditional analyses, and pairplots
Implement and compare simple vs stratified sampling techniques
Perform specialized sampling strategies: gender-balanced, GPA-uniform, and program-major balanced cohorts

02 - K-Nearest Neighbors

Build a KNN classifier from scratch to predict student gender based on features
Implement custom PerFeatureTransformer class for handling categorical and numerical features
Analyze performance across different k values (1-21) and distance metrics (Euclidean, Manhattan, Cosine)
Evaluate single-feature vs multi-feature predictions using F1-scores and accuracy metrics

03 - Linear Regression

Implement polynomial regression with L1 and L2 regularization for GPA prediction
Compare models across polynomial degrees 1-6 with varying regularization strengths
Identify optimal hyperparameters using validation set performance
Analyze feature importance and interpret non-zero weights from regularized models

04 - K-Means Clustering

Implement K-Means clustering algorithm from scratch with fit(), predict(), and getCost() methods
Determine optimal number of clusters using Elbow Method (WCSS) and Silhouette Score
Evaluate cluster quality and interpret clustering results
Compare custom implementation with sklearn's built-in K-Means

05 - Gaussian Mixture Models

Build GMM class from scratch implementing the Expectation-Maximization (EM) algorithm
Implement fit(), getMembership(), and getLikelihood() methods
Determine optimal clusters using BIC (Bayesian Information Criterion) and Silhouette Method
Visualize likelihood convergence across iterations

06 - Image Segmentation

Perform color-based image segmentation on satellite images using custom GMM implementation
Segment images into three distinct regions (Land, Water, Vegetation) using k=3 Gaussian components
Create dynamic visualization videos showing EM algorithm convergence frame-by-frame
Display original image, segmented output, and log-likelihood plots side-by-side in video format

07 - Principle Component Analysis

Implement PCA from scratch with fit(), transform(), and checkPCA() methods
Apply dimensionality reduction to MNIST dataset (28×28 pixels) reducing to 500, 300, 150, and 30 dimensions
Analyze explained variance vs number of principal components
Visualize reconstruction quality by projecting reduced dimensions back to original space

08 - PCA Classification

Investigate effect of PCA dimensionality reduction on KNN classifier performance
Apply custom PCA to 8×8 MNIST digits dataset with varying component counts [2, 5, 10, 20, 30, 40, 50, 64]
Train KNN classifiers with different k values [5, 25, 50, 100] on reduced datasets
Analyze relationship between principal components, k values, and classification accuracy

09 - Data Transformation Clustering

Analyze modified MNIST dataset (1,000 altered images) to identify data transformations
Design and justify appropriate transformation techniques to address observed differences
Compare clustering performance before and after transformation using evaluation metrics
Explain why transformations improved clustering outcomes

10 - Neural Network From Scratch

Implement MLP from scratch including Linear layers, activation functions (ReLU, Tanh, Sigmoid, Identity)
Build complete training pipeline with forward/backward passes, gradient accumulation, and early stopping
Model complex Belgium-Netherlands border (Baarle-Nassau) using coordinate-based prediction
Achieve 91% accuracy through architecture optimization and hyperparameter tuning

11 - Feature Mappings For Image Reconstruction

Compare three feature representations: Raw pixels, Taylor series expansion, and Fourier feature mapping
Reconstruct grayscale (smiley.png) and RGB (cat.jpg) images using coordinate-based MLPs
Analyze convergence speed and reconstruction quality across different feature mappings
Evaluate performance on blurred images with Gaussian blur levels σ = 0 to 10

12 - Autoencoders

Build deep autoencoder from scratch for MNIST image reconstruction using custom MLP components
Apply autoencoder to anomaly detection on Labeled Faces in the Wild (LFW) dataset
Train exclusively on "George W Bush" images as normal class, detect other faces as anomalies
Analyze effect of bottleneck dimensions on anomaly detection performance using ROC/PR curves

13 - Multi-Task CNN FMNIST

Implement multi-task CNN jointly performing classification and regression on Fashion-MNIST
Classification: predict 10 clothing categories; Regression: predict normalized pixel intensity ("ink")
Optimize joint loss: L = λ₁L_CE + λ₂L_MSE with varying weight combinations
Visualize intermediate feature maps to interpret learned representations at different layers

14 - Image Colorization CNN

Build encoder-decoder CNN for CIFAR-10 image colorization (32×32 RGB images)
Treat colorization as classification: map pixels to 24 representative color clusters
Use learned upsampling via ConvTranspose2d layers in decoder
Perform extensive hyperparameter tuning (learning rate, batch size, filters, kernels) using wandb sweeps

15 - Decision Tree From Scratch

Implement decision tree classifier from scratch with custom node classes (Leaf and Internal)
Build functions for best binary split selection and recursive tree training
Apply to Amazon product review sentiment classification (bag-of-words, 7729 features)
Visualize tree structure and analyze decision boundaries

16 - Decision Tree Classification

Train decision trees on text sentiment classification using Scikit-Learn
Perform grid search over max_depth and min_samples_leaf hyperparameters
Visualize trained trees using ASCII representation and analyze node structures
Optimize using balanced accuracy metric with proper train/validation splits

17 - Random Forest Classification

Build ensemble Random Forest classifier for Amazon review sentiment analysis
Analyze feature importance to identify most predictive vocabulary terms
Tune max_features, max_depth, and n_estimators through grid search
Compare performance: Simple DT vs Best DT vs Simple RF vs Best RF

18 - KDE Foreground Detection

Implement Kernel Density Estimation (KDE) class from scratch using only numpy
Support multiple kernel types (Gaussian, Triangular, Uniform) with configurable bandwidth
Perform foreground detection by fitting KDE on background image and classifying test pixels
Optimize bandwidth and kernel selection for best segmentation results

19 - Time Series RNN

Discover discrete-time recurrence from univariate sequences using RNNs
Build predictors mapping history vectors to next values: x̂ₖ = g(xₖ₋₁, xₖ₋₂, ..., xₖ₋ₚ)
Identify closed-form analytical recurrence from learned RNN weights
Evaluate single-step and autoregressive multi-step predictions using MAE/MSE

20 - Time Series Forecasting

Forecast cumulative GitHub star growth trajectories for multiple repositories
Compare classical (ARMA) vs deep learning (RNN, 1D CNN) forecasting methods
Implement proper temporal splits ensuring no data leakage from future
Evaluate single-timestep and increasing-window multi-step forecasts across horizons

21 - Variational Autoencoders

Implement VAE with encoder, reparameterization trick, and decoder for Fashion-MNIST
Optimize combined loss: reconstruction (BCE) + KL divergence to standard normal prior
Investigate β-VAE: vary weighting between reconstruction and KL terms
Analyze effect of frozen latent parameters (µ, σ) on generation quality and diversity

Repository Structure

Each notebook is self-contained with:

Problem description and objectives
Implementation from scratch (where applicable)
Experiments and hyperparameter tuning
Visualization and analysis
Results and conclusions

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
01-dataset-analysis-visualization.ipynb		01-dataset-analysis-visualization.ipynb
02-k-nearest-neighbors.ipynb		02-k-nearest-neighbors.ipynb
03-linear-regression.ipynb		03-linear-regression.ipynb
04-k-means-clustering.ipynb		04-k-means-clustering.ipynb
05-gaussian-mixture-models.ipynb		05-gaussian-mixture-models.ipynb
06-image-segmentation.ipynb		06-image-segmentation.ipynb
07-principle-component-analysis.ipynb		07-principle-component-analysis.ipynb
08-pca-classification.ipynb		08-pca-classification.ipynb
09-data-transformation-clustering.ipynb		09-data-transformation-clustering.ipynb
10-neural-network-from-scratch.ipynb		10-neural-network-from-scratch.ipynb
11-feature-mappings-for-image-reconstruction.ipynb		11-feature-mappings-for-image-reconstruction.ipynb
12-autoencoders.ipynb		12-autoencoders.ipynb
13-multi-task-CNN-FMNIST.ipynb		13-multi-task-CNN-FMNIST.ipynb
14-image-colorization-CNN.ipynb		14-image-colorization-CNN.ipynb
15-decision-tree-from-scratch.ipynb		15-decision-tree-from-scratch.ipynb
16-decision-tree-classification.ipynb		16-decision-tree-classification.ipynb
17-random-forest-classification.ipynb		17-random-forest-classification.ipynb
18-kde-foreground-detection.ipynb		18-kde-foreground-detection.ipynb
19-time-series-rnn.ipynb		19-time-series-rnn.ipynb
20-time-series-forecasting.ipynb		20-time-series-forecasting.ipynb
21-variational-autoencoders.ipynb		21-variational-autoencoders.ipynb
README.md		README.md
problem_descriptions.pdf		problem_descriptions.pdf
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Machine Learning Course Projects

Requirements

01 - Dataset Analysis Visualization

02 - K-Nearest Neighbors

03 - Linear Regression

04 - K-Means Clustering

05 - Gaussian Mixture Models

06 - Image Segmentation

07 - Principle Component Analysis

08 - PCA Classification

09 - Data Transformation Clustering

10 - Neural Network From Scratch

11 - Feature Mappings For Image Reconstruction

12 - Autoencoders

13 - Multi-Task CNN FMNIST

14 - Image Colorization CNN

15 - Decision Tree From Scratch

16 - Decision Tree Classification

17 - Random Forest Classification

18 - KDE Foreground Detection

19 - Time Series RNN

20 - Time Series Forecasting

21 - Variational Autoencoders

Repository Structure

About

Uh oh!

Releases

Packages

Languages

samarthamp/ml-course-projects

Folders and files

Latest commit

History

Repository files navigation

Machine Learning Course Projects

Requirements

01 - Dataset Analysis Visualization

02 - K-Nearest Neighbors

03 - Linear Regression

04 - K-Means Clustering

05 - Gaussian Mixture Models

06 - Image Segmentation

07 - Principle Component Analysis

08 - PCA Classification

09 - Data Transformation Clustering

10 - Neural Network From Scratch

11 - Feature Mappings For Image Reconstruction

12 - Autoencoders

13 - Multi-Task CNN FMNIST

14 - Image Colorization CNN

15 - Decision Tree From Scratch

16 - Decision Tree Classification

17 - Random Forest Classification

18 - KDE Foreground Detection

19 - Time Series RNN

20 - Time Series Forecasting

21 - Variational Autoencoders

Repository Structure

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages