This project implements customer segmentation using Principal Component Analysis (PCA) for dimensionality reduction and Gaussian Mixture Models (GMM) for clustering. It analyzes customer data and visualizes the clusters in 2D and 3D. The optimal number of clusters is determined using the Bayesian Information Criterion (BIC).
- Preprocessing: Handles numerical and categorical data using StandardScaler and OneHotEncoder.
- Dimensionality Reduction: Implements a custom PCA algorithm.
- Clustering: Fits Gaussian Mixture Models (GMM), choosing the number of clusters with the Bayesian Information Criterion (BIC).
- Model Evaluation: Computes Silhouette Scores to assess clustering performance.
- Visualization: Displays clusters in 2D and 3D using Matplotlib.
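The preprocessing and PCA steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `data_processing.py` or `train.py`: the column names (`age`, `income`, `segment`) and the helper `pca_project` are hypothetical, and the SVD-based projection is only one common way to write a custom PCA.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the real columns of dataset.csv are an assumption here.
df = pd.DataFrame({
    "age": [23, 45, 31, 52, 38, 27],
    "income": [32000, 81000, 54000, 99000, 61000, 40000],
    "segment": ["a", "b", "a", "c", "b", "a"],
})

# Scale numerical columns and one-hot encode categorical ones.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["segment"]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)
X = preprocess.fit_transform(df)


def pca_project(X, n_components):
    """Project X onto its top principal components using an SVD (hypothetical helper)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = singular_values**2 / (X.shape[0] - 1)
    return X_centered @ Vt[:n_components].T, explained_variance[:n_components]


X_reduced, variance = pca_project(X, n_components=3)
print(X_reduced.shape)
```

Using an SVD of the centered data avoids forming the covariance matrix explicitly. An eigendecomposition of the covariance matrix would give the same components up to sign.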
📁 3D_GMM_Clustering
│── main.py # Main script to execute the pipeline
│── train.py # PCA and GMM training module
│── evaluate.py # Clustering evaluation and visualization
│── data_processing.py # Preprocessing functions for data
│── dataset.csv # Input dataset
│── README.md # Project documentation
│── requirements.txt # Necessary packages for project
- Clone the repository:
git clone https://github.com/SagarMaddela/3D-clustering-using-Gaussian-Mixture-Models.git
cd 3D-clustering-using-Gaussian-Mixture-Models
- Install dependencies:
pip install -r requirements.txt
Run the main script to execute the pipeline:
python main.py
The script will:
- Load and preprocess the dataset (`dataset.csv`).
- Apply PCA for dimensionality reduction.
- Determine the optimal number of clusters using BIC.
- Train a GMM model and predict clusters.
- Compute Silhouette Scores.
- Visualize clusters in 2D and 3D.
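The BIC selection, GMM training, and Silhouette steps listed above might look roughly like this. It is a sketch under assumptions, not the contents of `main.py`: the data is synthetic (generated with `make_blobs` as a stand-in for the PCA-reduced customers), and the search range of 1 to `max_clusters` is a guess at how that parameter is used.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the PCA-reduced customer data (assumption).
X, _ = make_blobs(n_samples=300, centers=3, n_features=3,
                  cluster_std=0.6, random_state=0)

max_clusters = 6  # hypothetical upper bound for the search
candidates = range(1, max_clusters + 1)

# Fit one GMM per candidate cluster count and record its BIC.
bic_scores = []
for k in candidates:
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic_scores.append(gmm.bic(X))  # lower BIC is better

best_k = candidates[int(np.argmin(bic_scores))]

# Refit with the selected number of components and assign clusters.
model = GaussianMixture(n_components=best_k, random_state=0).fit(X)
labels = model.predict(X)

# The Silhouette Score is only defined when at least two clusters are present.
if len(np.unique(labels)) > 1:
    score = silhouette_score(X, labels)
    print(f"k={best_k}, silhouette={score:.3f}")
```

BIC penalizes extra parameters, so it tends to favor the smallest model that still fits the data well. The Silhouette Score is computed separately as a check on how well separated the resulting clusters are.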
The project provides two types of cluster visualizations:
- 2D Plot (First two PCA components)
- 3D Scatter Plot (if `n_components=3` in PCA)
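As an illustration of the two plot types above, here is a minimal Matplotlib sketch. The function `plot_clusters` and the random input data are hypothetical, and the actual plotting code in `evaluate.py` may differ.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_clusters(points, labels):
    """Draw a 2D view, plus a 3D view when a third component exists."""
    has_3d = points.shape[1] >= 3
    fig = plt.figure(figsize=(10, 4) if has_3d else (5, 4))

    # 2D plot of the first two PCA components.
    ax2d = fig.add_subplot(1, 2 if has_3d else 1, 1)
    ax2d.scatter(points[:, 0], points[:, 1], c=labels, cmap="viridis", s=15)
    ax2d.set_xlabel("PC1")
    ax2d.set_ylabel("PC2")

    # 3D scatter plot, only possible with three or more components.
    if has_3d:
        ax3d = fig.add_subplot(1, 2, 2, projection="3d")
        ax3d.scatter(points[:, 0], points[:, 1], points[:, 2],
                     c=labels, cmap="viridis", s=15)
        ax3d.set_xlabel("PC1")
        ax3d.set_ylabel("PC2")
        ax3d.set_zlabel("PC3")
    return fig


# Random stand-in data (assumption): 150 points, 3 components, 3 cluster labels.
rng = np.random.default_rng(0)
points = rng.normal(size=(150, 3))
labels = rng.integers(0, 3, size=150)
fig = plot_clusters(points, labels)
```

Call `plt.show()` to open the figure interactively, or `fig.savefig(...)` to write it to a file.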
- If dataset.csv is missing, ensure you place a valid dataset in the project folder.
- Ensure `matplotlib`, `numpy`, and `sklearn` are installed before running the project.
- Adjust the `n_components` and `max_clusters` parameters in `main.py` for experimentation.
This project is open-source and available under the MIT License.