Automated Feature Engineering & Model Selection for Tabular Data

Description

This project implements a basic AutoML pipeline in Python designed to automate parts of the machine learning workflow for tabular datasets. It focuses on:

- Automated feature selection and dimensionality reduction using recursive feature elimination (RFE) and principal component analysis (PCA).
- Automated training and evaluation of several machine learning models (Decision Trees, SVM, Neural Networks).
- Comparison of preprocessing/model combinations across multiple datasets.

The goal is to streamline the process of finding effective preprocessing strategies and models for structured data.
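
To make the difference between the two reduction techniques concrete, here is a minimal sketch using scikit-learn. The synthetic dataset, the choice of a decision tree as the RFE estimator, and the target of 4 features are assumptions made for illustration; they are not taken from the project's code.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a tabular dataset: 200 rows, 10 numeric features.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)

# RFE: repeatedly fit the estimator and drop the weakest feature until 4 remain.
# The surviving columns are original features, so they stay interpretable.
rfe = RFE(estimator=DecisionTreeClassifier(random_state=0), n_features_to_select=4)
X_rfe = rfe.fit_transform(X, y)

# PCA: project onto the 4 directions of largest variance. The labels are not
# used, and each output column is a mix of all original features.
pca = PCA(n_components=4, random_state=0)
X_pca = pca.fit_transform(X)

print("RFE kept columns:", rfe.get_support(indices=True))
print("PCA explained variance ratio:", pca.explained_variance_ratio_.round(3))
```

RFE is supervised and keeps a subset of the original columns, while PCA is unsupervised and builds new ones, which is one reason it can be worth comparing both on the same dataset.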

Features

- Loads CSV datasets.
- Performs basic preprocessing (imputation, scaling, encoding).
- Applies optional feature selection (RFE) or dimensionality reduction (PCA).
- Trains and evaluates Decision Tree, SVM, and a simple Neural Network.
- Compares results across different preprocessing steps and models.
- Outputs results to a CSV file.
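
The features above can be sketched end to end with scikit-learn alone. This is an illustrative outline, not the project's implementation: the toy DataFrame, its column names, the cross-validation setup, and the accuracy metric are all assumptions, and scikit-learn's `MLPClassifier` is swapped in for the Keras network so the sketch has no TensorFlow dependency.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for a loaded CSV (pd.read_csv in practice).
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50, 15, n),
    "city": rng.choice(["A", "B", "C"], n).astype(object),
})
y = (df["age"] + df["income"] > 90).astype(int)
df.loc[::15, "age"] = np.nan    # inject missing numeric values
df.loc[::20, "city"] = np.nan   # inject missing categorical values

# Basic preprocessing: imputation, scaling, and one-hot encoding.
preprocess = ColumnTransformer(
    [
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), ["age", "income"]),
        ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                          ("encode", OneHotEncoder(handle_unknown="ignore"))]), ["city"]),
    ],
    sparse_threshold=0.0,  # always return a dense array so PCA can consume it
)

# Optional reduction step: none, RFE, or PCA.
reducers = {
    "none": "passthrough",
    "rfe": RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=3),
    "pca": PCA(n_components=3),
}
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
    "neural_net": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
}

# Evaluate every reducer/model combination with 3-fold cross-validation.
rows = []
for reducer_name, reducer in reducers.items():
    for model_name, model in models.items():
        pipe = Pipeline([("prep", preprocess), ("reduce", reducer), ("model", model)])
        scores = cross_val_score(pipe, df, y, cv=3, scoring="accuracy")
        rows.append({"reducer": reducer_name, "model": model_name,
                     "mean_accuracy": scores.mean()})

results = pd.DataFrame(rows).sort_values("mean_accuracy", ascending=False)
results.to_csv("automl_pipeline_results.csv", index=False)
print(results)
```

Wrapping preprocessing, reduction, and the model in a single `Pipeline` means each cross-validation fold fits the imputer, scaler, and reducer on its training split only, which avoids leaking information from the held-out rows.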

Technologies Used

- Python 3.x
- Pandas
- NumPy
- Scikit-learn
- TensorFlow (Keras)

Setup

Clone the repository:

git clone <your-repository-url>
cd <your-repository-name>

Create a Python environment (recommended):

Using Conda:

conda create -n automl_env python=3.9 # Or another version
conda activate automl_env
conda install pandas numpy scikit-learn tensorflow matplotlib seaborn -c conda-forge

Using pip and venv:

python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
pip install pandas numpy scikit-learn tensorflow matplotlib seaborn
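
After either install route, a quick check from Python can confirm that the packages are visible in the active environment. The helper below is a hypothetical convenience, not part of the repository; it reads installed package metadata through the standard library, so it does not import the packages themselves.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the install commands above.
REQUIRED = ["pandas", "numpy", "scikit-learn", "tensorflow", "matplotlib", "seaborn"]


def check_environment(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, installed in check_environment().items():
    print(f"{name:14s} {installed or 'MISSING'}")
```

Any line that prints `MISSING` suggests the environment was not activated or the install step did not complete.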

Usage

Configure Dataset Paths:
- Open the main script (automl_pipeline.py or your chosen filename).
- Locate the CONFIG dictionary near the top.
- IMPORTANT: Update the "path" values within CONFIG["datasets"] to point to the actual locations of your dataset CSV files (e.g., Titanic, Credit Card Fraud).

Run the Pipeline:
```
python automl_pipeline.py
```

Check Results:
- The script prints progress to the console.
- A summary of the results is saved to automl_pipeline_results.csv (or the filename specified in CONFIG["results_file"]).
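
For orientation, the CONFIG dictionary might look roughly like the sketch below. Only CONFIG["datasets"], its "path" values, and CONFIG["results_file"] are named in this README; the nesting by dataset name, the "target" key, the example file paths, and the `missing_datasets` helper are assumptions added for illustration.

```python
from pathlib import Path

# Hypothetical layout. The paths are placeholders to replace with real locations,
# and the "target" entries are assumed label columns, not confirmed by the project.
CONFIG = {
    "datasets": {
        "titanic": {"path": "data/titanic.csv", "target": "Survived"},
        "credit_card_fraud": {"path": "data/creditcard.csv", "target": "Class"},
    },
    "results_file": "automl_pipeline_results.csv",
}


def missing_datasets(config):
    """Return the names of datasets whose CSV path does not exist on disk."""
    return [
        name
        for name, spec in config["datasets"].items()
        if not Path(spec["path"]).is_file()
    ]


missing = missing_datasets(CONFIG)
if missing:
    print("Update the path for:", ", ".join(missing))
```

Running a check like this before the pipeline starts surfaces a wrong path immediately, rather than partway through a long training run.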

Team Members

- Khushi Choudhary
- Snehitha Gorantla
- Zakir Elaskar
