Mushroom Classification: Edible vs Poisonous

Overview

This project uses machine learning to classify mushrooms as edible or poisonous based on 22 categorical features describing physical characteristics. The dataset originates from the Audubon Society Field Guide to North American Mushrooms and was donated to the UCI Machine Learning Repository in 1987. Because misidentifying a poisonous mushroom as edible has serious real-life consequences, errors on the poisonous class matter most.

Problem Statement

Objective:
Build a classification model to predict whether a mushroom is edible (e) or poisonous (p) based on observable traits like odor, gill size, cap shape, and spore print color.

Dataset

Samples: 8,124
Features: 22 categorical features + 1 target label (class)
Target Values:
- e = Edible
- p = Poisonous
  Note: The original "unknown edibility" category was combined with poisonous to ensure safety.
Source: https://www.kaggle.com/datasets/uciml/mushroom-classification

Steps of Work Done

1. Data Loading & Exploration

Loaded and inspected the dataset using pandas
Counted total rows and columns
Identified that all features are categorical
Noted missing values: about 30.5% of stalk-root entries are missing, encoded as "?"
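
The exploration step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `summarize` is hypothetical, and it assumes missing values are stored as the literal string "?" as noted above.

```python
import pandas as pd


def summarize(df: pd.DataFrame, missing_marker: str = "?") -> pd.DataFrame:
    """Per-column dtype, distinct-value count, and percent of rows equal to the marker."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_unique": df.nunique(),
        # Comparing to the marker gives a boolean frame; its column mean is the share.
        "pct_missing": (df == missing_marker).mean() * 100,
    })


# Usage (assumes mushrooms.csv sits next to the notebook):
# df = pd.read_csv("mushrooms.csv")
# print(df.shape)  # (rows, columns)
# print(summarize(df).sort_values("pct_missing", ascending=False).head())
```

Sorting by `pct_missing` should surface stalk-root first, and a column with `n_unique` equal to 1 is a candidate constant feature.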

2. Target & Class Distribution

Verified that the target column (class) is nearly balanced
- Edible: 51.8%
- Poisonous: 48.2%
Plotted distribution for transparency
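
One way to check the balance is with normalized value counts. The sketch below is an assumption about how this could be done; `class_balance` is a hypothetical helper, not a function from the notebook.

```python
import pandas as pd


def class_balance(labels: pd.Series) -> pd.Series:
    """Percentage of rows per class label, rounded to one decimal place."""
    return (labels.value_counts(normalize=True) * 100).round(1)


# Usage (assumes df was loaded from mushrooms.csv):
# balance = class_balance(df["class"])
# balance.plot.bar()  # requires matplotlib
```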

3. Feature Analysis

Used pd.crosstab to find features whose values are strongly associated with the poisonous class
Top predictors included:
- odor
- gill-size
- spore-print-color
Created bar plots and histograms to visualize feature separability
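
A row-normalized cross-tabulation shows, for each value of a feature, what share of mushrooms with that value are poisonous. The helper below is a hypothetical sketch of that idea, assuming the target column is named class and uses "p" for poisonous.

```python
import pandas as pd


def poisonous_rate_by_value(df: pd.DataFrame, feature: str,
                            target: str = "class", poisonous: str = "p") -> pd.Series:
    """For each value of `feature`, the fraction of rows labelled poisonous."""
    # normalize="index" makes every row of the table sum to 1.
    table = pd.crosstab(df[feature], df[target], normalize="index")
    return table[poisonous].sort_values(ascending=False)


# Usage (assumes df was loaded from mushrooms.csv):
# print(poisonous_rate_by_value(df, "odor"))
```

Feature values whose rate sits close to 0 or 1 separate the classes well; values near 0.5 carry little information on their own.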

4. Data Cleaning

Dropped veil-type (constant feature)
Replaced missing stalk-root values ("?") with the column's mode
One-hot encoded all categorical features using sklearn.preprocessing.OneHotEncoder with drop='first'
Final shape: 8124 samples × 92 encoded features

5. Data Splitting

Split into:
- 80% training
- 10% validation
- 10% testing
Used stratified sampling to maintain class balance
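
An 80/10/10 stratified split can be built from two calls to scikit-learn's `train_test_split`. The sketch below assumes this two-stage approach; the helper name `split_80_10_10` and the seed value are illustrative choices, not taken from the notebook.

```python
from sklearn.model_selection import train_test_split


def split_80_10_10(X, y, seed: int = 42):
    """Stratified 80% train / 10% validation / 10% test split."""
    # Stage 1: hold out 20% of the data.
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Stage 2: split the held-out 20% in half for validation and test.
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed
    )
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```

Passing `stratify` in both stages keeps the edible/poisonous ratio roughly the same in all three subsets.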

6. Model Training

Chose Random Forest Classifier:
- 100 trees
- Class weight = "balanced"
- Random seed for reproducibility
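
With scikit-learn, the configuration listed above maps onto `RandomForestClassifier` as sketched below. The wrapper `train_forest` and the seed value 42 are assumptions for illustration; the source does not state which seed was used.

```python
from sklearn.ensemble import RandomForestClassifier


def train_forest(X_train, y_train, seed: int = 42) -> RandomForestClassifier:
    """Fit a 100-tree random forest with balanced class weights."""
    model = RandomForestClassifier(
        n_estimators=100,          # 100 trees
        class_weight="balanced",   # reweight classes inversely to their frequency
        random_state=seed,         # fixed seed for reproducibility (value is an assumption)
    )
    model.fit(X_train, y_train)
    return model
```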

7. Evaluation Metrics

Evaluated on both validation and test sets
Reported:
- Accuracy
- Precision
- Recall
- F1-score
- ROC AUC
Confusion matrices visualized using seaborn
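
The reported metrics can all be computed with `sklearn.metrics`. The helper below is a hypothetical sketch that assumes labels are encoded as 0 = edible and 1 = poisonous, so precision and recall refer to the poisonous class.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_pred, y_score) -> dict:
    """Collect the metrics listed above for one data split."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        # ROC AUC needs scores or probabilities, not hard predictions.
        "roc_auc": roc_auc_score(y_true, y_score),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }


# Usage (assumes a fitted classifier `model` and a validation split):
# results = evaluate(y_val, model.predict(X_val), model.predict_proba(X_val)[:, 1])
# import seaborn as sns
# sns.heatmap(results["confusion_matrix"], annot=True, fmt="d")
```

For this problem, recall on the poisonous class is the number to watch, since a missed poisonous mushroom is the most harmful error.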

8. Feature Importance

Extracted top 20 features by importance from the trained model
Displayed using a horizontal bar chart
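
A fitted scikit-learn forest exposes impurity-based importances through its `feature_importances_` attribute. The sketch below shows one way to rank them; `top_importances` is a hypothetical helper, and `feature_names` is assumed to be the list of encoded column names in training order.

```python
import pandas as pd


def top_importances(model, feature_names, k: int = 20) -> pd.Series:
    """Return the k largest feature importances, highest first."""
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False).head(k)


# Usage (assumes a fitted forest `model` and an encoded frame `X_enc`):
# top = top_importances(model, X_enc.columns)
# top.sort_values().plot.barh()  # horizontal bar chart, requires matplotlib
```

Because the features are one-hot encoded, each importance belongs to a single feature value (for example one odor category) rather than to the original feature as a whole.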

Project Structure

├── FinalProj.ipynb
├── mushrooms.csv
└── README.md
