This project builds a neural network model in PyTorch to predict house prices from key property features. Starting from raw data, we performed feature engineering, data preprocessing, model training, and evaluation using RMSE (Root Mean Squared Error).
- The dataset contains various features related to house attributes such as square footage, number of rooms, garage size, year built, and neighborhood.
- The target variable (`SalePrice`) represents the actual selling price of the houses.
- Selected the most relevant features for price prediction.
- Added a new feature, `TotalSF` (Total Square Footage).
- One-hot encoded categorical variables (`Neighborhood`, `KitchenQual`).
- Standardized all numerical features using `StandardScaler()` to improve model performance.
- Split dataset into training (80%) and validation (20%).
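
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's `preprocessing.py`: the tiny DataFrame is made up, and the `TotalSF` formula (basement plus first and second floor area) is an assumption, since the formula is not given here. The sketch also fits the scalers on the training split only, which is one common way to avoid leaking validation statistics.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up rows standing in for train.csv; values are illustrative only.
df = pd.DataFrame({
    "TotalBsmtSF": [800, 1000, 0, 1200, 950, 700, 1100, 600, 900, 1300],
    "1stFlrSF": [900, 1100, 850, 1250, 1000, 750, 1150, 700, 950, 1400],
    "2ndFlrSF": [0, 600, 400, 0, 500, 0, 700, 300, 0, 800],
    "Neighborhood": ["NAmes", "CollgCr", "OldTown", "NAmes", "CollgCr",
                     "OldTown", "NAmes", "CollgCr", "OldTown", "NAmes"],
    "KitchenQual": ["TA", "Gd", "TA", "Ex", "Gd", "TA", "Gd", "TA", "TA", "Ex"],
    "SalePrice": [120000, 210000, 95000, 250000, 180000,
                  90000, 230000, 110000, 130000, 310000],
})

# ASSUMPTION: TotalSF as the sum of basement and floor areas (formula not given in the source).
df["TotalSF"] = df["TotalBsmtSF"] + df["1stFlrSF"] + df["2ndFlrSF"]

# One-hot encode the categorical columns; cast everything to float for scaling.
X = pd.get_dummies(
    df.drop(columns="SalePrice"), columns=["Neighborhood", "KitchenQual"]
).astype(float)
y = df[["SalePrice"]].astype(float)

# 80% training / 20% validation split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train, X_val = X_train.copy(), X_val.copy()

# Standardize numerical features using statistics from the training split only.
num_cols = ["TotalBsmtSF", "1stFlrSF", "2ndFlrSF", "TotalSF"]
x_scaler = StandardScaler()
X_train[num_cols] = x_scaler.fit_transform(X_train[num_cols])
X_val[num_cols] = x_scaler.transform(X_val[num_cols])

# Scale the target as well, so predictions can later be inverse-transformed.
y_scaler = StandardScaler()
y_train_s = y_scaler.fit_transform(y_train)
y_val_s = y_scaler.transform(y_val)
```

Keeping `y_scaler` around matters because it is needed later to map predictions back to the original price scale.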
- Built a Multi-Layer Perceptron (MLP) with:
  - Batch Normalization for stable training.
  - ReLU Activation for non-linearity.
  - Dropout Layers to prevent overfitting.
  - Adam Optimizer for efficient weight updates.
- Defined Loss Function (`MSELoss`) for regression.
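
An MLP with these components might look like the sketch below. The class name `HousePriceMLP`, the hidden sizes, the dropout rate, and the learning rate are all hypothetical choices for illustration; the project's actual values are not stated here.

```python
import torch
from torch import nn


class HousePriceMLP(nn.Module):
    """Hypothetical MLP: Linear -> BatchNorm -> ReLU -> Dropout blocks, then a single output."""

    def __init__(self, n_features, hidden=(128, 64), dropout=0.3):
        super().__init__()
        layers = []
        in_dim = n_features
        for h in hidden:
            layers += [
                nn.Linear(in_dim, h),
                nn.BatchNorm1d(h),    # stabilizes training
                nn.ReLU(),            # non-linearity
                nn.Dropout(dropout),  # regularization
            ]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))  # one regression output (price)
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


model = HousePriceMLP(n_features=10)
criterion = nn.MSELoss()  # regression loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One optimization step on random data, just to show the pieces fit together.
x = torch.randn(16, 10)
y = torch.randn(16, 1)
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

Note that `BatchNorm1d` needs more than one sample per batch in training mode, so very small batch sizes would need care.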
- Implemented Early Stopping to prevent overfitting.
- Used ReduceLROnPlateau to dynamically adjust learning rate.
- Trained the model for up to 100 epochs, stopping early when validation loss stopped improving.
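
A training loop combining early stopping with `ReduceLROnPlateau` could be sketched like this. The data is synthetic, the small model is a stand-in, and the patience values, `factor`, and full-batch updates are assumptions made for brevity rather than the project's settings.

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-ins for the scaled training and validation tensors.
X_train, y_train = torch.randn(64, 5), torch.randn(64, 1)
X_val, y_val = torch.randn(16, 5), torch.randn(16, 1)

model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
# Halve the learning rate when validation loss stalls for 3 epochs (assumed values).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3
)

max_epochs, patience = 100, 10  # early-stopping patience is an assumed value
best_val, best_state, bad_epochs = float("inf"), None, 0
epochs_run = 0

for epoch in range(max_epochs):
    # Training step (full batch here to keep the sketch short).
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    # Validation step.
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val), y_val).item()
    scheduler.step(val_loss)
    epochs_run = epoch + 1

    if val_loss < best_val:
        # New best: remember the weights and reset the counter.
        best_val, bad_epochs = val_loss, 0
        best_state = copy.deepcopy(model.state_dict())
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # validation loss stopped improving

# Restore the best weights seen during training.
model.load_state_dict(best_state)
```

Restoring the best checkpoint at the end means the final model is the one with the lowest validation loss, not whatever the last epoch produced.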
- Converted predictions back to the original price scale using `StandardScaler.inverse_transform()`.
- Computed RMSE (Root Mean Squared Error) to measure model accuracy.
- Compared RMSE before and after adding `TotalSF`.
- Adjusted learning rate, dropout, and batch size to optimize model performance.
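
As a small illustration of the evaluation step, the sketch below inverse-transforms scaled predictions and computes RMSE in the original price units. The prices and the "predictions" are made-up numbers, and `y_scaler` is a hypothetical name for the scaler fitted on the target.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up true prices; y_scaler stands in for the scaler fitted on SalePrice.
prices = np.array([[100000.0], [150000.0], [200000.0], [250000.0]])
y_scaler = StandardScaler().fit(prices)

# Hypothetical model outputs in scaled space: truth plus a small error.
preds_scaled = y_scaler.transform(prices) + np.array([[0.1], [-0.1], [0.1], [-0.1]])

# Map predictions back to the original price scale, then compute RMSE there.
preds = y_scaler.inverse_transform(preds_scaled)
rmse = float(np.sqrt(np.mean((preds - prices) ** 2)))
print(f"RMSE: {rmse:.2f}")
```

Computing RMSE after the inverse transform gives an error in currency units, which is easier to interpret than an error in standardized units.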
- Ensure you have Python, PyTorch, pandas, and scikit-learn installed.
- Load the dataset (`train.csv`, `test.csv`).
- Run `preprocessing.py` to prepare the data.
- Train the model using `train.py`.
- Evaluate predictions and check the RMSE.
- Feature engineering matters: adding `TotalSF` improved predictions.
- Neural networks work for regression, but tuning is key.
- Early stopping and LR scheduling help stabilize training and limit overfitting.