This project predicts bike rental demand using machine learning techniques. It downloads data from a specified URL, performs feature engineering, trains several regression models (including a stacked model), and generates a submission file.
- Introduction
- Features
- Getting Started
- Project Structure
- Model Training
- File Descriptions
- Contributing
- License
This project aims to build a predictive model for bike rental demand based on environmental and temporal features. It demonstrates a full data science workflow, from downloading the data to making predictions with several models.
- Data Download: Downloads compressed data from a specified URL.
- Data Preprocessing: Handles missing values, performs feature engineering, and encodes categorical features.
- Model Training: Implements several regression models:
  - Ordinary Least Squares (OLS) Regression
  - Decision Tree Regression
  - Random Forest Regression
  - AdaBoost Regression
  - XGBoost Regression
  - Stacked Model
- Model Evaluation: All models are validated using the root mean squared logarithmic error (RMSLE).
- Feature Importance: Provides a visualization of XGBoost feature importances.
- Submissions: Generates a submission CSV file with predictions.
- Command-Line Interface: Uses argparse to allow configuration of paths from the command line.
- Logging: Provides informative logging of project execution.
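The RMSLE metric named in the feature list can be sketched as follows. This is a minimal, standard-library illustration of the formula; the helper name rmsle is hypothetical, and the project script may compute the metric differently (for example, through a library function).

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    # log1p(x) = log(1 + x), so a rental count of zero is handled safely.
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Illustrative values only, not project data.
score = rmsle([10, 50, 200], [12, 45, 180])
print(f"RMSLE: {score:.4f}")
```

Because the error is taken on a logarithmic scale, the metric penalizes relative rather than absolute differences, which suits count data such as hourly rentals.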
- Python 3.6 or higher
- pip (Python package installer)
- Internet connection (for data download)
- Clone the repository:
  git clone https://github.com/your-username/your-repo-name.git
  cd your-repo-name
- Install the required packages:
  pip install -r requirements.txt
- Download the data and generate a submission file:
  python bike_rental_project.py --input_path /path/to/your/input --working_path /path/to/your/working --submission_file submission.csv
  - Replace /path/to/your/input with the desired input directory, /path/to/your/working with the working directory, and submission.csv with the desired name of the submission file.
  Or, to use the default folders:
  python bike_rental_project.py
  - By default, the script uses the /kaggle/input and /kaggle/working folders and creates the output file submission.csv. Use the --force_download parameter to download the data again.
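The command-line options described above could be declared with argparse roughly as follows. The flag names and default paths come from this README; the help strings, the build_parser helper, and the choice to make --force_download a boolean switch are assumptions, so the actual script may differ.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags documented in this README (a sketch)."""
    parser = argparse.ArgumentParser(
        description="Predict bike rental demand and write a submission file."
    )
    parser.add_argument("--input_path", default="/kaggle/input",
                        help="Directory holding the downloaded input data.")
    parser.add_argument("--working_path", default="/kaggle/working",
                        help="Directory for intermediate and output files.")
    parser.add_argument("--submission_file", default="submission.csv",
                        help="Name of the submission CSV to create.")
    # Assumption: the flag is a simple on/off switch.
    parser.add_argument("--force_download", action="store_true",
                        help="Download the data again even if it already exists.")
    return parser


# Parsing an empty argument list yields the defaults.
args = build_parser().parse_args([])
print(args.input_path, args.working_path, args.submission_file, args.force_download)
```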
├── bike_rental_project.py   # Main Python script
├── requirements.txt         # Libraries to install
└── README.md                # This file
The script trains each of the regression models, plots the feature importances of the XGBoost model, and saves the stacked model's predictions as the submission file.
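This README does not say how the stacked model is assembled, so the following is only a sketch of one common approach, using scikit-learn's StackingRegressor. The synthetic data, the choice of base learners, and all hyperparameters are assumptions for illustration, not the project's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the engineered features and rental counts.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 50 * X[:, 0] + 20 * X[:, 1] ** 2 + rng.normal(0, 1, size=200)

# Base learners produce out-of-fold predictions; the final estimator
# learns how to combine them.
stack = StackingRegressor(
    estimators=[
        ("tree", DecisionTreeRegressor(max_depth=4, random_state=0)),
        ("forest", RandomForestRegressor(n_estimators=20, random_state=0)),
    ],
    final_estimator=LinearRegression(),
    cv=3,
)

# Fit on the first 150 rows and predict the held-out 50.
stack.fit(X[:150], y[:150])
predictions = stack.predict(X[150:])
print(predictions[:5])
```

The cv argument controls the internal cross-validation used to generate the base learners' predictions, which keeps the final estimator from being trained on predictions the base models made for rows they had already seen.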
- bike_rental_project.py: Contains the main Python script with all the project logic.
- requirements.txt: A list of the Python packages required to run the program.
- data: The input folder, where the training and test sets reside.
- submission.csv: The submission file created by the script.
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them.
- Push your changes to your fork.
- Submit a pull request.