Welcome to this MLOps project, which demonstrates a robust, end-to-end pipeline for managing vehicle insurance data. It showcases the tools, techniques, services, and features that go into building and deploying a machine learning pipeline for real-world data management. Follow along to learn about project setup, data processing, model deployment, and CI/CD automation!
- Start by executing the `template.py` file to create the initial project template, which includes the required folder structure and placeholder files.
- Write the setup for importing local packages in the `setup.py` and `pyproject.toml` files.
- Tip: Learn more about these files from `crashcourse.txt`.
- Create a virtual environment and install the required dependencies from `requirements.txt`:

```bash
conda create -n vehicle python=3.10 -y
conda activate vehicle
pip install -r requirements.txt
```
- Verify the local packages by running:

```bash
pip list
```
- Sign up for MongoDB Atlas and create a new project.
- Set up a free M0 cluster, configure the username and password, and allow access from any IP address (`0.0.0.0/0`).
- Retrieve the MongoDB connection string for Python and save it (replace `<password>` with your password).
- Create a folder named `notebook`, add the dataset, and create a notebook file `mongoDB_demo.ipynb`.
- Use the notebook to push the data to the MongoDB database, as sketched below.
- Verify the data in MongoDB Atlas under Database > Browse Collections.
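A minimal sketch of the push step, assuming `pymongo` and `pandas` are installed; the dataset filename, database, and collection names below are placeholders rather than the project's actual values:

```python
import os

import pandas as pd
from pymongo import MongoClient

# Connection string saved from MongoDB Atlas (with <password> substituted)
client = MongoClient(os.environ["MONGODB_URL"])

# Hypothetical database and collection names, for illustration only
collection = client["vehicle_insurance"]["vehicle_data"]

# Load the dataset and insert it as a list of documents
df = pd.read_csv("notebook/data.csv")
records = df.to_dict(orient="records")
collection.insert_many(records)
print(f"Inserted {len(records)} documents")
```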
- Create logging and exception handling modules, and test them on a demo file `demo.py`.
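A rough sketch of what these two modules might look like; the log file name, format string, and exception class name are illustrative, not the project's exact code:

```python
import logging
import sys

# Logging module: write timestamped records to a file
logging.basicConfig(
    filename="app.log",
    format="[%(asctime)s] %(levelname)s - %(message)s",
    level=logging.INFO,
)
logger = logging.getLogger(__name__)


# Exception module: enrich errors with the file name and line number
class ProjectException(Exception):
    def __init__(self, error: Exception):
        _, _, tb = sys.exc_info()
        location = f"{tb.tb_frame.f_code.co_filename}:{tb.tb_lineno}" if tb else "unknown"
        super().__init__(f"Error at [{location}]: {error}")


# demo.py-style test of both modules
try:
    1 / 0
except Exception as e:
    logger.error(e)
    raise ProjectException(e) from e
```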
- Analyze and engineer features in the `EDA and Feature Engg` notebook for further processing in the pipeline.
- Define MongoDB connection functions in `configuration.mongo_db_connections.py` (a sketch follows these steps).
- Develop data ingestion components in the `data_access` and `components.data_ingestion.py` files to fetch and transform data.
- Update `entity/config_entity.py` and `entity/artifact_entity.py` with the relevant ingestion configurations.
- Run `demo.py` after setting up the MongoDB connection string as an environment variable.
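A hedged sketch of how the connection and fetch logic could fit together; the class, database, and helper names here are assumptions for illustration:

```python
import os

import pandas as pd
from pymongo import MongoClient


class MongoDBClient:
    """Stand-in for configuration.mongo_db_connections: builds a client
    from the MONGODB_URL environment variable set in the next step."""

    def __init__(self, database_name: str = "vehicle_insurance"):
        self.client = MongoClient(os.environ["MONGODB_URL"])
        self.database = self.client[database_name]


def export_collection_as_dataframe(collection_name: str) -> pd.DataFrame:
    """Data-ingestion-style fetch: pull every document into a DataFrame."""
    collection = MongoDBClient().database[collection_name]
    df = pd.DataFrame(list(collection.find()))
    # MongoDB adds an _id field; drop it before downstream processing
    return df.drop(columns=["_id"], errors="ignore")
```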
- Set the MongoDB URL:

```bash
# For Bash
export MONGODB_URL="mongodb+srv://<username>:<password>...."
```

```powershell
# For PowerShell
$env:MONGODB_URL = "mongodb+srv://<username>:<password>...."
```
- Note: On Windows, you can also set environment variables through the system settings.
- Define the schema in `config.schema.yaml` and implement data validation functions in `utils.main_utils.py`.
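A minimal sketch of schema-driven validation, assuming the YAML file lists expected columns as name-to-dtype mappings (the top-level `columns` key is an assumption):

```python
import pandas as pd
import yaml


def read_yaml(path: str) -> dict:
    with open(path) as f:
        return yaml.safe_load(f)


def validate_columns(df: pd.DataFrame, schema_path: str) -> bool:
    """Check that every column declared in the schema exists in the data."""
    schema = read_yaml(schema_path)
    # Assumes entries like `- Gender: category` under a top-level `columns` key
    expected = [list(col)[0] for col in schema["columns"]]
    missing = [c for c in expected if c not in df.columns]
    if missing:
        print(f"Validation failed; missing columns: {missing}")
    return not missing
```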
- Implement data transformation logic in `components.data_transformation.py` and create `estimator.py` in the `entity` folder.
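One plausible shape for the transformation step, using a scikit-learn `ColumnTransformer`; the column lists and imputation strategies are placeholders:

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numeric_cols: list[str], categorical_cols: list[str]) -> ColumnTransformer:
    """Impute and scale numeric features; impute and one-hot encode categorical ones."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
```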
- Define and implement model training steps in `components.model_trainer.py` using code from `estimator.py`.
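A hedged sketch of the training step; the model family, hyperparameters, and artifact path are assumptions, not necessarily what `estimator.py` defines:

```python
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score


def train_model(X_train, y_train, X_test, y_test, model_path: str = "model.pkl"):
    """Fit a classifier, report a holdout metric, and persist the artifact."""
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    score = f1_score(y_test, model.predict(X_test))
    print(f"F1 on holdout: {score:.3f}")
    joblib.dump(model, model_path)
    return model, score
```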
- Log in to the AWS console, create an IAM user, and grant it `AdministratorAccess`.
- Set AWS credentials as environment variables:

```bash
# For Bash
export AWS_ACCESS_KEY_ID="YOUR_AWS_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_AWS_SECRET_ACCESS_KEY"
```
- Configure the S3 bucket and add the access keys in `constants.__init__.py`.
- Create an S3 bucket named `my-model-mlopsproj` in the `us-east-1` region.
- Develop code to push/pull models to/from the S3 bucket in `src.aws_storage` and `entity/s3_estimator.py`.
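A minimal sketch of that push/pull logic with `boto3`; the bucket name and region mirror the step above, while the object key is a placeholder:

```python
import boto3

# Credentials are read from the AWS_* environment variables set earlier
s3 = boto3.client("s3", region_name="us-east-1")
BUCKET = "my-model-mlopsproj"


def push_model(local_path: str, s3_key: str = "model/model.pkl") -> None:
    """Upload a trained model artifact to the S3 bucket."""
    s3.upload_file(local_path, BUCKET, s3_key)


def pull_model(s3_key: str = "model/model.pkl", local_path: str = "model.pkl") -> None:
    """Download the current production model from the S3 bucket."""
    s3.download_file(BUCKET, s3_key, local_path)
```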
- Implement model evaluation and deployment components.
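Evaluation typically means comparing the newly trained model against the one currently in production; a rough sketch, assuming an F1-based acceptance margin (the threshold value is an illustrative choice):

```python
from sklearn.metrics import f1_score


def should_deploy(new_model, prod_model, X_test, y_test, min_gain: float = 0.02) -> bool:
    """Accept the new model only if it beats production by a margin.
    If no production model exists yet, the new one is accepted."""
    new_score = f1_score(y_test, new_model.predict(X_test))
    prod_score = f1_score(y_test, prod_model.predict(X_test)) if prod_model else 0.0
    return (new_score - prod_score) > min_gain
```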
- Create the `Prediction Pipeline` and set up `app.py` for API integration.
- Add `static` and `template` directories for the web UI.
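A minimal sketch of what `app.py` could look like, assuming FastAPI (a Flask app would be similar); the route, template name, and port are illustrative, with 5080 matching the port opened later on EC2:

```python
import uvicorn
from fastapi import FastAPI, Request
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates

app = FastAPI()

# Serve assets from static/ and render pages from template/
app.mount("/static", StaticFiles(directory="static"), name="static")
templates = Jinja2Templates(directory="template")


@app.get("/")
async def index(request: Request):
    # Home page with the input form for predictions
    return templates.TemplateResponse("index.html", {"request": request})


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=5080)
```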
- Create a `Dockerfile` and a `.dockerignore` file.
- Set up GitHub Actions with AWS authentication by creating GitHub secrets for:
  - `AWS_ACCESS_KEY_ID`
  - `AWS_SECRET_ACCESS_KEY`
  - `AWS_DEFAULT_REGION`
  - `ECR_REPO`
- Set up an EC2 instance for deployment.
- Install Docker on the EC2 machine.
- Connect EC2 as a self-hosted runner on GitHub.
- Open port 5080 on the EC2 instance.
- Access the deployed app by visiting `http://<public_ip>:5080`.
- Data Ingestion → Data Validation → Data Transformation
- Model Training → Model Evaluation → Model Deployment
- CI/CD Automation with GitHub Actions, Docker, AWS EC2, and ECR
If you found this project helpful or have any questions, feel free to reach out at mp5272672@gmail.com!
This README provides a structured walkthrough of the MLOps project, showcasing the end-to-end pipeline, cloud integration, CI/CD setup, and robust data handling capabilities.