
aryankumar120/MLFlowX


🚀 MLFlowX: End-to-End MLOps Iris Classifier

A complete MLOps pipeline that automates training, validating, and securely deploying a machine learning model, all in one place.
This project demonstrates how to use Flask, Streamlit, MLflow, and joblib to build a real-world ML pipeline for the classic Iris Flower Classification dataset 🌸.

🌟 Overview

MLFlowX shows how data scientists can go from raw data ➡️ deployed model ➡️ monitored pipeline using practical tools and clear workflows.

The project includes:

  • 🧠 Model training and evaluation pipeline
  • 🧩 Token-based authentication for secure API access
  • 📡 RESTful API for real-time predictions
  • 💻 Streamlit web app interface for interactive use
  • 📊 Automated model report generation and versioning

Project Description

Building a machine learning pipeline that is effective and reliable can be a challenging task. One of the critical challenges is data leakage, where information from the future or target variable is inadvertently leaked into the training data, leading to over-optimistic model performance metrics. Another challenge is deploying the final model for real-world use, which requires integrating the model into an existing system and ensuring consistent performance over time.

This project provides an end-to-end process that addresses these challenges, from building machine learning models to deploying them. The pipeline is designed to prevent data leakage and generates a report automatically on each run. The model is then deployed as an API that a web app can consume. The aim is to make model deployment more efficient and manageable, and the same approach can be applied in a microservice architecture. In short, the project offers the following features:

  • A machine learning pipeline that prevents data leakage and generates an automatic report (development stage).
  • A microservice architecture consisting of Authentication, Machine Learning, and Web App services.
  • Security through authentication and hidden credentials.

Objective

The objective of this project is:

  • To build a machine learning pipeline that addresses data leakage issues.
  • To deploy the model in the production stage for real-world use.
  • To document the pipeline and deployment process for future reference.

Contents

The repository contains the following files and directories:

  • img/: A directory containing supporting images for README.md.
  • result/: A directory containing the output of the pipeline.
  • README.md: A markdown file describing the project.
  • pipeline.py: Python script containing the development-stage pipeline.
  • auth.py: Python script that builds the authentication service.
  • api.py: Python script that builds a simple API as an endpoint for using the machine learning models.
  • web_app.py: Python script that builds the web app for user interaction.
  • requirements.txt: A list of the libraries required by the project.
  • pipeline.bin: Binary file that stores the pipeline objects.

Data

This project uses the Iris dataset, a popular benchmark in machine learning that is available from the UCI Machine Learning Repository. It was chosen for its simplicity, because this project emphasizes the pipeline and deployment process rather than model development. The attributes are:

  1. sepal_length: sepal length measurement of Iris flower in cm.
  2. sepal_width: sepal width measurement of Iris flower in cm.
  3. petal_length: petal length measurement of Iris flower in cm.
  4. petal_width: petal width measurement of Iris flower in cm.
  5. class: class of Iris flower (Iris Setosa, Iris Versicolour, Iris Virginica).

Approach

In this section, I will explain the two main parts of this project, i.e., Machine Learning Pipeline Development and related services in Production.

Development Stage

The development of the machine learning pipeline can be found in the pipeline.py file as follows:

  def run_pipeline(self):
    """
    Runs a machine learning pipeline from getting data until generating 
    the validation report.
    """
    print(f'{" STARTING PIPELINE ":=^42s}')
    self.get_data()
    self.prepare_data()
    self.build_model()
    self.test_model()
    self.export_file()
    self.generate_report()
    print(f'{" PIPELINE FINISH ":=^42s}')

This is the explanation of the main steps above:

  • self.get_data() is the first step: the pipeline obtains the dataset from a remote server (the UCI Machine Learning Repository website) and loads it into a Pandas DataFrame.

  • self.prepare_data() prepares the data for modeling in these steps:

    • Encode the class labels
    • Split the data into training and validation sets
    • Apply feature engineering techniques
    • Scale the training data
  • self.build_model() builds the machine learning models. For comparison, three models are used: Logistic Regression, Decision Tree Classifier, and Support Vector Classifier. Each model is trained on the preprocessed training data.

  • self.test_model() evaluates the trained models on unseen (validation) data, which first undergoes the same preprocessing as the training data. It then calculates performance metrics for each model, including confusion matrices that visualize predictive performance.

  • self.export_file() exports the pipeline objects, including the scaler, label encoder, and trained models, into a binary file (.bin) to facilitate deployment. This file can be loaded for inference in a production environment.

  • self.generate_report() is the final step. It generates a report summarizing the pipeline's execution time, a comparison of validation metrics, and the confusion matrices. The report is saved as an Excel file and a PNG image.

Splitting the data at the beginning of preprocessing is the key to preventing data leakage in this project. Only the training set is used to fit the remaining preprocessing steps, and the information learned from it is stored and later applied to unseen data. The following schema visualizes the whole process.

Development Stage Scheme
Image 1: Development Stage Scheme
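
The split-first rule can be sketched in a few lines. This is a minimal illustration using only the standard library, not the code in pipeline.py: the function names split_rows, fit_scaler, and apply_scaler are hypothetical, the sample values are made up, and standardization stands in for whatever scaling the pipeline actually applies.

```python
import random
import statistics


def split_rows(rows, valid_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into train and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


def fit_scaler(train_rows):
    """Learn per-column mean and standard deviation from the training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(rows, scaler):
    """Standardize rows with statistics that were learned elsewhere."""
    means, stds = scaler
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy (sepal_length, sepal_width) measurements, made up for illustration.
rows = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0], [6.7, 3.1],
        [5.0, 3.4], [6.3, 3.3], [5.8, 2.7], [7.1, 3.0], [4.6, 3.1]]

train, valid = split_rows(rows)               # 1. split first
scaler = fit_scaler(train)                    # 2. fit on the training set only
train_scaled = apply_scaler(train, scaler)
valid_scaled = apply_scaler(valid, scaler)    # 3. reuse the stored statistics
```

Because fit_scaler never sees the validation rows, nothing about their distribution can influence training. The stored scaler plays the role of the information that is later applied to unseen data.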

Production Stage

Because the project follows a microservice approach, the production stage consists of three services: Authentication, Machine Learning, and Web App. The architecture is shown in the image below.

Microservice Architecture at Production Stage
Image 2: Microservice Architecture at Production Stage

Auth Service

User authentication is a crucial aspect of web applications: it ensures that only authenticated users can access certain resources. This project applies basic user authentication using Flask, a popular web framework, and JWT (JSON Web Tokens), a compact, signed format for transmitting authentication information. The service exposes two endpoints that control access to the Machine Learning API and the Web App:

  • Login Endpoint: The /auth/login route is defined to handle HTTP POST requests for user authentication. Upon receiving a request, the code retrieves the provided username and password from the request's JSON payload. It then verifies the credentials against the registered users. If the credentials are valid, an access token is generated using the generate_access_token() helper function and returned as a JSON response. Otherwise, an error message indicating invalid credentials is returned.
  • Access Token Validation Endpoint: The /auth/validate/<access_token> route is defined to handle HTTP GET requests for access token validation. The <access_token> parameter in the route represents the token to be validated. A JSON response with a token_valid value of True is returned if the token is valid. Otherwise, a token_valid value of False is returned to indicate an invalid token.
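
To make the token round trip concrete, here is a minimal sketch of issuing and checking an HS256-signed JWT with only the standard library. It is not the code in auth.py, which may well rely on a JWT library. The name generate_access_token comes from the description above, but its signature, the is_token_valid helper, the SECRET_KEY value, and the one-hour lifetime are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import time

# Assumption: a real service would load the secret from the environment, not hard-code it.
SECRET_KEY = b"change-me"


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> bytes:
    return hmac.new(SECRET_KEY, signing_input.encode("ascii"), hashlib.sha256).digest()


def generate_access_token(username: str, ttl_seconds: int = 3600) -> str:
    """Build header.payload.signature with an expiry claim."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": username, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input))}"


def is_token_valid(token: str) -> bool:
    """Return True only if the signature matches and the token has not expired."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = _sign(f"{header_b64}.{payload_b64}")
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return False
        payload = json.loads(_b64url_decode(payload_b64))
        return payload["exp"] > time.time()
    except (ValueError, KeyError, TypeError):
        # Malformed tokens are treated as invalid rather than raising.
        return False


token = generate_access_token("alice")
print(is_token_valid(token))
```

A validation endpoint like the one described above would wrap a check of this kind and return its result as the token_valid field. For brevity the sketch does not inspect the header's alg field, which a production implementation should do.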

Machine Learning (ML)-API

The ML API service is built using the Flask framework. It leverages the power of Flask to handle incoming requests and deliver predictions based on a pre-trained model from the development stage. The module defines two API routes, each handling a specific type of request:

  • /v1/predict/realtime/<sepal_length>/<sepal_width>/<petal_length>/<petal_width>
    A route that is used for making predictions on custom input data.
  • /v1/predict/by_index/<index>
    A route that retrieves existing data from a remote server and makes predictions based on that data.
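
As a client-side illustration, the two routes could be called from Python roughly as follows. This is a sketch, not code from the repository: the helper names are hypothetical, while the port (5002), the POST method, and the Auth header name are taken from the curl examples in the Result section of this README.

```python
import urllib.request
from urllib.parse import quote

# Port taken from the curl examples in this README.
ML_API_BASE = "http://localhost:5002"


def build_realtime_request(token, sepal_length, sepal_width, petal_length, petal_width):
    """Build a request for a prediction on custom measurements (in cm)."""
    values = (sepal_length, sepal_width, petal_length, petal_width)
    path = "/".join(quote(str(v)) for v in values)
    return urllib.request.Request(
        f"{ML_API_BASE}/v1/predict/realtime/{path}",
        headers={"Auth": token},
        method="POST",
    )


def build_by_index_request(token, index):
    """Build a request for a prediction on an existing dataset row."""
    return urllib.request.Request(
        f"{ML_API_BASE}/v1/predict/by_index/{int(index)}",
        headers={"Auth": token},
        method="POST",
    )


# Sending a request requires the ML API service to be running, for example:
#   import json
#   with urllib.request.urlopen(build_by_index_request(token, 5)) as resp:
#       print(json.load(resp))
```

Keeping request construction separate from sending makes the URL and header layout easy to inspect without a running server.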

Both routes follow the same process:

  • Token Validation: The module first validates the access token provided in the request headers. An appropriate error message is returned if the token is invalid or expired.
  • Getting Input Data: The first route takes custom values supplied by the user; the second retrieves an existing record from a remote server.
  • Loading the Pipeline Object: The preprocessing information and pre-trained models stored in pipeline.bin are loaded using joblib. The module returns a corresponding error message if the file is not found or an error occurs during loading.
  • Data Preprocessing: The input data undergoes the same preprocessing as in the development stage, using the information stored in pipeline.bin.
  • Prediction: The preprocessed data is passed to the predict function, which iterates over the models in the pipeline and generates predictions for each model. The results include the predicted class and its probability for each model.
  • Response: The module returns a JSON response containing the raw input data, the preprocessed data, and the prediction results.
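
The preprocessing, prediction, and response steps might look roughly like the sketch below. It assumes the object loaded from pipeline.bin (for example via joblib.load) is a dictionary with the keys scaler, label_encoder, and models, and that the stored objects follow the scikit-learn interface (transform, inverse_transform, predict_proba, classes_). The actual layout of pipeline.bin is not documented here, so the key names and the function name predict_with_all_models are hypothetical.

```python
def predict_with_all_models(pipeline, raw_features):
    """Apply the stored preprocessing, then collect one prediction per model.

    `pipeline` is assumed to be a dict holding a fitted scaler, a fitted label
    encoder, and a name -> model mapping (an assumed layout, see above).
    """
    # Reuse the scaler fitted during development; never refit it at inference time.
    scaled = pipeline["scaler"].transform([raw_features])

    results = {}
    for name, model in pipeline["models"].items():
        probabilities = list(model.predict_proba(scaled)[0])
        # Index of the most probable class for this model.
        best = max(range(len(probabilities)), key=probabilities.__getitem__)
        # Map the encoded class back to its original label.
        label = pipeline["label_encoder"].inverse_transform([model.classes_[best]])[0]
        results[name] = {"class": label, "probability": float(probabilities[best])}

    return {
        "input": list(raw_features),
        "preprocessed": [float(x) for x in scaled[0]],
        "prediction": results,
    }
```

With scikit-learn, a Support Vector Classifier exposes predict_proba only when it is created with probability=True, so a sketch like this relies on the models having been trained with probability estimates enabled.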

Web Service

The web service in this project classifies Iris flowers using machine learning and relies on two supporting services: the ML-Prediction Service and the Auth Service. The user interface consists of two pages:

  • Login Page
    The web service features a login page that secures access to the main application. Users enter their credentials (username and password) to authenticate. The authenticate function communicates with the Auth Service to validate the credentials. If authentication succeeds, an access token is generated and the user proceeds to the main page.
    Web App Display (Login Page)
    Image 3: Web App Login Page Display
  • Main Page
    The main page of the web service is where the Iris flower classification takes place. It begins with the Iris flower story, followed by tabs that provide users with two options: a custom prediction based on user-inputted characteristics or an existing prediction using an index from the dataset. Streamlit's intuitive interface allows users to switch between these options seamlessly using tabs.
    Web App Display (Main Page)
    Image 4: Web App Main Page Display

Result

  1. Report

    Graph Report
    Image 6: Report of Machine Learning Models from Unseen Data

    In the report above, even though the validation data has a slightly different distribution, the three models produce similar results. This is plausible because the Iris dataset is small and clean, and its classes are largely well separated.

  2. Access Token
    To get an access token, run the following PowerShell command:

    $headers = @{"Content-Type" = "application/json"}
    $body = @{
       "username" = "<your-username>"
       "password" = "<your-password>"
    } | ConvertTo-Json
    $response = Invoke-RestMethod -Uri "http://localhost:5000/auth/login" -Method POST -Headers $headers -Body $body
    $response | ConvertTo-Json | Write-Host
  3. Machine Learning Prediction
    To test the API, call an endpoint from the terminal with the token in the request headers:

    • curl -X POST -H "Auth: <token>" http://localhost:5002/v1/predict/realtime/5.84/3.05/3.75/6
    predict realtime
    Image 7: ML API Result in Terminal (v1/predict/realtime route)
    • curl -X POST -H "Auth: <token>" http://localhost:5002/v1/predict/by_index/5
    predict by index
    Image 8: ML API Result in Terminal (v1/predict/by_index route)
  4. Web App Use
    The Web App runs on http://localhost:8501. After a successful login, the user is taken to the main page. Example displays and prediction results:

    • Prediction result of user custom input on web
    Web App Predicting User Input
    Image 9: Web App Tab Custom and Prediction Result
    • Prediction result of existing data
    Web App Predicting Existing Data
    Image 10: Web App Tab Existing (by index) and Prediction Result
