End-to-end ML pipeline for predicting wine quality, with automated ingestion → validation → transformation → training → evaluation, plus a simple Flask UI for inference.
- Orchestration: `main.py` runs all stages sequentially.
- Serving: `app.py` exposes `/train` (runs `main.py`) and `/predict` (UI form → prediction).
- Model: `ElasticNet` regression with tunable `alpha` and `l1_ratio` from `params.yaml`.
- Tracking: MLflow logging and optional model registry via `src.datascience.components.model_evaluation.ModelEvaluation.log_into_mlflow`.
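The model stage described above can be sketched as follows. This is a minimal, self-contained illustration: the synthetic data, the literal hyperparameter values, and the in-line `params` dict are assumptions standing in for the real `params.yaml` and the transformed training split.

```python
# Sketch of the ElasticNet training step. The alpha/l1_ratio names mirror
# params.yaml; the data here is synthetic (the real pipeline reads
# artifacts/data_transformation/train.csv).
import numpy as np
from sklearn.linear_model import ElasticNet

params = {"alpha": 0.2, "l1_ratio": 0.1}  # illustrative values, not the project's

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 11))                    # 11 features, like the wine dataset
y = X @ rng.normal(size=11) + rng.normal(scale=0.1, size=100)

model = ElasticNet(alpha=params["alpha"], l1_ratio=params["l1_ratio"], random_state=42)
model.fit(X, y)
```

In the real trainer the fitted estimator is then serialized to `artifacts/model_trainer/model.joblib`.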
```
.
├─ main.py
├─ app.py
├─ config/
│  └─ config.yaml
├─ params.yaml
├─ schema.yaml
├─ src/datascience/
│  ├─ constants/__init__.py
│  ├─ config/configuration.py
│  ├─ entity/config_entity.py
│  ├─ utils/common.py
│  ├─ components/
│  │  ├─ data_ingestion.py
│  │  ├─ data_validation.py
│  │  ├─ data_transformation.py
│  │  ├─ model_trainer.py
│  │  └─ model_evaluation.py
│  └─ pipeline/
│     ├─ data_ingestion_pipeline.py
│     ├─ data_validation_pipeline.py
│     ├─ data_transformation_pipeline.py
│     ├─ model_trainer_pipeline.py
│     ├─ model_evaluation_pipeline.py
│     └─ prediction_pipeline.py
├─ templates/
│  ├─ index.html
│  └─ results.html
└─ artifacts/   (generated)
```
- Artifacts and paths: `config/config.yaml`
- Schema (columns, `TARGET_COLUMN`): `schema.yaml`
- Hyperparameters: `params.yaml`
- Constants: `src.datascience.constants`
- Config loader: `src.datascience.config.configuration.ConfigurationManager`
- Ingestion: `src.datascience.pipeline.data_ingestion_pipeline.DataIngestionTrainingPipeline`
- Validation: `src.datascience.pipeline.data_validation_pipeline.DataValidationTrainingPipeline`
- Transformation: `src.datascience.pipeline.data_transformation_pipeline.DataTransformationTrainingPipeline`
- Training: `src.datascience.pipeline.model_trainer_pipeline.ModelTrainerTrainingPipeline`
- Evaluation/MLflow: `src.datascience.pipeline.model_evaluation_pipeline.ModelEvaluationTrainingPipeline`
- Prediction (serving): `src.datascience.pipeline.prediction_pipeline.PredictionPipeline`
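`main.py` runs the stage pipelines above in order. The pattern can be sketched with stand-in classes (the real classes are the `*TrainingPipeline` modules listed above; the stand-in `initiate` method and logging setup are assumptions):

```python
# Sketch of the sequential orchestration pattern in main.py.
# StagePipeline is a stand-in for the real classes under src/datascience/pipeline.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class StagePipeline:
    """Stand-in for e.g. DataIngestionTrainingPipeline."""
    def __init__(self, name: str):
        self.name = name

    def initiate(self) -> None:
        logger.info("running stage: %s", self.name)

stages = [
    StagePipeline("Data Ingestion"),
    StagePipeline("Data Validation"),
    StagePipeline("Data Transformation"),
    StagePipeline("Model Trainer"),
    StagePipeline("Model Evaluation"),
]

completed = []
for stage in stages:
    try:
        stage.initiate()
        completed.append(stage.name)
    except Exception:
        # Fail fast: a broken stage should stop the run, not be skipped.
        logger.exception("stage failed: %s", stage.name)
        raise
```

Failing fast matters here because each stage consumes the artifacts of the previous one.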
```bash
python -m venv env
source env/bin/activate
pip install -r requirements.txt
cp .env.example .env   # optional, for MLflow credentials if needed
python main.py
```

Artifacts are written under `artifacts/`:
- Raw CSV: `artifacts/data_ingestion/winequality-red.csv`
- Splits: `artifacts/data_transformation/train.csv`, `artifacts/data_transformation/test.csv`
- Model: `artifacts/model_trainer/model.joblib`
- Metrics: `artifacts/model_evaluation/metrics.json`
- Logs: `logs/logging.log`
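The model artifact round-trip (trainer saves `model.joblib`, the prediction pipeline loads it back) can be sketched like this. A temp directory stands in for `artifacts/`, and the tiny identity-matrix fit is purely illustrative:

```python
# Sketch: save a fitted model the way model_trainer does, then reload it
# the way prediction_pipeline would. Paths here are temporary stand-ins
# for artifacts/model_trainer/model.joblib.
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import ElasticNet

# Toy fit so the sketch is self-contained.
model = ElasticNet(alpha=0.2, l1_ratio=0.1).fit(np.eye(3), [1.0, 2.0, 3.0])

path = Path(tempfile.mkdtemp()) / "model.joblib"
joblib.dump(model, path)          # model_trainer side

loaded = joblib.load(path)        # prediction_pipeline side
pred = loaded.predict(np.eye(3))
```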
```bash
python app.py   # serves on http://0.0.0.0:8080
```

- `/` renders `templates/index.html` (feature form).
- `/predict` returns a prediction via `templates/results.html`.
- `/train` triggers the full pipeline via `main.py`.
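The route layout in `app.py` can be sketched as below. The handler bodies are stand-ins: the real app renders `templates/index.html` and `templates/results.html` and calls the prediction pipeline, while this sketch returns plain strings (the `"alcohol"` form field and the response texts are assumptions for illustration).

```python
# Sketch of the three Flask routes served by app.py; responses are
# simplified stand-ins for the real template rendering.
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def index():
    return "feature form"            # real app: render_template("index.html")

@app.route("/predict", methods=["POST"])
def predict():
    # Real app: read all feature fields, call PredictionPipeline().predict(...),
    # and render results.html with the prediction.
    alcohol = float(request.form.get("alcohol", 0))
    return f"prediction for alcohol={alcohol}"

@app.route("/train")
def train():
    return "training triggered"      # real app: runs the full pipeline via main.py

# Exercise the routes with Flask's built-in test client.
client = app.test_client()
home = client.get("/").get_data(as_text=True)
```

Using `app.test_client()` lets you check the routes without binding port 8080.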
- Update the schema in `schema.yaml` when adding or removing columns.
- Adjust paths/artifacts in `config/config.yaml` when changing the storage layout.
- Tune hyperparameters in `params.yaml`.
- Add new components under `src/datascience/components`, wire them into `src.datascience.config.configuration.ConfigurationManager`, and add matching pipelines under `src/datascience/pipeline`.
- Keep utility helpers in `src/datascience/utils/common.py` for I/O and serialization.
- Ensure the MLflow tracking URI is set in `src.datascience.config.configuration.ConfigurationManager.get_model_evaluation_config` if remote tracking is used.
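Reading the hyperparameters works along these lines; note the key layout shown (`ElasticNet:` with `alpha`/`l1_ratio`) is a plausible shape for `params.yaml`, not a verbatim copy of the project's file:

```python
# Sketch of loading hyperparameters the way ConfigurationManager might.
# The YAML structure below is an assumed example, inlined so the sketch
# is self-contained instead of reading params.yaml from disk.
import yaml

params_text = """
ElasticNet:
  alpha: 0.2
  l1_ratio: 0.1
"""

params = yaml.safe_load(params_text)
alpha = params["ElasticNet"]["alpha"]
l1_ratio = params["ElasticNet"]["l1_ratio"]
```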
This project is licensed under GPL-3.0; see LICENSE.