A robust machine learning system for detecting anomalies in industrial sensor data using advanced feature engineering and Random Forest classification.
- Overview
- Features
- Project Structure
- Requirements
- Installation
- Usage
- Data Pipeline
- Model Architecture
- Performance
- Testing
- Contributing
- License
This project implements a predictive maintenance system that uses machine learning to detect anomalies in industrial sensor data. The system processes temperature, vibration, and pressure readings, engineers relevant features, and trains a Random Forest classifier to identify potential equipment failures before they occur.
It is intended for manufacturing environments, energy plants, and other industrial settings where equipment monitoring is critical.
- Data Simulation: Generate realistic sensor data with built-in anomaly patterns
- Preprocessing Pipeline: Automated data normalization and cleaning
- Feature Engineering: Extract meaningful patterns from raw sensor data
- Anomaly Detection: Machine learning model to predict equipment failures
- Performance Metrics: Built-in accuracy evaluation
- Docker Support: Easy deployment in any environment
- Modular Design: Well-organized codebase for easy extension
predictive-maintenance-mlops/
├── data/ # Data storage directory (created on first run)
├── src/ # Source code
│ ├── __init__.py # Package marker
│ ├── data_ingestion.py # Data generation module
│ ├── preprocessing.py # Data cleaning and normalization
│ ├── feature_engineering.py # Feature creation module
│ └── train_model.py # Model training and evaluation
├── tests/ # Unit tests
│ └── test_preprocessing.py # Tests for preprocessing module
├── .github/ # GitHub configuration
│ └── workflows/ # CI pipeline configurations
├── .gitignore # Git ignore rules
├── .gitattributes # Git attributes configuration
├── Dockerfile # Docker configuration
├── main.py # Main execution script
├── requirements.txt # Dependencies
└── README.md # Project documentation
The project requires the following Python packages:
- numpy
- pandas
- scikit-learn
- matplotlib
- joblib
- pytest
- flask
- kafka-python
All dependencies are listed in requirements.txt.
git clone https://github.com/RelationalDigital/predictive-maintenance-mlops.git
cd predictive-maintenance-mlops
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Build the Docker image
docker build -t predictive-maintenance .
To generate data, process it, and train the model:
python main.py
The script will:
- Generate synthetic sensor data
- Preprocess and normalize the data
- Add engineered features
- Train a Random Forest model
- Save the trained model as model.pkl
- Print the model's accuracy
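Once model.pkl exists, it can be reloaded to score new readings. The following is a minimal sketch, assuming the model was serialized with joblib (it is listed in the requirements, but the actual serialization call is not shown in this README). The stand-in model, the feature order, and the temporary path are all illustrative assumptions.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny stand-in model so the example runs on its own; in the project the
# model would come from the training step run by main.py.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # assumed columns: temperature, vibration, pressure
y = (X[:, 0] > 1.0).astype(int)      # hypothetical anomaly rule, for illustration only
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Persist and reload (assumes joblib serialization; the path is illustrative).
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(model, model_path)
restored = joblib.load(model_path)

# Score a batch of new readings: 0 = normal operation, 1 = anomaly.
new_readings = rng.normal(size=(5, 3))
predictions = restored.predict(new_readings)
print(predictions)
```

New readings must go through the same preprocessing and feature engineering as the training data before they are passed to predict.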
# Run the container
docker run predictive-maintenance
# To mount a volume for data persistence
docker run -v $(pwd)/data:/app/data predictive-maintenance
The data pipeline consists of four main stages:
- Data Ingestion: Generates/collects timestamped sensor readings
  - Temperature (°F)
  - Vibration (mm/s)
  - Pressure (kPa)
  - Anomaly labels
- Preprocessing: Cleans and normalizes the data
  - Parses timestamps
  - Normalizes temperature values
  - Handles missing values
- Feature Engineering: Creates additional features from raw data
  - Rolling temperature averages
  - Vibration standard deviation
  - Additional domain-specific features
- Model Training: Trains and evaluates the anomaly detection model
  - Random Forest Classifier
  - Performance evaluation
  - Model persistence
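The first three stages above can be sketched with pandas and NumPy. This is an illustration under stated assumptions, not the code in src/: the function names, column names, distributions, labelling rule, z-score normalization, fill strategy, and window size are all hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd


def simulate_sensor_data(n_rows: int = 500, seed: int = 0) -> pd.DataFrame:
    """Generate synthetic readings; the distributions are illustrative only."""
    rng = np.random.default_rng(seed)
    timestamps = pd.date_range("2024-01-01", periods=n_rows, freq="min")
    df = pd.DataFrame({
        "timestamp": timestamps.astype(str),
        "temperature": rng.normal(150.0, 5.0, n_rows),  # °F
        "vibration": rng.normal(2.0, 0.3, n_rows),      # mm/s
        "pressure": rng.normal(300.0, 10.0, n_rows),    # kPa
    })
    # Hypothetical labelling rule: flag the hottest ~5% of readings.
    threshold = df["temperature"].quantile(0.95)
    df["anomaly"] = (df["temperature"] > threshold).astype(int)
    return df


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Parse timestamps, fill gaps, and z-score the temperature column."""
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    out = out.sort_values("timestamp").reset_index(drop=True)
    sensor_cols = ["temperature", "vibration", "pressure"]
    # Assumed strategy: carry the last reading forward, then back-fill the start.
    out[sensor_cols] = out[sensor_cols].ffill().bfill()
    temp = out["temperature"]
    out["temperature"] = (temp - temp.mean()) / temp.std()
    return out


def add_features(df: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Add a rolling temperature mean and a rolling vibration std."""
    out = df.copy()
    out["temp_rolling_mean"] = (
        out["temperature"].rolling(window, min_periods=1).mean()
    )
    # The std of a single observation is undefined, so the first row is set to 0.
    out["vibration_rolling_std"] = (
        out["vibration"].rolling(window, min_periods=1).std().fillna(0.0)
    )
    return out


features = add_features(preprocess(simulate_sensor_data(200, seed=1)))
print(features.head())
```

Rolling windows only look backwards here, so each row's features depend on current and past readings rather than future ones.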
The anomaly detection system uses a Random Forest Classifier with the following configuration:
- 100 decision trees
- Features: temperature, vibration, pressure, rolling averages, and standard deviations
- Binary classification: normal operation (0) vs. anomaly (1)
- Train/test split: 80% training, 20% testing
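The configuration above maps onto scikit-learn roughly as follows. This is a sketch rather than the contents of train_model.py: the feature matrix is random stand-in data, and the labelling rule, random_state values, and stratified split are assumptions added so the example is reproducible.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in feature matrix; the project would use the engineered features.
rng = np.random.default_rng(42)
n_samples = 1000
X = np.column_stack([
    rng.normal(0.0, 1.0, n_samples),     # normalized temperature
    rng.normal(2.0, 0.3, n_samples),     # vibration (mm/s)
    rng.normal(300.0, 10.0, n_samples),  # pressure (kPa)
])
# Hypothetical rule for labels: 1 = anomaly, 0 = normal operation.
y = ((X[:, 0] > 1.5) | (X[:, 1] > 2.6)).astype(int)

# 80% training, 20% testing; stratify keeps the anomaly rate similar in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 100 decision trees, as listed in the configuration.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {accuracy:.3f}")
```

For time-ordered sensor data, a chronological split is a common alternative to a random one, since it avoids training on readings that come after the test period.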
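The model is evaluated using accuracy as the primary metric. Typical accuracy on the synthetic dataset exceeds 90%. Because anomalies are usually rare, a high accuracy on synthetic data does not by itself show suitability for real industrial data, so the model should be validated on real sensor readings before deployment.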
Additional metrics like precision, recall, and F1-score can be easily incorporated by modifying the train_model.py script.
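As an illustration of those metrics, the sketch below uses scikit-learn's metric functions on a small hand-made set of labels. The label vectors are invented for the example; in train_model.py they would be the test labels and the model's predictions.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Invented example: 8 normal readings, 2 anomalies.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# The classifier raises one false alarm and misses one anomaly.
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # (7 TN + 1 TP) / 10 = 0.8
precision = precision_score(y_true, y_pred)  # 1 TP / (1 TP + 1 FP) = 0.5
recall = recall_score(y_true, y_pred)        # 1 TP / (1 TP + 1 FN) = 0.5
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two = 0.5

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Here accuracy is 0.8 even though half of the anomalies are missed, which is why precision and recall are worth reporting when the anomaly class is small.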
Run the test suite to verify the system's components:
pytest tests/
The test suite validates:
- Data preprocessing functionality
- Feature engineering correctness
- Model training and evaluation
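A preprocessing test in this style might look like the sketch below. The contents of tests/test_preprocessing.py are not shown in this README, so the function under test is replaced by a stand-in; its name, signature, and behaviour (z-score normalization, forward and backward fill) are assumptions.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the function under test (hypothetical behaviour).

    In the repository this would be imported from src/preprocessing.py.
    """
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    out["temperature"] = out["temperature"].ffill().bfill()
    temp = out["temperature"]
    out["temperature"] = (temp - temp.mean()) / temp.std()
    return out


def test_preprocess_cleans_and_normalizes():
    raw = pd.DataFrame({
        "timestamp": [
            "2024-01-01 00:00:00",
            "2024-01-01 00:01:00",
            "2024-01-01 00:02:00",
        ],
        "temperature": [140.0, None, 160.0],  # one missing reading
    })

    result = preprocess(raw)

    # Timestamps are parsed into a datetime column.
    assert pd.api.types.is_datetime64_any_dtype(result["timestamp"])
    # The missing reading has been filled.
    assert result["temperature"].notna().all()
    # Z-scored values are centred on zero.
    assert abs(result["temperature"].mean()) < 1e-9
    # The input frame is left unmodified.
    assert raw["temperature"].isna().sum() == 1
```

pytest collects any function whose name starts with test_, so a file like this runs with the pytest tests/ command.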
Contributions are welcome! To contribute:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Please ensure your code passes all tests and follows the established coding style.
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ by Raffaele Zarrelli from Relational Digital
⭐ Star this repository if you find it useful!
Keywords: predictive maintenance, anomaly detection, machine learning, IoT, industrial sensors, equipment monitoring