Comprehensive Data Engineering Platform for Powder Bed Fusion - Laser Beam/Metal (PBF-LB/M) Additive Manufacturing Research
This project provides a complete data pipeline solution for PBF-LB/M research, enabling advanced data analysis, process optimization, and quality assurance through:
- Multi-Model NoSQL Architecture: PostgreSQL, MongoDB, Redis, Cassandra, Elasticsearch, Neo4j
- Advanced Build File Processing: libSLM/PySLM integration for world-class parsing
- 3D Voxel Visualization: Spatially-resolved process control and quality analysis
- Comprehensive Analytics: Sensitivity analysis, statistical modeling, and ML
- Virtual Environment: Virtual testing and simulation capabilities
- Real-Time Processing: Streaming data ingestion and processing
```mermaid
graph TB
    subgraph "📊 Data Sources"
        ISPM[ISPM Sensors<br/>📡 Real-time]
        CT[CT Scans<br/>🔬 Batch]
        BUILD[Build Files<br/>🏗️ Batch]
        CAD[CAD Models<br/>📐 Batch]
    end

    subgraph "💾 Data Lakes"
        DATA_LAKE_LOCAL[Local Data Lake<br/>📦 Historical Data]
        DATA_LAKE_CLOUD[Cloud Data Lake<br/>☁️ Historical Data]
    end

    subgraph "⚙️ Processing"
        KAFKA[Kafka Streaming]
        SPARK[Spark Processing]
        AIRFLOW[Airflow Orchestration]
    end

    subgraph "🏠 Local Storage"
        POSTGRES[(PostgreSQL<br/>🗄️ Operational)]
        MONGODB[(MongoDB<br/>🍃 Documents)]
        REDIS[(Redis<br/>⚡ Cache)]
        MINIO[(MinIO<br/>📦 Object Storage)]
        CLICKHOUSE[(ClickHouse<br/>📊 Data Warehouse)]
        ELASTICSEARCH[(Elasticsearch<br/>🔍 Search & Analytics)]
    end

    subgraph "☁️ Cloud Storage"
        SNOWFLAKE[(Snowflake<br/>❄️ Analytics)]
        AWS_S3[(AWS S3<br/>☁️ Data Lake)]
        BIGQUERY[(BigQuery<br/>🔍 Research)]
    end

    subgraph "🤖 ML & Research"
        ML_TRAINING[ML Model Training]
        ADVANCED_ANALYTICS[Advanced Analytics]
        RESEARCH[Research Queries]
        OPERATIONS[Daily Operations<br/>📊 Operational Work]
    end

    %% Data Flow
    ISPM --> KAFKA --> SPARK
    CT --> SPARK
    BUILD --> SPARK
    CAD --> SPARK
    DATA_LAKE_LOCAL --> SPARK
    DATA_LAKE_CLOUD --> SPARK
    SPARK --> POSTGRES
    SPARK --> MONGODB
    SPARK --> REDIS
    SPARK --> ELASTICSEARCH
    SPARK --> CLICKHOUSE
    SPARK --> SNOWFLAKE
    SPARK --> MINIO

    %% ML and Analytics Usage
    POSTGRES -->|Real-time Queries| OPERATIONS
    MONGODB -->|Document Queries| OPERATIONS
    REDIS -->|Cache Access| OPERATIONS
    CLICKHOUSE -->|ML Models, Quality Prediction, Parameter Optimization| ML_TRAINING
    SNOWFLAKE -->|ML Models, Quality Prediction, Parameter Optimization| ML_TRAINING
    CLICKHOUSE -->|Sensitivity Analysis, Statistical Analysis, Process Analysis| ADVANCED_ANALYTICS
    SNOWFLAKE -->|Sensitivity Analysis, Statistical Analysis, Process Analysis| ADVANCED_ANALYTICS
    ELASTICSEARCH -->|Full-text Search, Log Analysis, Real-time Search| ADVANCED_ANALYTICS
    AWS_S3 -->|Data Exploration| RESEARCH
    BIGQUERY -->|Ad-hoc Queries| RESEARCH

    %% Styling
    classDef dataSource fill:#e1f5ff,stroke:#01579b,stroke-width:2px,color:#000
    classDef dataLake fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#000
    classDef processing fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000
    classDef localStorage fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#000
    classDef cloudStorage fill:#e0f2f1,stroke:#004d40,stroke-width:2px,color:#000
    classDef mlResearch fill:#fce4ec,stroke:#880e4f,stroke-width:2px,color:#000

    class ISPM,CT,BUILD,CAD dataSource
    class DATA_LAKE_LOCAL,DATA_LAKE_CLOUD dataLake
    class KAFKA,SPARK,AIRFLOW processing
    class POSTGRES,MONGODB,REDIS,MINIO,CLICKHOUSE,ELASTICSEARCH localStorage
    class SNOWFLAKE,AWS_S3,BIGQUERY cloudStorage
    class ML_TRAINING,ADVANCED_ANALYTICS,RESEARCH,OPERATIONS mlResearch
```
- Multi-Source Ingestion: Streaming (Kafka), Batch (ETL), CDC (Change Data Capture)
- Real-Time Processing: Apache Flink for streaming, Apache Spark for batch
- Multi-Model Storage: Optimized database selection based on data characteristics
- Quality Management: Comprehensive validation, monitoring, and remediation
- Workflow Orchestration: Apache Airflow DAGs for complex workflows
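The architecture diagram above routes ISPM sensor data through Kafka into Spark. A minimal sketch of that ingestion path, assuming an `ispm-readings` topic and an illustrative payload schema (both hypothetical, not the platform's actual configuration):

```python
# Minimal sketch of the Kafka -> Spark ingestion path shown in the diagram.
# Topic name, schema, and endpoints are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, DoubleType, LongType, StringType

spark = SparkSession.builder.appName("ispm-ingestion").getOrCreate()

# Hypothetical ISPM sensor payload: one reading per laser exposure.
ispm_schema = StructType([
    StructField("build_id", StringType()),
    StructField("layer", LongType()),
    StructField("timestamp_us", LongType()),
    StructField("melt_pool_intensity", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "ispm-readings")  # assumed topic name
       .load())

readings = (raw.select(from_json(col("value").cast("string"), ispm_schema).alias("r"))
               .select("r.*"))

# Console sink for demonstration; production sinks follow the diagram.
query = readings.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```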
- libSLM Integration: C++ library for parsing .mtt, .sli, .cli, .rea, .slm files
- PySLM Integration: Python library for advanced analysis and visualization
- 10 Specialized Extractors: Power, velocity, path, energy, layer, timestamp, focus, jump, style, geometry
- Per-Geometry Parameters: Laser parameters defined for individual scan paths
- CT-Build Correlation: Temporal correlation for defect analysis
- 🚀 Build File Editor: Revolutionary tool for modifying build files and generating artificial artifacts
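For illustration, a sketch of what one of the specialized extractors (power) might look like over parsed scan vectors. The `ScanVector` data model here is a hypothetical stand-in for the structures the libSLM/PySLM-based parser actually produces:

```python
from dataclasses import dataclass

@dataclass
class ScanVector:
    """Hypothetical stand-in for a parsed scan vector with per-geometry parameters."""
    layer: int
    x0: float
    y0: float
    x1: float
    y1: float
    laser_power_w: float
    scan_speed_mm_s: float

def extract_power_profile(vectors: list[ScanVector]) -> dict[int, float]:
    """Mean laser power per layer, a typical power-extractor output."""
    per_layer: dict[int, list[float]] = {}
    for v in vectors:
        per_layer.setdefault(v.layer, []).append(v.laser_power_w)
    return {layer: sum(p) / len(p) for layer, p in per_layer.items()}
```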
- 3D Voxel Grid: Spatially-resolved representation of PBF-LB/M components
- Multi-Modal Fusion: Integration of CAD, process, ISPM, and CT data
- Interactive 3D Rendering: Real-time visualization and navigation
- Defect Detection: AI-powered 3D defect detection and classification
- Porosity Analysis: Comprehensive porosity characterization
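A minimal sketch of mapping per-point process data onto a 3D voxel grid with NumPy; the grid extent and resolution are illustrative assumptions, not the platform's defaults:

```python
import numpy as np

def voxelize(points_mm: np.ndarray, values: np.ndarray,
             extent_mm: float = 100.0, voxel_mm: float = 0.5) -> np.ndarray:
    """Average a per-point quantity (e.g., energy density) into a voxel grid.

    points_mm: (N, 3) coordinates in mm within [0, extent_mm); values: (N,).
    """
    n = int(extent_mm / voxel_mm)
    sums = np.zeros((n, n, n))
    counts = np.zeros((n, n, n))
    idx = np.clip((points_mm / voxel_mm).astype(int), 0, n - 1)
    np.add.at(sums, (idx[:, 0], idx[:, 1], idx[:, 2]), values)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    # Empty voxels stay zero instead of dividing by zero.
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
```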
- Sensitivity Analysis: Sobol indices, Morris screening, design of experiments
- Statistical Analysis: Multivariate, time series, regression, nonparametric methods
- Process Analysis: Parameter optimization, quality prediction, sensor analysis
- ML Integration: Random forest, neural networks, Bayesian analysis
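As a concrete example of the Sobol workflow, a hedged sketch using SALib; the parameter names, bounds, and toy energy-density response are illustrative, not results from the platform:

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Illustrative PBF-LB/M parameter space (names and bounds are assumptions).
problem = {
    "num_vars": 3,
    "names": ["laser_power_w", "scan_speed_mm_s", "hatch_spacing_mm"],
    "bounds": [[150, 400], [400, 1200], [0.08, 0.14]],
}

X = saltelli.sample(problem, 1024)
# Toy response: volumetric energy density E = P / (v * h * t), t = 0.03 mm.
Y = X[:, 0] / (X[:, 1] * X[:, 2] * 0.03)

Si = sobol.analyze(problem, Y)
print(dict(zip(problem["names"], Si["S1"])))  # first-order sensitivity indices
```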
- VM Management: Virtual machine orchestration and provisioning
- Simulation Engines: Thermal, fluid, mechanical, multi-physics simulation
- Digital Twin: Real-time synchronization and prediction
- Testing Frameworks: Experimental design, automated testing, validation
- Cloud Integration: AWS, Azure, GCP with distributed computing
- Apache Spark: Distributed data processing and ETL
- Apache Kafka: Real-time data streaming
- Apache Airflow: Workflow orchestration
- DBT: Data transformation and modeling
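A minimal Airflow DAG sketch in the TaskFlow style, showing how a parse-then-load workflow could be orchestrated; the DAG id, task bodies, and schedule are placeholders, not the project's actual DAGs:

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def pbf_build_ingestion():
    @task
    def parse_build_files() -> list[str]:
        # Would invoke the libSLM/PySLM-based parser here.
        return ["build_001.json"]

    @task
    def load_to_warehouse(files: list[str]) -> None:
        # Would write parsed records to ClickHouse/PostgreSQL here.
        print(f"loading {files}")

    load_to_warehouse(parse_build_files())

pbf_build_ingestion()
```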
For detailed information on all data models, schemas, and relationships, see the Data Models Reference.
- PostgreSQL: Primary operational database for daily operational work and real-time queries
- MongoDB: Document storage for daily operational work, unstructured data, metadata
- Redis: High-performance caching layer for daily operations, session management
- MinIO: Local object storage (S3-compatible), raw data backup, development datasets
- ClickHouse: Columnar data warehouse for analytics, time-series data, and ML training
- Elasticsearch: Search and analytics engine for full-text search, log analysis, and real-time search capabilities
- Snowflake: Large-scale analytics, data warehousing, ML training, business intelligence
- AWS S3: Scalable data lake, long-term storage, data archiving
- BigQuery: Ad-hoc queries, data exploration, research analytics
- MongoDB Atlas: Managed document storage, global distribution
- Training Data: Stored in both local (fast access) and cloud (scalability)
- Research Data: Cloud storage for collaboration and sharing
- Analytics: Data warehouse for complex queries and business intelligence
- Data Lake: Raw data storage for exploration and experimentation
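To illustrate the dual local/cloud placement of training data, a sketch that writes the same object to MinIO (S3-compatible, fast local access) and AWS S3 (scalable archive) via boto3; the endpoint, credentials, and bucket names are placeholders:

```python
import boto3

def put_training_object(key: str, body: bytes) -> None:
    # MinIO speaks the S3 API, so the same client type works with a local endpoint.
    minio = boto3.client(
        "s3",
        endpoint_url="http://localhost:9000",
        aws_access_key_id="minioadmin",
        aws_secret_access_key="minioadmin",
    )
    aws = boto3.client("s3")  # standard AWS credential chain
    for client, bucket in ((minio, "training-data"), (aws, "pbf-lbm-archive")):
        client.put_object(Bucket=bucket, Key=key, Body=body)
```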
The data stored in ClickHouse and Snowflake is used for the following ML and analytics activities:
🤖 ML Model Training (using ClickHouse & Snowflake):
- ML Models: Random Forest, Neural Networks, Bayesian Analysis
- Quality Prediction: Defect detection and quality forecasting models
- Parameter Optimization: ML-driven process parameter tuning
📊 Advanced Analytics (using ClickHouse & Snowflake):
- Sensitivity Analysis: Sobol indices, Morris screening, design of experiments
- Statistical Analysis: Multivariate analysis, time series, regression, nonparametric methods
- Process Analysis: Sensor data analysis, process optimization, root cause analysis
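A hedged sketch of the ClickHouse-to-model path using clickhouse-connect and scikit-learn; the table, columns, and connection details are illustrative assumptions:

```python
import clickhouse_connect
from sklearn.ensemble import RandomForestClassifier

client = clickhouse_connect.get_client(host="localhost")

# Hypothetical training table of voxel-level process features and defect labels.
df = client.query_df(
    "SELECT laser_power, scan_speed, energy_density, has_defect "
    "FROM process_voxels LIMIT 100000"
)

model = RandomForestClassifier(n_estimators=200)
model.fit(df[["laser_power", "scan_speed", "energy_density"]], df["has_defect"])
```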
- Docker: Containerization
- Kubernetes: Container orchestration
- Terraform: Infrastructure as code
- Prometheus: Monitoring and alerting
- Grafana: Visualization and dashboards
- Python 3.9+: Primary programming language
- PySpark: Spark Python API
- FastAPI: API development
- Pydantic: Data validation and serialization
- Pytest: Testing framework
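A minimal FastAPI + Pydantic sketch in the style of this stack; the route and model are illustrative, not the project's actual API surface:

```python
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="PBF-LB/M Data API")

class ScanParameters(BaseModel):
    laser_power_w: float = Field(gt=0, le=1000)  # bounds are assumptions
    scan_speed_mm_s: float = Field(gt=0)

@app.post("/builds/{build_id}/parameters")
def set_parameters(build_id: str, params: ScanParameters) -> dict:
    # Would persist to PostgreSQL and invalidate the Redis cache here.
    return {"build_id": build_id, "accepted": params.model_dump()}
```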
```
pbf-lbm-nosql-data-warehouse/
├── src/                 # Source code
│   ├── data_pipeline/   # Main data pipeline
│   ├── core/            # Core domain entities
│   └── ml/              # Machine learning models
├── config/              # Configuration files
├── docs/                # Documentation
├── docker/              # Docker configurations
├── requirements/        # Python dependencies
└── roadmaps/            # Project roadmap
```

Detailed project structure available in docs/project-structure.md.
This platform implements a comprehensive data flow architecture: data from multiple sources (real-time sensors, batch files, and historical data lakes) flows through Apache Spark for transformation and is then distributed to storage systems optimized for each use case.
Key Points:
- Real-time streaming data (ISPM sensors) flows through Kafka to Spark for processing
- Batch data (CT scans, build files, CAD models) and historical data lakes are processed directly by Spark
- Spark performs transformations and distributes data to multiple storage systems simultaneously (see the sketch after this list)
- Storage selection is optimized based on data characteristics and usage patterns (operational vs. analytics vs. ML)
- Daily Operations → PostgreSQL, MongoDB, Redis for operational work, real-time queries, and caching
- ML Training → ClickHouse & Snowflake for model training and analytics workloads
- Analytics → ClickHouse & Snowflake for advanced analytics and business intelligence
- Search & Analytics → Elasticsearch for full-text search, log analysis, and real-time search capabilities
- Batch Data → Cloud Storage (Snowflake, AWS S3) for analytics and research
- Data Warehouse → ClickHouse (Local) for columnar analytics and time-series data
- Data Lake Input → Historical data from the local and cloud data lakes is ingested through Spark for batch processing, ML training, and analytics
- Research Data → Cloud Storage for collaboration and sharing
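The fan-out described above (one Spark stream, several sinks) is commonly implemented with `foreachBatch`; a sketch under assumed JDBC URLs and table names:

```python
def write_to_sinks(batch_df, batch_id: int) -> None:
    # Operational store (PostgreSQL) for real-time queries.
    (batch_df.write.format("jdbc")
        .option("url", "jdbc:postgresql://localhost:5432/pbf")
        .option("dbtable", "ispm_readings")
        .mode("append").save())
    # Columnar warehouse (ClickHouse) for analytics and ML training.
    (batch_df.write.format("jdbc")
        .option("url", "jdbc:clickhouse://localhost:8123/pbf")
        .option("dbtable", "ispm_readings")
        .mode("append").save())

# Attached to a streaming DataFrame of parsed readings:
# readings.writeStream.foreachBatch(write_to_sinks).start()
```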
- Python 3.9+
- Apache Spark 3.4+
- Apache Airflow 3.1+
- PostgreSQL 13+
- Docker and Docker Compose
- Clone the repository:

```bash
git clone <repository-url>
cd pbf-lbm-nosql-data-warehouse
```

- Install core dependencies:

```bash
pip install -r requirements/requirements_core.txt
pip install -r requirements/requirements_airflow.txt
pip install -r requirements/requirements_ml.txt
```

- Set up external libraries:

```bash
# Install libSLM (C++ library with Python bindings)
cd src/data_pipeline/external/libSLM
mkdir build && cd build
cmake ..
make -j4
make install

# Install PySLM (Python library)
cd src/data_pipeline/external/pyslm
pip install -e .
```

- Start the system:

```bash
docker-compose -f docker/docker-compose.dev.yml up -d
python scripts/init_database.py
python scripts/start_pipeline.py
```

```mermaid
sequenceDiagram
    participant ISPM as ISPM Sensors
    participant Kafka as Kafka Stream
    participant Parser as Build Parser
    participant Voxel as Voxel Processor
    participant Analytics as Analytics Engine
    participant Storage as Multi-Model Storage

    ISPM->>Kafka: Real-time Data
    Kafka->>Parser: Stream Processing
    Parser->>Voxel: Process Parameters
    Voxel->>Analytics: Voxel Data
    Analytics->>Storage: Analysis Results
    Storage->>Voxel: Historical Data
    Voxel->>Analytics: Enhanced Analysis
```
```mermaid
flowchart TB
    Start([Start: Load .slm Build File]) --> Parse[Build File Parser<br/>libSLM/PySLM]
    Parse --> Extract[Extract Scan Points<br/>Coordinates, Parameters, Layers]
    Extract --> Convert[Convert to JSON<br/>Structured Data Format]
    Convert --> Edit{Editing Mode}

    Edit -->|1. Precision Defect Introduction| Defect[Precision Defect Generator]
    Edit -->|2. Process Parameter Manipulation| Param[Parameter Editor]
    Edit -->|3. Controlled Quality Variation| Quality[Quality Variation Engine]

    Defect --> DefectSpec[Specify Spatial Coordinates<br/>x, y, z, radius]
    DefectSpec --> DefectType{Defect Type}
    DefectType -->|Porosity| Porosity[Generate Porosity<br/>Power Reduction<br/>Velocity Increase<br/>Exposure Reduction]
    DefectType -->|Crack| Crack[Generate Crack<br/>Orientation, Length<br/>Power Modulation]
    DefectType -->|Dimensional Deviation| DimDev[Generate Deviation<br/>Geometry Modification<br/>Layer Thickness Change]

    Param --> ParamSelect[Select Scan Points<br/>Individual or Region]
    ParamSelect --> ParamMod[Modify Parameters]
    ParamMod --> Power[Laser Power<br/>Granular Control]
    ParamMod --> Speed[Scan Speed<br/>Point-by-Point]
    ParamMod --> Exposure[Exposure Parameters<br/>Time, Energy Density]

    Quality --> QualityType{Quality Variation Type}
    QualityType -->|Systematic Porosity| SysPorosity[Controlled Porosity Distribution<br/>Size, Density, Location]
    QualityType -->|Systematic Cracks| SysCrack[Controlled Crack Patterns<br/>Network, Orientation]
    QualityType -->|Dimensional Deviations| SysDim[Controlled Dimensional Changes<br/>Tolerance Variations]

    Porosity --> Validate
    Crack --> Validate
    DimDev --> Validate
    Power --> Validate
    Speed --> Validate
    Exposure --> Validate
    SysPorosity --> Validate
    SysCrack --> Validate
    SysDim --> Validate

    Validate[Quality Validator<br/>Check Parameter Ranges<br/>Machine Constraints<br/>Manufacturability]
    Validate -->|Invalid| Refine[Refine Modifications]
    Refine --> Edit
    Validate -->|Valid| Merge[Merge All Modifications<br/>Apply to Scan Points]
    Merge --> JSONUpdate[Update JSON Structure<br/>Modified Parameters<br/>New Artifacts]
    JSONUpdate --> Generate[Build File Generator<br/>Convert JSON → .slm]
    Generate --> Output([Output: Modified .slm File])

    style Start fill:#e1f5ff,stroke:#01579b,stroke-width:2px
    style Output fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    style Defect fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style Param fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    style Quality fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    style Validate fill:#fff9c4,stroke:#f57f17,stroke-width:2px
```
1. ⚡ Precision Defect Introduction
- Advanced build file editing capabilities enabling controlled defect generation at specific spatial coordinates
- Workflow: `.slm` → JSON → Edit → `.slm`
- Features:
- Coordinate-based defect placement (x, y, z, radius)
- Multiple defect types (porosity, cracks, dimensional deviations)
- Parameter-controlled defect characteristics
2. ⚡ Process Parameter Manipulation
- Granular modification of laser power, scan speed, and exposure parameters at individual scan points
- Features:
- Point-by-point laser power control
- Individual scan speed adjustment
- Per-point exposure parameter modification
3. ⚡ Controlled Quality Variation
- Systematic introduction of porosity, cracks, and dimensional deviations for research and validation purposes
- Features:
- Systematic porosity introduction
- Controlled crack pattern generation
- Dimensional deviation control
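A sketch of the `.slm` → JSON → edit → `.slm` workflow for the porosity case: reduce laser power inside a sphere so the region under-melts. The JSON field names are hypothetical; the real schema is whatever the platform's parser emits:

```python
import json
import math

def introduce_porosity(build_json: str, cx: float, cy: float, cz: float,
                       radius_mm: float, power_factor: float = 0.6) -> str:
    """Scale down laser power for all scan points within a sphere."""
    build = json.loads(build_json)
    for point in build["scan_points"]:  # hypothetical field name
        d = math.dist((point["x"], point["y"], point["z"]), (cx, cy, cz))
        if d <= radius_mm:
            point["laser_power_w"] *= power_factor  # under-melting seeds porosity
    return json.dumps(build)
```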
- Artificial Artifact Generation: Create controlled defects and features at any location
- Process Parameter Optimization: Modify parameters for specific regions or entire builds
- Research Specimen Generation: Create standardized test specimens for material research
- Faster Iteration: Virtual parameter testing without expensive physical builds
- Lower Cost: Reduced material waste and machine time
- Parameter sensitivity analysis and optimization
- Quality prediction modeling
- Defect root cause analysis
- Real-time quality monitoring
- Automated defect detection
- Porosity analysis and characterization
- Controlled parameter experiments
- Multi-physics simulation
- Digital twin validation
- World-Class Build File Processing: Leverages libSLM/PySLM for maximum reliability
- 🚀 Revolutionary Build File Editor: Modify build files and generate artificial artifacts for research
- Spatial Resolution: Voxel-level analysis and process control
- Multi-Modal Integration: Unified representation of diverse data sources
- Advanced Analytics: Sophisticated sensitivity analysis and ML capabilities
- Virtual Testing: Controlled experiments without physical resources
- Real-Time Processing: Low-latency data processing and analysis
- Scalable Architecture: Horizontal scaling for growing data volumes
- Research-Ready: Built specifically for additive manufacturing research
We welcome contributions from the research community! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
Comprehensive documentation is available in the docs/ directory:
- System Architecture: Complete system architecture and design principles
- Data Models Reference: Complete reference for all data models, schemas, and relationships across SQL, NoSQL, and data warehouse systems
- Build File Parser: Advanced build file processing with libSLM/PySLM
- 🚀 Build File Editor: Revolutionary tool for modifying build files and generating artificial artifacts
- Sensitivity Analysis: Comprehensive analytics and statistical analysis
- Voxel Visualization: 3D voxel-based visualization and analysis
- Virtual Environment: Virtual testing and simulation capabilities
See our Project Roadmap for planned features and development phases.
- Phase 1: PBF Data Pipeline Optimization
- Phase 2: NoSQL Database Integration
- Phase 3: ML/AI Integration
All Rights Reserved - This project and its contents are proprietary.
Permission Required: You must obtain explicit written permission from the author before using, modifying, or distributing this software or any portion of it. Unauthorized use is prohibited.
For licensing inquiries, please contact the project maintainer through the contact information provided below.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built for PBF-LB/M Research Excellence 🚀