This project is designed to simulate and test the effectiveness of Roboflow AI metrics. The goal is to run comprehensive tests, compare results against the current human-controlled baseline, and analyze performance across a range of scenarios.
The project is organized as follows:
- src/: General source code utilities.
- simulation/: The core "Digital Twin" environment and RL agents.
- knowledge_base/: Static database for G&SR rules, network topology, and train specs.
- constraints/: MILP/OR logic for enforcing safety and validity constraints.
- models/: Directory for storing trained RL model checkpoints.
- config/: Configuration files for simulation parameters and environment settings.
- data/: Directory for storing datasets.
  - raw/: Original, immutable data.
  - processed/: Data that has been cleaned and transformed for the simulation.
- notebooks/: Jupyter notebooks for exploratory data analysis (EDA), prototyping, and visualization of results.
- tests/: Unit and integration tests to ensure the reliability of the simulation code.
- results/: Output directory for simulation logs, generated metrics, and comparison reports.
For a detailed breakdown of why we use Reinforcement Learning (PPO) instead of LLMs (LangChain), and how we handle G&SR Rules, please see ML_STRATEGY.md.
- Python 3.11+
- [List other dependencies here, e.g., NumPy, Pandas, SimPy]
- Clone the repository:
  ```bash
  git clone https://github.com/zer-art/Roboflow
  cd Roboflow
  ```
- Create and Activate Conda Environment: Ensure you have Anaconda or Miniconda installed.
  ```bash
  conda create -n roboflow python=3.11 -y
  conda activate roboflow
  ```
- Install Dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Configuration: Modify the configuration files in `config/` to set up your simulation parameters.
- Run Simulation: Execute the main simulation script (to be implemented in `src/`).
- Analyze Results: Check the `results/` folder for logs and metric reports. Use the notebooks in `notebooks/` for detailed analysis.
We will rigorously test the system against three core scenarios to demonstrate Efficiency, Safety, and Resilience.
- Situation: A slow Goods train is ahead of a fast Vande Bharat express.
- Human Logic (Baseline): Often keeps the Goods train moving until a major junction, causing the Vande Bharat to trail behind and accumulate delay.
- RailFlow AI: Identifies a small loop line 10km ahead, calculates that the Goods train length fits, and executes a "dynamic overtake" (sidelining the Goods train just in time).
- Target Metric: "Vande Bharat Delay Reduced by 15 mins."
- Situation: Two trains are approaching a single-line section from opposite directions.
- Human Logic: Might accidentally let both enter up to the last signal, causing a standoff where one has to reverse (massive delay).
- RailFlow AI: The MILP (Operations Research) layer detects this conflict 30 minutes in advance and holds one train at the previous station.
- Visual Outcome: "Safety Shield" triggers an override, preventing the deadlock.
- Situation: A signal fails at a major station for 20 minutes.
- Human Logic: Manually re-plans, often causing ripples that last 6 hours across the network.
- RailFlow AI: Re-optimizes the entire schedule in seconds, shuffling 10 other trains to different tracks to absorb the shock.
- Target Metric: "Network Recovery Time: 2 hours (AI) vs 6 hours (Historical)."
RailFlow-AI-2.pdf: Reference document for AI capabilities and metrics (located in root).
Please read the contributing guidelines before submitting pull requests.
[License Name]
Verdict: The proposed architecture (Hybrid Intelligence: MILP + RL) is excellent and represents the state-of-the-art for railway scheduling and dispatching systems.
- Why it works:
- Reinforcement Learning (RL): Excellent for pattern recognition and speed. It learns "heuristics" from historical data to make fast decisions (e.g., "fast passenger train usually passes freight here").
- Operations Research (MILP): Essential for safety and hard constraints. RL can sometimes hallucinate or suggest unsafe moves. The MILP layer acts as a "safety shield" ensuring no G&SR rules are violated.
- Constraint Modeling: Absolutely critical. The exact track layout, signal distances, and gradient rules must be modeled physically.
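As a concrete illustration of the "safety shield" idea, the sketch below checks an RL-proposed plan against one toy hard constraint (at most one train per single-line block) with Google OR-Tools CP-SAT and falls back to a safe plan on failure. The function names and the plan representation are assumptions for illustration only.

```python
from ortools.sat.python import cp_model

def plan_is_feasible(block_assignments: dict[str, str]) -> bool:
    """Toy check: no two trains may occupy the same single-line block.

    block_assignments maps train_id -> block_id for the plan implied by the
    RL agent's proposed action (an illustrative schema, not the real one).
    """
    model = cp_model.CpModel()
    trains = sorted(block_assignments)
    blocks = sorted(set(block_assignments.values()))
    occupies = {(t, b): model.NewBoolVar(f"{t}_in_{b}") for t in trains for b in blocks}
    for t, assigned in block_assignments.items():
        for b in blocks:
            model.Add(occupies[t, b] == int(b == assigned))  # fix occupancy per the plan
    for b in blocks:
        model.AddAtMostOne(occupies[t, b] for t in trains)   # hard G&SR-style constraint
    solver = cp_model.CpSolver()
    return solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE)

def safety_shield(proposed_plan: dict[str, str], fallback_plan: dict[str, str]):
    """Accept the RL proposal only if the constraint layer proves it feasible."""
    return proposed_plan if plan_is_feasible(proposed_plan) else fallback_plan
```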
The originally proposed workflow was Data -> Fine-tuning -> Rules Database. Recommended Refined Workflow:
- Data Ingestion (Historical): Load 3-6 months of "Control Charts" (train movements). Clean anomalies.
- Digital Twin Construction (The Simulation):
- This is the most critical step. You must build a simulation strictly based on the G&SR (General & Subsidiary Rules) database.
- The "Static Database" isn't just looked up; it defines the physics of the world.
- Offline RL Training (Imitation Learning):
- Instead of starting from zero, train the RL agent to imitate what the best Section Controllers did in the historical data. This is faster than "fine-tuning".
- Online RL Training (Improvement):
- Let the agent play in the Digital Twin to find better moves than the humans did, optimized for Punctuality and Energy.
- Safety Shield (Deployment):
- Output of RL -> MILP Solver -> Final Command.
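As a starting point for the Digital Twin step above, the skeleton below shows how the environment could expose the standard Gymnasium interface so that PPO can train against it. The class name, observation layout, and reward terms are assumptions for illustration; the real physics and G&SR rules would replace the stubbed step logic.

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class RailDigitalTwinEnv(gym.Env):
    """Illustrative skeleton of the Digital Twin; physics and rules are stubbed out."""

    def __init__(self, num_trains: int = 10, num_actions: int = 5):
        super().__init__()
        # Assumed observation: per-train (position, speed, delay) features.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(num_trains, 3), dtype=np.float32)
        # Assumed action: one discrete dispatching decision (hold, proceed, switch to loop, ...).
        self.action_space = spaces.Discrete(num_actions)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._state = np.zeros(self.observation_space.shape, dtype=np.float32)
        return self._state, {}

    def step(self, action):
        # TODO: advance the simulation under G&SR physics; this stub keeps the state fixed.
        arrivals, delay_minutes = 0, 0.0
        reward = 1.0 * arrivals - 0.1 * delay_minutes  # mirrors the reward shaping in the table below
        terminated, truncated = False, False
        return self._state, reward, terminated, truncated, {}
```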
For a typical Railway RL environment (e.g., using PPO or DQN):
| Parameter | Recommended Value | Purpose |
|---|---|---|
| Algorithm | PPO (Proximal Policy Optimization) | Stable, handles continuous/discrete action spaces well. |
| Learning Rate | 3e-4 (starts high, decays) | Step size for updates. |
| Gamma (Discount) | 0.99 | Prioritizes long-term rewards (avoiding delays hours later). |
| Batch Size | 64 to 256 | Number of experiences to learn from at once. |
| Replay Buffer | 100,000 steps | (For DQN/SAC) Stores history to learn from. |
| Reward Function | +1 per train arrival, -0.1 per min delay | Crucial: needs careful tuning. |
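A hedged sketch of wiring these hyperparameters into Stable-Baselines3's PPO, using the `RailDigitalTwinEnv` skeleton sketched above; the linear schedule stands in for "starts high, decays" and the save path is illustrative.

```python
from stable_baselines3 import PPO

def linear_schedule(initial_lr: float = 3e-4):
    # SB3 accepts a callable of remaining progress (1.0 -> 0.0) as a learning-rate schedule.
    return lambda progress_remaining: initial_lr * progress_remaining

env = RailDigitalTwinEnv()
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=linear_schedule(3e-4),  # starts at 3e-4 and decays to 0
    gamma=0.99,                           # prioritise rewards hours into the future
    batch_size=128,                       # within the recommended 64-256 range
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
model.save("models/ppo_railflow")  # checkpoint directory from the project structure
```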
- simulation/: Source code for the "Digital Twin", RL agents, and constraint validators.
- config/: Configuration files for parameters and settings.
- data/:
  - rules/: Static G&SR rules and network topology.
  - models/: Trained RL model checkpoints.
  - raw/ & processed/: Simulation datasets.
- notebooks/: Experiments and analysis.
- tests/: Unit and integration tests.
- results/: Logs and metric outputs.
Goal: Use Fine-Tuned LLMs to provide human-readable explanations for the decisions made by the RL Core, without involving them in the critical control loop.
We separate Control (High Precision, Low Latency) from Explanation (High Context, Natural Language).
```mermaid
graph TD
subgraph "Perception Layer"
A[Real-Time State: Signals, Track Occupancy] --> B(Graph Neural Network / Feature Extractor)
Static[Static Constraints: G&SR Rules, Gradients] --> B
end
subgraph "Decision Core (Hybrid Architecture)"
B --> C{RL Agent <br/> Policy Network}
C -->|Proposed Action Logits| D[Action Masking Layer]
Static -->|Valid Moves| D
D -->|Safe Action| E[Environment / Digital Twin]
end
subgraph "Safety Shield (The 'Brake')"
E -.->|Verify Critical Moves| F[MILP Solver / Constraint Checker]
F -->|Override if Unsafe| E
end
subgraph "XAI Co-Pilot"
D -.->|Selected Action| G[Context Engine]
A -.->|State Context| G
G -->|Structured Prompt w/ Counterfactuals| H[LLM Explainer]
H -->|Rationale: 'Held train to prevent deadlock'| I[Controller Dashboard]
end
```
- Role: Makes the actual decision.
- Tech: Stable-Baselines3 (PPO) + Google OR-Tools.
- Output: Numerical Action (e.g., `Action_ID: 4` -> "Switch Train 105 to Loop Line").
- Latency: < 50ms.
- Role: Converts the "black box" numerical state into a linguistic context.
- Process:
  - Takes `Action_ID: 4`.
  - Queries the `Knowledge_Base`: "What does Action 4 imply?" -> "Loop Line entry".
  - Queries the `State`: "Why?" -> "Train 105 is slower than Train 202 approaching behind".
- Output JSON:
  ```json
  {
    "action": "Switch to Loop Line",
    "subject": "Train 105 (Goods)",
    "conflicts": ["Train 202 (Vande Bharat) approaching on Main Line"],
    "rule_ref": "G&SR Rule 3.4(b) - Priority Overtaking"
  }
  ```
- Role: Generates the final user-facing text.
- Tech: Fine-Tuned Llama-2-7B or Mistral-7B.
- Input: The JSON from Layer 2.
- Prompt:
"You are a Senior Section Controller. Explain why Train 105 was moved to the Loop Line based on the provided context."
- Output:
"I moved the Goods Train (105) to the loop line to allow the higher-priority Vande Bharat (202) to pass. Keeping 105 on the main line would have caused a 4-minute delay to 202."
We do not use a "General Intelligence" model; we create a Task-Specific Specialist.
- Model: Llama-2-7b-chat (Quantized to 4-bit for speed).
- Dataset Source:
- Rule Logs: Manually crafted pairs of (Situation -> Rule citations).
- Controller Commentaries: Recorded reasons from human controllers (e.g., "Stopped here because signal was red").
- RL Value Function: Explanations of why a state has high value (e.g., "High reward expected because conflict avoided").
- Training Method: QLoRA (to work within limited compute).
- Objective: Maximize factual consistency with the JSON input (reduce hallucinations).
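A minimal QLoRA configuration sketch using `transformers`, `bitsandbytes` 4-bit quantization, and `peft`; the base model, rank, and target modules are common illustrative defaults, not tuned values from this project.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-chat-hf"  # gated model; requires Hub access approval

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights for memory-constrained GPUs
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # adapt attention projections only, to stay light
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # sanity check: only the LoRA adapters are trainable
```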