An intelligent traffic signal control system using Deep Reinforcement Learning to optimize traffic flow and reduce vehicle waiting times. This project implements multiple RL approaches from scratch, including Q-Learning, Deep Q-Network (DQN), and CNN-based DQN with visual state representation.
- Built from scratch: All algorithms implemented from the ground up without relying on RL frameworks
- Custom SUMO network: Traffic network designed and created manually using SUMO netedit
- Multiple RL approaches: Comparison between Fixed Timing, Q-Learning, and Deep Q-Learning
- Visual state representation: CNN-based DQN using simulation screenshots as state input
- Experience Replay & Target Network: Advanced DQN techniques for stable training
- Comprehensive evaluation: Metrics include cumulative reward, average delay, and queue length
| Algorithm | State Representation | Description |
|---|---|---|
| Fixed Timing (Baseline) | - | Traditional fixed-cycle traffic control |
| Q-Learning | Discrete (queue lengths + phase) | Tabular RL with epsilon-greedy exploration |
| Deep Q-Network (DQN) | Visual (128x128x4 frames) | CNN-based function approximation |
| DQN with Target Network | Visual (128x128x4 frames) | Improved stability with target network |
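For reference, a minimal sketch of the tabular Q-Learning update behind the second approach (the learning rate and dict-backed Q-table here are illustrative, not the project's exact settings):

```python
# Tabular Q-Learning update over discrete (queue lengths + phase) states.
ALPHA, GAMMA = 0.1, 0.99   # illustrative learning rate and discount factor
Q = {}                     # Q-table keyed by (state, action)

def q_update(state, action, reward, next_state, num_actions=2):
    """One Q-Learning step: move Q(s, a) toward the bootstrapped target."""
    q_sa = Q.get((state, action), 0.0)
    best_next = max(Q.get((next_state, a), 0.0) for a in range(num_actions))
    Q[(state, action)] = q_sa + ALPHA * (reward + GAMMA * best_next - q_sa)
```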
traffic-rl-control/
├── src/
│   ├── agents/
│   │   ├── fixed_timing.py      # Fixed timing baseline
│   │   ├── q_learning.py        # Tabular Q-Learning agent
│   │   └── dqn.py               # Deep Q-Network agent (CNN-based)
│   ├── environment/             # SUMO environment wrapper
│   └── utils/
│       └── visualization.py     # Plotting utilities
├── config/
│   └── sumo/
│       ├── network.net.xml      # Traffic network definition
│       ├── routes.rou.xml       # Vehicle routes
│       ├── detectors.add.xml    # Lane area detectors (E2)
│       └── simulation.sumocfg   # SUMO configuration
├── scripts/
│   ├── run_experiments.py       # Run all experiments
│   └── evaluate_model.py        # Evaluate trained models
├── docs/
│   ├── ARCHITECTURE.md          # System architecture
│   └── GETTING_STARTED.md       # Setup guide
├── model/                       # Saved DQN model checkpoints (200 epochs)
├── result/                      # Training logs and metrics
├── saved_plots/                 # Generated visualizations
└── README.md
Input: 128x128x4 (4 stacked grayscale frames)
        ↓
Conv2D(32, 8x8, stride=4) + ReLU
        ↓
Conv2D(64, 4x4, stride=2) + ReLU
        ↓
Flatten
        ↓
Dense(512) + ReLU
        ↓
Dense(512) + ReLU
        ↓
Dense(2) - Q-values for [Keep Phase, Switch Phase]
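A minimal Keras sketch of this network (layer shapes follow the diagram above; the optimizer and learning rate are taken from the hyperparameter table below):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_q_network(input_shape=(128, 128, 4), num_actions=2):
    """Q-network matching the architecture diagram above."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=8, strides=4, activation="relu"),
        layers.Conv2D(64, kernel_size=4, strides=2, activation="relu"),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(512, activation="relu"),
        layers.Dense(num_actions, activation="linear"),  # Q-values for [Keep, Switch]
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5), loss="mse")
    return model
```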
Visual State (DQN):
- Screenshot of SUMO GUI captured at each step
- Preprocessed: RGB → Grayscale → Resize to 128x128
- Frame stacking: 4 consecutive frames for temporal information
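A minimal sketch of this preprocessing pipeline (function names, normalization, and the episode-start padding are illustrative assumptions):

```python
from collections import deque

import numpy as np
from PIL import Image

FRAME_STACK = 4
frames = deque(maxlen=FRAME_STACK)

def preprocess(screenshot_path):
    """RGB screenshot -> grayscale 128x128 array in [0, 1]."""
    img = Image.open(screenshot_path).convert("L").resize((128, 128))
    return np.asarray(img, dtype=np.float32) / 255.0

def stacked_state(screenshot_path):
    """Append the newest frame and return a 128x128x4 state tensor."""
    frame = preprocess(screenshot_path)
    while len(frames) < FRAME_STACK:
        frames.append(frame)          # pad with the first frame at episode start
    frames.append(frame)
    return np.stack(frames, axis=-1)  # shape (128, 128, 4)
```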
Discrete State (Q-Learning):
- Queue lengths from 6 lane area detectors
- Current traffic light phase index
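A sketch of how this discrete state could be read through TraCI (detector IDs, traffic light ID, and binning thresholds are hypothetical placeholders):

```python
import traci

DETECTOR_IDS = [f"e2_{i}" for i in range(6)]   # hypothetical E2 detector IDs
TLS_ID = "TL0"                                  # hypothetical traffic light ID

def discrete_state(bins=(0, 3, 6, 10)):
    """Discretized queue length per detector plus the current phase index."""
    queues = [traci.lanearea.getLastStepVehicleNumber(d) for d in DETECTOR_IDS]
    discretized = tuple(sum(q > b for b in bins) for q in queues)
    phase = traci.trafficlight.getPhase(TLS_ID)
    return discretized + (phase,)
```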
reward = previous_avg_delay - current_avg_delay

The agent is rewarded for reducing the average cumulative waiting time of vehicles.
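A sketch of how this reward could be computed via TraCI, assuming accumulated waiting time is used as the delay measure:

```python
import traci

prev_avg_delay = 0.0

def compute_reward():
    """Reward = drop in average accumulated waiting time since the last step."""
    global prev_avg_delay
    vehicle_ids = traci.vehicle.getIDList()
    if vehicle_ids:
        avg_delay = sum(traci.vehicle.getAccumulatedWaitingTime(v)
                        for v in vehicle_ids) / len(vehicle_ids)
    else:
        avg_delay = 0.0
    reward = prev_avg_delay - avg_delay
    prev_avg_delay = avg_delay
    return reward
```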
| Parameter | Value | Description |
|---|---|---|
| Learning Rate (α) | 0.00001 | Adam optimizer learning rate |
| Discount Factor (γ) | 0.99 | Future reward discount |
| Exploration Rate (ε) | 0.1 → 0.01 | Epsilon decay for exploration |
| Replay Buffer Size | 2000 | Experience replay memory |
| Batch Size | 32 | Mini-batch for training |
| Target Update Freq | 10 | Steps between target network updates |
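A sketch of epsilon-greedy action selection with decay from 0.1 to 0.01 (the multiplicative decay factor is an assumption; the table only gives the start and end values):

```python
import random

import numpy as np

EPSILON_START, EPSILON_MIN, EPSILON_DECAY = 0.1, 0.01, 0.995
epsilon = EPSILON_START

def select_action(model, state):
    """Epsilon-greedy choice between the two signal actions."""
    global epsilon
    if random.random() < epsilon:
        action = random.randrange(2)                       # explore
    else:
        q_values = model.predict(state[np.newaxis], verbose=0)
        action = int(np.argmax(q_values[0]))               # exploit
    epsilon = max(EPSILON_MIN, epsilon * EPSILON_DECAY)    # decay toward 0.01
    return action
```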
- Python 3.8+
- SUMO (Simulation of Urban MObility) - Installation Guide
- Required Python packages:
pip install tensorflow numpy matplotlib pandas pillow traci

# Set SUMO_HOME environment variable
# Windows
set SUMO_HOME=C:\Program Files (x86)\Eclipse\Sumo
# Linux/Mac
export SUMO_HOME=/usr/share/sumo

# Run Fixed Timing Baseline
python src/agents/fixed_timing.py
# Run Q-Learning Agent
python src/agents/q_learning.py
# Run Deep Q-Network Agent
python src/agents/dqn.py

| Method | Avg. Delay Reduction | Queue Length | Training Time |
|---|---|---|---|
| Fixed Timing | Baseline | ~290 vehicles | - |
| Q-Learning | ~15% | ~250 vehicles | ~30 min |
| DQN | ~25% | ~220 vehicles | ~2 hours |
The training process shows:
- Cumulative Reward: Increasing trend indicating learning progress
- Average Delay: Decreasing trend showing traffic optimization
- Queue Length: Reduction in vehicle accumulation at intersections
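A minimal matplotlib sketch of how these three training curves could be plotted (the function name and output path are illustrative; the project's plotting utilities live in src/utils/visualization.py):

```python
import matplotlib.pyplot as plt

def plot_training_curves(rewards, delays, queue_lengths,
                         out_path="saved_plots/training.png"):
    """Plot the three training metrics described above on one figure."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    for ax, data, title in zip(axes,
                               (rewards, delays, queue_lengths),
                               ("Cumulative Reward", "Average Delay", "Queue Length")):
        ax.plot(data)
        ax.set_title(title)
        ax.set_xlabel("Episode")
    fig.tight_layout()
    fig.savefig(out_path)
```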
- Designed intersection layout using SUMO netedit
- Configured realistic traffic flow patterns
- Added lane area detectors (E2) for queue measurement
- Action 0: Keep current phase (maintain green light direction)
- Action 1: Switch to next phase (change traffic direction)
- Yellow light transition handled automatically (4 seconds)
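A sketch of how these actions might be applied through TraCI (the traffic light ID and phase indices are hypothetical placeholders):

```python
import traci

TLS_ID = "TL0"        # hypothetical traffic light ID
YELLOW_DURATION = 4   # seconds, as described above

def apply_action(action, green_phase, yellow_phase, next_green_phase):
    """Keep the current phase or switch to the next one via a 4 s yellow transition."""
    if action == 0:                        # Action 0: keep current phase
        traci.trafficlight.setPhase(TLS_ID, green_phase)
    else:                                  # Action 1: switch to next phase
        traci.trafficlight.setPhase(TLS_ID, yellow_phase)
        for _ in range(YELLOW_DURATION):
            traci.simulationStep()         # let the yellow phase play out
        traci.trafficlight.setPhase(TLS_ID, next_green_phase)
```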
- Online Training: Learn while interacting with environment
- Experience Replay: Sample from replay buffer for stable learning
- Evaluation Mode: Test trained model without exploration
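A minimal sketch of an experience-replay training step with a target network, using the hyperparameters from the table above (the plain-list buffer and function signature are simplifications):

```python
import random

import numpy as np

REPLAY_SIZE, BATCH_SIZE, GAMMA, TARGET_UPDATE = 2000, 32, 0.99, 10
replay_buffer = []   # list of (state, action, reward, next_state, done) tuples

def train_step(model, target_model, step):
    """Sample a mini-batch from the replay buffer and fit one DQN update."""
    if len(replay_buffer) < BATCH_SIZE:
        return
    batch = random.sample(replay_buffer, BATCH_SIZE)
    states = np.array([b[0] for b in batch])
    next_states = np.array([b[3] for b in batch])
    q_target = model.predict(states, verbose=0)
    q_next = target_model.predict(next_states, verbose=0)
    for i, (_, action, reward, _, done) in enumerate(batch):
        q_target[i, action] = reward if done else reward + GAMMA * np.max(q_next[i])
    model.fit(states, q_target, verbose=0)
    if step % TARGET_UPDATE == 0:
        target_model.set_weights(model.get_weights())  # sync target network
```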
| File | Description |
|---|---|
| `src/agents/fixed_timing.py` | Fixed Timing baseline implementation |
| `src/agents/q_learning.py` | Q-Learning with tabular state representation |
| `src/agents/dqn.py` | Deep Q-Network with CNN architecture |
| `src/utils/visualization.py` | Visualization and result plotting |
| File | Description |
|---|---|
| `traci5.FT.py` | Fixed Timing baseline (original) |
| `traci6.QL.py` | Q-Learning implementation (original) |
| `traci7.DQL.py` | Deep Q-Network implementation (original) |
| `Phuoc_ne.py` | DQN with Target Network |
| `policy_modify_action.py` | Policy-based action modification |
| `baseline.py` | Baseline comparison experiments |
- Reinforcement Learning: Q-Learning, Deep Q-Learning, Experience Replay, Target Networks
- Deep Learning: CNN architecture design, TensorFlow/Keras implementation
- Traffic Simulation: SUMO configuration, TraCI API integration
- Software Engineering: Modular code design, experiment tracking, visualization
- Research: Algorithm comparison, hyperparameter tuning, performance analysis
- Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature.
- SUMO Documentation: https://sumo.dlr.de/docs/
- TraCI Python API: https://sumo.dlr.de/docs/TraCI/Interfacing_TraCI_from_Python.html
[Your Name] - AI/ML Engineer
University of Information Technology (UIT) - Vietnam National University HCMC
Course: Artificial Intelligence (CS106)
Email: [your.email@example.com]
LinkedIn: linkedin.com/in/yourprofile
GitHub: github.com/yourusername
This project is licensed under the MIT License - see the LICENSE file for details.
⭐ If you find this project helpful, please give it a star!