MARLEM: Multi-Agent Reinforcement Learning Simulation Framework for Implicit Cooperation in Decentralized Local Energy Markets
This project implements a decentralized multi-agent reinforcement learning (MARL) framework for local energy markets, focusing on implicit cooperation between agents to maintain grid balance while maximizing economic efficiency.
- Introduction
- Main Features
- Project Structure
- Installation
- Quick Start
- Case Studies
- Code Structure and Design
- Analysis Tools
- Documentation
- Contributing
- License
This framework enables researchers and practitioners to:
- Simulate decentralized local energy markets with multiple agents trading energy through peer-to-peer mechanisms
- Train MARL agents using various algorithms (PPO, SAC, APPO) in different training paradigms (CTCE, CTDE, DTDE)
- Evaluate market mechanisms through comprehensive case studies covering market design, agent heterogeneity, grid constraints, and coordination strategies
- Analyze coordination effectiveness through implicit cooperation models that enable agents to coordinate without explicit communication
- Visualize market outcomes with comprehensive analytics and visualization tools
The framework is designed to support research on decentralized energy markets, helping answer questions about market design, agent behavior, grid stability, and the effectiveness of implicit coordination mechanisms.
This PhD research project focuses on developing an implicit cooperation model for coordinating decentralized agents and their distributed energy resources (DERs) in day-ahead and intraday local energy markets (LEMs). The goal is to achieve optimal balance between energy supply and demand through a decentralized approach, where agents make decisions based on limited local information without centralized control.
Main Research Question: How does implicit cooperation, enabled by MARL, improve the management of DERs to maximize their energy use while ensuring the balance between supply and demand in LEMs?
- Decentralized Market Clearing: Agent-to-agent matching with multiple pricing mechanisms (average, buyer, seller, Nash bargaining, proportional surplus); an illustrative sketch follows this list
- Multi-Agent Reinforcement Learning: Support for CTCE, CTDE, and DTDE training paradigms with multiple RL algorithms
- Implicit Cooperation Model: Privacy-preserving coordination mechanisms that enable agents to coordinate without explicit communication
- Comprehensive Case Studies: Six case studies covering market mechanisms, agent heterogeneity, DSO intervention, grid constraints, battery coordination, and implicit cooperation effectiveness
- Grid Integration: Physical grid modeling with topology constraints, congestion management, and DSO fallback mechanisms
- Scenario Testing: Framework for evaluating system behavior under various market conditions
- Ablation Studies: Tools for analyzing component contributions to system performance
- Visualization Tools: Comprehensive plotting and analytics for market outcomes and agent behaviors
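To make the pricing mechanisms listed above concrete, here is a minimal sketch of common price rules for a single matched bid/offer pair. These are generic textbook formulations given as assumptions, not necessarily the exact formulas used by the Market Module; Nash bargaining and proportional-surplus rules additionally divide the bid-ask surplus according to bargaining weights or surplus shares and are omitted here.

```python
# Generic price rules for one matched buy bid and sell offer (bid >= ask).
# Illustrative conventions only, not necessarily the Market Module's implementation.
def clearing_price(bid: float, ask: float, mechanism: str = "average") -> float:
    if mechanism == "average":   # midpoint: the bid-ask surplus is split evenly
        return (bid + ask) / 2
    if mechanism == "buyer":     # one common convention: settle at the buyer's bid price
        return bid
    if mechanism == "seller":    # one common convention: settle at the seller's ask price
        return ask
    raise ValueError(f"unknown mechanism: {mechanism}")

for m in ("average", "buyer", "seller"):
    print(m, clearing_price(bid=0.15, ask=0.10, mechanism=m))
```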
```
.
├── src/                  # Source code
│   ├── agent/            # Agent implementations and policies
│   ├── analytics/        # Metrics and performance analysis
│   ├── coordination/     # Implicit cooperation mechanisms
│   ├── environment/      # Market environment and training
│   ├── grid/             # Grid network modeling
│   ├── market/           # Market mechanisms and clearing
│   ├── profile/          # Energy profile generation
│   └── visualization/    # Plotting and visualization tools
├── cases/                # Case studies and scenarios
├── data/                 # Input data (demand, generation, prices)
├── docs/                 # Documentation
├── tests/                # Unit and integration tests
└── notebooks/            # Jupyter notebooks for exploration
```
The project is organized into three conceptual layers:
- Level 0: Platform Marketplace - Base infrastructure for the decentralized energy market
- Level 1: MARL Agents - Decentralized partially observable Markov decision process (Dec-POMDP) implementation
- Level 2: Implicit Cooperation Model - Coordination mechanisms between agents
The step method within the LEM environment manages a sequence of eight distinct phases, from the initial agent decision to the final observation and reward feedback (see the sequence diagram).
A detailed breakdown of the simulation process is as follows (a simplified code sketch appears after the list):
- Action processing and validation: The workflow begins with the raw actions received from the MARL policies. Each agent limits its action based on the current generation, demand, and its battery state. The Action Module then performs a rule-based validation, ensuring that the action's components fall within the predefined market bounds.
- Order creation and market submission: Validated actions are formatted into formal orders, producing a collection of structured orders ready for market submission.
- Market clearing and trade execution: The collection of orders is passed to the Market Module, which executes its three-stage clearing process. The output of this phase is a list of all executed trades together with summary statistics.
- Energy tracking and battery updates: The list of executed trades is used to update the physical state of each agent and of the grid.
- Reward calculation and KPI updates: The Implicit Cooperation Module gathers the data needed to compute the full suite of system-level KPIs. These KPIs are then passed to the reward function, which calculates the final composite reward for each agent.
- State transitions and environment updates: This phase advances the simulation's global state and increments the environment's internal clock. Crucially, agent-level states that depend on the full market outcome are updated based on the results from the Reputation Module.
- Observation update and information sharing: The Observation Module constructs the next local observation for each agent. This step closes the information loop, providing agents with the feedback necessary for their next decision.
- Episode management and termination: Finally, the environment checks for termination conditions. If the episode has not concluded, the environment returns the newly generated observations and computed rewards to the MARL algorithm, and the workflow repeats for the next trading period.
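The following self-contained toy reproduces the eight phases above in heavily simplified form. All names, data structures, and simplifications (midpoint pricing, a single shared KPI standing in for the composite reward) are illustrative assumptions and not the framework's actual API.

```python
"""Illustrative sketch of the eight-phase trading-period loop described above.
Names and simplifications here are placeholders, NOT the framework's actual API."""

def step(agents, actions, t, horizon=24):
    # 1. Action processing and validation: clip each (quantity, price) action to what
    #    the agent can physically sell (generation + battery) or buy (demand).
    valid = {}
    for aid, (qty, price) in actions.items():
        a = agents[aid]
        cap = a["generation"] + a["soc"] if qty > 0 else a["demand"]
        valid[aid] = (max(-cap, min(qty, cap)), max(0.0, price))

    # 2. Order creation and market submission: positive qty -> sell offer, negative -> buy bid.
    offers = [(p, aid, q) for aid, (q, p) in valid.items() if q > 0]
    bids = [(p, aid, -q) for aid, (q, p) in valid.items() if q < 0]
    offers.sort()                # cheapest offer first
    bids.sort(reverse=True)      # highest bid first

    # 3. Market clearing and trade execution: match while the best bid covers the best
    #    offer, pricing each trade at the bid/ask midpoint ("average" mechanism).
    trades = []
    while bids and offers and bids[0][0] >= offers[0][0]:
        (bp, buyer, bq), (sp, seller, sq) = bids[0], offers[0]
        q = min(bq, sq)
        trades.append({"buyer": buyer, "seller": seller, "qty": q, "price": (bp + sp) / 2})
        bids[0], offers[0] = (bp, buyer, bq - q), (sp, seller, sq - q)
        bids = [b for b in bids if b[2] > 1e-9]
        offers = [o for o in offers if o[2] > 1e-9]

    # 4. Energy tracking and battery updates: settle traded energy against battery state.
    for tr in trades:
        agents[tr["seller"]]["soc"] = max(0.0, agents[tr["seller"]]["soc"] - tr["qty"])
        agents[tr["buyer"]]["soc"] += tr["qty"]

    # 5. Reward calculation and KPI updates: one toy system-level KPI (share of total
    #    demand met by local trades) stands in for the composite reward.
    traded = sum(tr["qty"] for tr in trades)
    total_demand = sum(a["demand"] for a in agents.values()) or 1.0
    kpi = traded / total_demand
    rewards = {aid: kpi for aid in actions}

    # 6. State transitions and environment updates: advance the market clock.
    t += 1

    # 7. Observation update and information sharing: agents only see local quantities
    #    plus the shared KPI signal (no explicit peer communication).
    obs = {aid: (agents[aid]["generation"], agents[aid]["demand"], agents[aid]["soc"], kpi)
           for aid in actions}

    # 8. Episode management and termination.
    done = t >= horizon
    return obs, rewards, done, t

# Minimal usage: one prosumer selling to one consumer for a single trading period.
agents = {"a1": {"generation": 3.0, "demand": 0.0, "soc": 1.0},
          "a2": {"generation": 0.0, "demand": 2.0, "soc": 0.0}}
obs, rewards, done, t = step(agents, {"a1": (2.0, 0.10), "a2": (-2.0, 0.15)}, t=0)
print(rewards)   # both agents share the system-level KPI reward
```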
- Python >= 3.12
- Miniforge (recommended)
- Git (for cloning the repository)
Miniforge is recommended for installing Python and the required libraries. It is a distribution of Python and the conda package manager that makes it easy to configure environments and install packages from the conda-forge repository.
During installation, accept the recommended settings under Advanced Installation Options.
```bash
git clone https://github.com/salazarna/marlem
cd marlem
```

From the (base) environment, navigate to the project directory and create the environment:

```bash
conda env create --file environment.yml
```

This creates a new environment named marlem with all necessary dependencies.

```bash
conda activate marlem
```

You should now see (marlem) in your terminal prompt.

To use the environment with Jupyter notebooks:

```bash
python -m ipykernel install --user --name marlem --display-name "marlem"
```

You can verify the installation by running:

```bash
python -c "import src; print('Installation successful!')"
```

To remove the Jupyter kernel:

```bash
jupyter kernelspec remove marlem
```

To remove the conda environment:

```bash
conda deactivate
conda env remove -n marlem
```

The simplest way to get started is to run one of the demo case studies:

```bash
python cases/run_demo_case_studies.py
```

This will execute demonstrative case studies and save results in the downloads/ directory.

To run all comprehensive case studies:

```bash
python cases/run_case_studies.py
```

You can also run individual case studies programmatically:
```python
from cases import get_case_study

# Get scenarios from Case 1 (Market Mechanisms)
case1_scenarios = get_case_study("case1_market_mechanisms")

# Iterate through scenarios
for scenario_name, config in case1_scenarios.items():
    print(f"Scenario: {scenario_name}")
    print(f"Agents: {len(config.agents)}")
    print(f"Market mechanism: {config.market.price_mechanism}")
```

To train agents using reinforcement learning:
```python
from src.environment.train import RLTrainer, TrainingMode, RLAlgorithm
from cases import get_case_study

# Get a scenario
scenarios = get_case_study("case1_market_mechanisms")
scenario_name, config = next(iter(scenarios.items()))

# Train with CTDE (Centralized Training, Decentralized Execution)
trainer = RLTrainer(
    env_config=config,
    algorithm=RLAlgorithm.PPO,
    training=TrainingMode.CTDE,
    iters=100
)
trainer.train()
```

The project includes six comprehensive case studies designed to systematically answer the research questions:
Case 1: Market Mechanism Comparison
Objective: Compare different clearing mechanisms to identify optimal pricing approaches.
- Scenarios: 6 scenarios testing different clearing mechanisms
- Key Variables: Market clearing mechanism (AVERAGE, BUYER, SELLER, NASH_BARGAINING, etc.)
Case 2: Agent Heterogeneity & Market Power
Objective: Analyze how agent size differences and market concentration affect coordination.
- Scenarios: 4 market structures (balanced, monopoly, oligopoly, cooperative)
- Key Variables: Agent capacity distribution, market concentration
Case 3: DSO Intervention Strategies
Objective: Study how different DSO regulatory approaches affect market participation.
- Scenarios: 4 regulatory approaches (permissive, moderate, strict, dynamic)
- Key Variables: DSO intervention thresholds, penalty structures
Case 4: Grid Topology & Congestion Effects
Objective: Examine how grid physical constraints affect market outcomes.
- Scenarios: 12 combinations of 4 topologies × 3 capacity levels
- Key Variables: Grid topology (STAR, MESH, RING, TREE), capacity constraints
Case 5: Battery Storage Coordination
Objective: Analyze how different storage configurations enhance coordination.
- Scenarios: 6 storage deployment strategies
- Key Variables: Battery deployment, capacity ratios, efficiency levels
Case 6: Implicit Cooperation Effectiveness
Objective: Directly test how implicit cooperation improves DER management.
- Scenarios: 6 scenarios testing coordination effectiveness
- Key Variables: Coordination level (none, implicit, explicit), cooperation conditions
For detailed information about each case study, see cases/README.md.
Comprehensive testing ensures code quality:
- Unit Tests: Test individual components in isolation (tests/test_*.py); a hypothetical example follows this list
- Integration Tests: Verify interactions between components
- Scenario Tests: Evaluate system behavior under different conditions
- Consistency Tests: Ensure consistent behavior across different implementations
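A hypothetical unit test in this style, using only the public get_case_study interface shown earlier; the actual tests live in tests/ and may differ:

```python
# Hypothetical unit-test sketch in the tests/test_*.py style; illustrative only.
from cases import get_case_study

def test_case1_scenarios_are_well_formed():
    scenarios = get_case_study("case1_market_mechanisms")
    assert scenarios, "Case 1 should define at least one scenario"
    for name, config in scenarios.items():
        assert len(config.agents) > 0
        assert config.market.price_mechanism is not None
```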
Run tests with:
```bash
python -m pytest tests/
```

The project includes comprehensive tools for analyzing the behavior and performance of the system:
The framework provides several analytics modules for performance evaluation:
- Coordination Metrics (src/analytics/coordination_metrics.py): Measures coordination effectiveness between agents
- DSO Metrics (src/analytics/dso_metrics.py): Tracks DSO intervention and grid stability metrics
- Grid Metrics (src/analytics/grid_metrics.py): Evaluates grid performance and constraint violations
- Market Analytics (src/analytics/market_analytics.py): Analyzes market outcomes, prices, and trading volumes; a generic post-processing illustration follows this list
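As a generic illustration of the kind of post-processing these modules perform (plain Python over a hypothetical trade list, not the src/analytics API):

```python
# Hypothetical executed trades; in practice these come from the market clearing step.
trades = [
    {"qty": 2.0, "price": 0.12},
    {"qty": 1.5, "price": 0.10},
]

# Total traded volume and volume-weighted average price, two typical market metrics.
volume = sum(t["qty"] for t in trades)
vwap = sum(t["qty"] * t["price"] for t in trades) / volume if volume else 0.0
print(f"traded volume: {volume:.2f} kWh | volume-weighted price: {vwap:.3f}")
```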
The framework includes visualization tools in src/visualization/ for:
- Market price and volume trends (an illustrative plotting sketch follows this list)
- Agent behavior and strategy visualization
- Grid state visualization
- Performance comparison across scenarios
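For example, a price-trend plot of the kind listed above can be produced with a few lines of matplotlib; the data series and plotting calls below are illustrative assumptions, not the src/visualization API:

```python
# Plot a hypothetical daily clearing-price series (illustrative data only).
import matplotlib.pyplot as plt

hours = list(range(24))
prices = [0.10 + (0.03 if 17 <= h < 21 else 0.0) for h in hours]  # made-up evening peak

plt.step(hours, prices, where="mid")
plt.xlabel("Trading period (hour)")
plt.ylabel("Clearing price")
plt.title("Market price trend (illustrative data)")
plt.show()
```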
Contributions to this project are welcome! When contributing:
- Follow the coding guidelines in docs/guidelines.md
- Ensure all tests pass: python -m pytest tests/
- Maintain code coverage and documentation
- Follow the project structure and design patterns
For major changes or new features, please open an issue first to discuss the proposed changes.
The original paper describing the methods implemented is:
Salazar-Peña, N., Tabares, A., Gonzalez-Mancera, A., 2025. MARLEM: Multi-Agent Reinforcement Learning Simulation Framework for Implicit Cooperation in Decentralized Local Energy Markets. Journal, Vol. XXX, XXXXXX. URL: <>, doi: <>.
The BibTeX entry:
```bibtex
@article{SalazarPena2025,
  title = "MARLEM: Multi-Agent Reinforcement Learning Simulation Framework for Implicit Cooperation in Decentralized Local Energy Markets",
  journal = "",
  volume = "",
  pages = "",
  year = "",
  issn = "",
  doi = "",
  url = "",
  author = "Nelson Salazar-Peña and Alejandra Tabares and Andrés González-Mancera",
  keywords = "Local Energy Markets, Multi-Agent Reinforcement Learning, Decentralized Partially Observable Markov Decision Process, Implicit Cooperation, Decentralized Energy Systems, Simulation Framework",
}
```
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0), as specified in the LICENSE file.
This is a copyleft license that requires that any derivative work be licensed under the same terms as the original work.


