Hello, I'm new to MARL and am trying to create more environments for multi-agent competitive games (pursuit/evasion, reach/avoid, etc.), similar to the dogfight example except that the agents may have different objectives. I've found structuring the rewards very difficult, and my models just don't converge to the behaviours I expect. Even when I copy the structure of the provided multi-agent environments (which work very well) as closely as I can, the agents just don't seem to learn. I was wondering if you have any advice on how the rewards for those environments were designed, e.g. how the shaping rewards were constructed and how you balanced them against the sparse rewards.
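For concreteness, here is a minimal sketch of the kind of shaped reward I've been experimenting with for the pursuer — the distance-based potential and the `scale=0.01` factor are my own guesses, not taken from your environments:

```python
import numpy as np

GAMMA = 0.99  # discount factor, kept equal to the learner's gamma

def potential(agent_pos, target_pos):
    # Negative distance: potential increases as the pursuer closes in.
    return -np.linalg.norm(agent_pos - target_pos)

def shaped_reward(sparse_r, pos, next_pos, target, next_target, scale=0.01):
    # Potential-based shaping F = gamma * phi(s') - phi(s), which is known
    # to preserve the optimal policy of the underlying sparse objective.
    shaping = GAMMA * potential(next_pos, next_target) - potential(pos, target)
    return sparse_r + scale * shaping

# Moving straight toward a stationary target gives a positive shaping term.
r = shaped_reward(0.0,
                  np.array([0.0, 0.0]), np.array([0.0, 1.0]),
                  np.array([0.0, 5.0]), np.array([0.0, 5.0]))
```

Mainly I'm unsure how to pick that `scale` relative to the sparse win/loss reward without the shaping term dominating the learned behaviour.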
Also, I'm currently training the agents with a variant of self-play using Stable-Baselines3 (only one agent learns at a time, against the other agent's frozen policy); if you have any recommendations for a different strategy, I'd be open to changing that too.
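My self-play setup looks roughly like the sketch below — a toy stand-in, not my real code: `TagEnv` and `FrozenOpponentEnv` are hypothetical names, and the real version wraps a proper Gymnasium env for Stable-Baselines3. The idea is that the wrapper exposes a single-agent interface by letting the frozen opponent act inside `step()`:

```python
class TagEnv:
    """Toy 1-D pursue/evade game: the pursuer wins if it reaches the evader."""
    def reset(self):
        self.pos = {"pursuer": 0, "evader": 5}
        self.t = 0
        return dict(self.pos)

    def step(self, actions):  # each action is -1, 0, or +1
        for agent, a in actions.items():
            self.pos[agent] += a
        self.t += 1
        caught = self.pos["pursuer"] == self.pos["evader"]
        done = caught or self.t >= 20
        rewards = {"pursuer": 1.0 if caught else 0.0,
                   "evader": -1.0 if caught else 0.0}
        return dict(self.pos), rewards, done

class FrozenOpponentEnv:
    """Wraps the two-player env so only `learner` chooses actions; the
    opponent follows a fixed policy snapshot, giving the single-agent
    interface that one learning algorithm trains against per phase."""
    def __init__(self, env, learner, opponent_policy):
        self.env, self.learner = env, learner
        self.opponent = "evader" if learner == "pursuer" else "pursuer"
        self.opponent_policy = opponent_policy  # frozen during this phase

    def reset(self):
        return self.env.reset()

    def step(self, action):
        opp_action = self.opponent_policy(self.env.pos)
        obs, rewards, done = self.env.step(
            {self.learner: action, self.opponent: opp_action})
        return obs, rewards[self.learner], done

# Frozen evader stands still; a greedy pursuer closes the gap and is rewarded.
env = FrozenOpponentEnv(TagEnv(), "pursuer", lambda pos: 0)
obs, done, total = env.reset(), False, 0.0
while not done:
    action = 1 if obs["pursuer"] < obs["evader"] else -1
    obs, r, done = env.step(action)
    total += r
```

In the real setup I periodically snapshot the learner's policy and swap it in as the new frozen opponent, then switch which side is learning.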
Thanks!