Hello, I'm new to MARL and am trying to create more environments for multi-agent competitive games (pursuit/evasion, reach/avoid, etc.), similar to the dogfight example except that the agents may have different objectives. I've found structuring the rewards very difficult, and my models just don't converge to the behaviours I expect. Even when I copy the structure of the provided multi-agent environments (which work very well) as closely as I can, the agents just don't seem to learn. I was wondering if you have any advice on how the rewards for those environments were designed, e.g. how the shaping rewards were constructed and how you balanced them against the sparse rewards.
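For concreteness, here is a minimal sketch of the kind of shaped reward I've been experimenting with for the pursuer — the distance-based potential and the `scale=0.01` factor are my own guesses, not taken from your environments:

```python
import numpy as np

GAMMA = 0.99  # discount factor, kept equal to the learner's gamma

def potential(agent_pos, target_pos):
    # Negative distance: potential increases as the pursuer closes in.
    return -np.linalg.norm(agent_pos - target_pos)

def shaped_reward(sparse_r, pos, next_pos, target, next_target, scale=0.01):
    # Potential-based shaping F = gamma * phi(s') - phi(s), which is known
    # to preserve the optimal policy of the underlying sparse objective.
    shaping = GAMMA * potential(next_pos, next_target) - potential(pos, target)
    return sparse_r + scale * shaping

# Moving straight toward a stationary target gives a positive shaping term.
r = shaped_reward(0.0,
                  np.array([0.0, 0.0]), np.array([0.0, 1.0]),
                  np.array([0.0, 5.0]), np.array([0.0, 5.0]))
```

Mainly I'm unsure how to pick that `scale` relative to the sparse win/loss reward without the shaping term dominating the learned behaviour.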
Also, I'm currently training the agents with a variant of self-play using Stable-Baselines3 (only one agent learns at a time, against the other agent's frozen policy); if you have any recommendations for a different strategy, I'd be open to changing that too.
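My self-play setup looks roughly like the sketch below — a toy stand-in, not my real code: `TagEnv` and `FrozenOpponentEnv` are hypothetical names, and the real version wraps a proper Gymnasium env for Stable-Baselines3. The idea is that the wrapper exposes a single-agent interface by letting the frozen opponent act inside `step()`:

```python
class TagEnv:
    """Toy 1-D pursue/evade game: the pursuer wins if it reaches the evader."""
    def reset(self):
        self.pos = {"pursuer": 0, "evader": 5}
        self.t = 0
        return dict(self.pos)

    def step(self, actions):  # each action is -1, 0, or +1
        for agent, a in actions.items():
            self.pos[agent] += a
        self.t += 1
        caught = self.pos["pursuer"] == self.pos["evader"]
        done = caught or self.t >= 20
        rewards = {"pursuer": 1.0 if caught else 0.0,
                   "evader": -1.0 if caught else 0.0}
        return dict(self.pos), rewards, done

class FrozenOpponentEnv:
    """Wraps the two-player env so only `learner` chooses actions; the
    opponent follows a fixed policy snapshot, giving the single-agent
    interface that one learning algorithm trains against per phase."""
    def __init__(self, env, learner, opponent_policy):
        self.env, self.learner = env, learner
        self.opponent = "evader" if learner == "pursuer" else "pursuer"
        self.opponent_policy = opponent_policy  # frozen during this phase

    def reset(self):
        return self.env.reset()

    def step(self, action):
        opp_action = self.opponent_policy(self.env.pos)
        obs, rewards, done = self.env.step(
            {self.learner: action, self.opponent: opp_action})
        return obs, rewards[self.learner], done

# Frozen evader stands still; a greedy pursuer closes the gap and is rewarded.
env = FrozenOpponentEnv(TagEnv(), "pursuer", lambda pos: 0)
obs, done, total = env.reset(), False, 0.0
while not done:
    action = 1 if obs["pursuer"] < obs["evader"] else -1
    obs, r, done = env.step(action)
    total += r
```

In the real setup I periodically snapshot the learner's policy and swap it in as the new frozen opponent, then switch which side is learning.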
Thanks!