Reward structure for custom MARL environments #91

@nathanDuncan

Description

Hello, I'm new to MARL and am trying to create more environments for multi-agent competitive games (pursuit/evasion, reach/avoid, etc.), similar to the dogfight example, except that the agents may not share the same objectives. I've found structuring the rewards very difficult, and my models just don't seem to converge to the behaviours I expect. Even when I copy the structure of the provided multi-agent environments (which work very well) as closely as I can, the agents just don't seem to learn. I was wondering if you had any advice on how the rewards for these environments were designed, e.g. how the shaping rewards were constructed and how you balanced them against the sparse rewards.
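For concreteness, here is roughly the kind of structure I've been experimenting with for the pursuer: a sparse terminal reward plus a small potential-based shaping term. This is a minimal sketch; the potential function, the weights, and the episode-end flags are placeholders I made up for my own environment:

```python
# Hypothetical pursuer reward: sparse terminal signal plus a small
# potential-based shaping term, F = gamma * phi(s') - phi(s), which
# is known not to change the optimal policy (Ng et al., 1999).
GAMMA = 0.99          # discount factor used by the learner
SHAPING_WEIGHT = 0.1  # scale of shaping relative to the sparse reward

def potential(distance_to_evader: float) -> float:
    """Potential function: being closer to the evader is 'better'."""
    return -distance_to_evader

def pursuer_reward(prev_dist: float, dist: float,
                   caught: bool, timed_out: bool) -> float:
    reward = 0.0
    if caught:        # sparse success reward at episode end
        reward += 1.0
    elif timed_out:   # sparse failure penalty at episode end
        reward -= 1.0
    # Dense shaping term, kept small relative to the sparse rewards
    reward += SHAPING_WEIGHT * (GAMMA * potential(dist) - potential(prev_dist))
    return reward
```

Is this the right general shape, and how did you pick the relative scale of the shaping term?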
Also, I am currently training the agents with a variant of self-play using Stable-Baselines3 (only one agent learns at a time against the other agent's fixed policy), but if you have any recommendations for a different training strategy, I would be open to changing that too.
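Roughly, my loop looks like the following simplified sketch. `PursuitEvadeEnv` and its `set_opponent()` hook are stand-ins for my own single-agent wrappers, which step the frozen opponent's policy inside `env.step()`:

```python
from stable_baselines3 import PPO

# Single-agent views of the two-player game; the frozen opponent
# is queried inside env.step() via the callable set below.
pursuer_env = PursuitEvadeEnv(role="pursuer")
evader_env = PursuitEvadeEnv(role="evader")

pursuer = PPO("MlpPolicy", pursuer_env, verbose=0)
evader = PPO("MlpPolicy", evader_env, verbose=0)

for generation in range(10):
    # Train the pursuer against the evader's current fixed policy
    pursuer_env.set_opponent(
        lambda obs: evader.predict(obs, deterministic=True)[0])
    pursuer.learn(total_timesteps=100_000)

    # Then freeze the pursuer and train the evader against it
    evader_env.set_opponent(
        lambda obs: pursuer.predict(obs, deterministic=True)[0])
    evader.learn(total_timesteps=100_000)
```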

Thanks!
