Stateful active facilitator: Coordination and Environmental
Heterogeneity in Cooperative Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2210.03022v3
- Date: Fri, 6 Oct 2023 22:34:35 GMT
- Title: Stateful active facilitator: Coordination and Environmental
Heterogeneity in Cooperative Multi-Agent Reinforcement Learning
- Authors: Dianbo Liu, Vedant Shah, Oussama Boussif, Cristian Meo, Anirudh Goyal,
Tianmin Shu, Michael Mozer, Nicolas Heess, Yoshua Bengio
- Abstract summary: We formalize the notions of coordination level and heterogeneity level of an environment.
We present HECOGrid, a suite of multi-agent environments that facilitates empirical evaluation of different MARL approaches.
We propose a Centralized Training Decentralized Execution learning approach that enables agents to work efficiently in high-coordination and high-heterogeneity environments.
- Score: 71.53769213321202
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In cooperative multi-agent reinforcement learning, a team of agents works
together to achieve a common goal. Different environments or tasks may require
varying degrees of coordination among agents in order to achieve the goal in an
optimal way. The nature of coordination will depend on the properties of the
environment -- its spatial layout, distribution of obstacles, dynamics, etc. We
refer to this variation of properties within an environment as heterogeneity.
Existing literature has not sufficiently addressed the fact that different
environments may have different levels of heterogeneity. We formalize the
notions of coordination level and heterogeneity level of an environment and
present HECOGrid, a suite of multi-agent RL environments that facilitates
empirical evaluation of different MARL approaches across different levels of
coordination and environmental heterogeneity by providing a quantitative
control over coordination and heterogeneity levels of the environment. Further,
we propose a Centralized Training Decentralized Execution learning approach
called Stateful Active Facilitator (SAF) that enables agents to work
efficiently in high-coordination and high-heterogeneity environments through a
differentiable and shared knowledge source used during training and dynamic
selection from a shared pool of policies. We evaluate SAF and compare its
performance against baselines IPPO and MAPPO on HECOGrid. Our results show that
SAF consistently outperforms the baselines across different tasks and different
heterogeneity and coordination levels. We release the code for HECOGrid as well
as all our experiments.
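The abstract describes SAF as pairing a shared, differentiable knowledge source used during training with dynamic selection from a shared pool of policies. As a rough, hypothetical sketch of the policy-pool ingredient only (the class name SharedPolicyPool, the softmax mixing, and all dimensions are our assumptions, not the paper's actual architecture), decentralized agents could differentiably weight a shared set of policy heads as follows:

import torch
import torch.nn as nn

class SharedPolicyPool(nn.Module):
    """Toy illustration: each agent mixes action logits from a shared pool
    of policy heads using differentiable (softmax) selection weights.
    This is an assumption-laden simplification, not the SAF architecture
    from the paper."""
    def __init__(self, obs_dim, n_actions, n_policies=4, hidden=64):
        super().__init__()
        # Shared pool of small policy heads, common to all agents.
        self.policies = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, n_actions))
            for _ in range(n_policies)
        ])
        # Maps an agent's local observation to selection weights over the pool.
        self.selector = nn.Linear(obs_dim, n_policies)

    def forward(self, obs):  # obs: [batch, obs_dim]
        weights = torch.softmax(self.selector(obs), dim=-1)            # [batch, n_policies]
        logits = torch.stack([p(obs) for p in self.policies], dim=1)   # [batch, n_policies, n_actions]
        mixed = (weights.unsqueeze(-1) * logits).sum(dim=1)            # [batch, n_actions]
        return torch.distributions.Categorical(logits=mixed)

# Usage: each agent queries the shared pool with only its own observation,
# so execution stays decentralized while the pool itself is shared.
pool = SharedPolicyPool(obs_dim=16, n_actions=5)
dist = pool(torch.randn(2, 16))   # two agents' observations
actions = dist.sample()           # one discrete action per agent

Because the selection weights are a softmax, the mixing stays differentiable end to end, which is one plausible way to train such a pool jointly with the agents during centralized training.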
Related papers
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that uses these inferred goals to plan the agent's own response.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z) - CoMIX: A Multi-agent Reinforcement Learning Training Architecture for Efficient Decentralized Coordination and Independent Decision-Making [2.4555276449137042]
Robust coordination skills enable agents to operate cohesively in shared environments: together towards a common goal and, ideally, individually without hindering each other's progress.
This paper presents Coordinated QMIX (CoMIX), a novel training framework for decentralized agents that enables emergent coordination through flexible policies while preserving independent decision-making at the individual level.
arXiv Detail & Related papers (2023-08-21T13:45:44Z) - Adaptive Coordination in Social Embodied Rearrangement [49.35582108902819]
We study zero-shot coordination (ZSC) in this task, where an agent collaborates with a new partner, emulating a scenario where a robot collaborates with a new human partner.
We propose Behavior Diversity Play (BDP), a novel ZSC approach that encourages diversity through a discriminability objective.
Our results demonstrate that BDP learns adaptive agents that can tackle visual coordination, and zero-shot generalize to new partners in unseen environments, achieving 35% higher success and 32% higher efficiency compared to baselines.
arXiv Detail & Related papers (2023-05-31T18:05:51Z) - Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z) - Diversity Induced Environment Design via Self-Play [9.172096093540357]
We propose a task-agnostic method to identify observed/hidden states that are representative of a given level.
The outcome of this method is then utilized to characterize the diversity between two levels, which, as we show, can be crucial to effective performance.
In addition, to improve sampling efficiency, we incorporate the self-play technique that allows the environment generator to automatically generate environments that are of great benefit to the training agent.
arXiv Detail & Related papers (2023-02-04T07:31:36Z) - Parallel Best Arm Identification in Heterogeneous Environments [8.915120653822433]
We study the tradeoffs between time and the number of communication rounds for the best arm identification problem in the heterogeneous collaborative learning model.
By proving almost tight upper and lower bounds, we show that collaborative learning in the heterogeneous setting is inherently more difficult than that in the homogeneous setting.
arXiv Detail & Related papers (2022-07-16T21:06:26Z) - Normative Disagreement as a Challenge for Cooperative AI [56.34005280792013]
We argue that typical cooperation-inducing learning algorithms fail to cooperate in bargaining problems.
We develop a class of norm-adaptive policies and show in experiments that these significantly increase cooperation.
arXiv Detail & Related papers (2021-11-27T11:37:42Z) - Non-local Policy Optimization via Diversity-regularized Collaborative
Exploration [45.997521480637836]
We propose a novel non-local policy optimization framework called Diversity-regularized Collaborative Exploration (DiCE).
DiCE utilizes a group of heterogeneous agents to explore the environment simultaneously and share the collected experiences.
We implement the framework in both on-policy and off-policy settings and the experimental results show that DiCE can achieve substantial improvement over the baselines.
arXiv Detail & Related papers (2020-06-14T03:31:11Z)