ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward
- URL: http://arxiv.org/abs/2210.04365v1
- Date: Sun, 9 Oct 2022 22:24:44 GMT
- Title: ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward
- Authors: Zixian Ma, Rose Wang, Li Fei-Fei, Michael Bernstein, Ranjay Krishna
- Abstract summary: We propose a self-supervised intrinsic reward ELIGN - expectation alignment.
Similar to how animals collaborate in a decentralized manner with those in their vicinity, agents trained with expectation alignment learn behaviors that match their neighbors' expectations.
We show that agent coordination improves through expectation alignment because agents learn to divide tasks amongst themselves, break coordination symmetries, and confuse adversaries.
- Score: 29.737986509769808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern multi-agent reinforcement learning frameworks rely on centralized
training and reward shaping to perform well. However, centralized training and
dense rewards are not readily available in the real world. Current multi-agent
algorithms struggle to learn in the alternative setup of decentralized training
or sparse rewards. To address these issues, we propose a self-supervised
intrinsic reward ELIGN - expectation alignment - inspired by the
self-organization principle in zoology. Similar to how animals collaborate in a
decentralized manner with those in their vicinity, agents trained with
expectation alignment learn behaviors that match their neighbors' expectations.
This allows the agents to learn collaborative behaviors without any external
reward or centralized training. We demonstrate the efficacy of our approach
across 6 tasks in the multi-agent particle and the complex Google Research
Football environments, comparing ELIGN to sparse and curiosity-based intrinsic
rewards. When the number of agents increases, ELIGN scales well in all
multi-agent tasks except for one where agents have different capabilities. We
show that agent coordination improves through expectation alignment because
agents learn to divide tasks amongst themselves, break coordination symmetries,
and confuse adversaries. These results identify tasks where expectation
alignment is a more useful strategy than curiosity-driven exploration for
multi-agent coordination, enabling agents to do zero-shot coordination.
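To make the expectation-alignment idea above concrete, below is a minimal sketch under one plausible reading of the abstract: each agent keeps a learned forward model of its local observation, and an agent's intrinsic reward is the negative mean error between what its neighbors' models expected it to observe next and the observation it actually reached. The class and function names (ForwardModel, elign_reward), the linear model, and the neighbor bookkeeping are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np


class ForwardModel:
    """Toy linear stand-in for a learned dynamics model: o' ~ W [o; a]."""

    def __init__(self, obs_dim, act_dim, rng):
        self.W = rng.normal(scale=0.1, size=(obs_dim, obs_dim + act_dim))

    def predict(self, obs, act):
        return self.W @ np.concatenate([obs, act])


def elign_reward(agent, obs, act, next_obs, models, neighbors):
    """Intrinsic reward for `agent`: negative mean error between each neighbor's
    expectation of the agent's next observation and the observation it actually
    reached. No neighbors in range means no alignment signal (reward 0)."""
    errs = [
        np.linalg.norm(models[j].predict(obs[agent], act[agent]) - next_obs[agent])
        for j in neighbors[agent]
    ]
    return -float(np.mean(errs)) if errs else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs_dim, act_dim, n_agents = 4, 2, 3
    models = {i: ForwardModel(obs_dim, act_dim, rng) for i in range(n_agents)}
    obs = {i: rng.normal(size=obs_dim) for i in range(n_agents)}
    act = {i: rng.normal(size=act_dim) for i in range(n_agents)}
    next_obs = {i: rng.normal(size=obs_dim) for i in range(n_agents)}
    neighbors = {0: [1, 2], 1: [0], 2: []}  # who is currently within each agent's view
    print([round(elign_reward(i, obs, act, next_obs, models, neighbors), 3)
           for i in range(n_agents)])
```

In this sketch the decentralized character is carried entirely by the neighbors map: an agent only receives alignment signal from agents currently in its vicinity, so no centralized critic or shared external reward is required.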
Related papers
- ProAgent: Building Proactive Cooperative Agents with Large Language
Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z) - AgentVerse: Facilitating Multi-Agent Collaboration and Exploring
Emergent Behaviors [93.38830440346783]
We propose AgentVerse, a multi-agent framework that can collaboratively adjust its composition as a greater-than-the-sum-of-its-parts system.
Our experiments demonstrate that AgentVerse can effectively deploy multi-agent groups that outperform a single agent.
In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups.
arXiv Detail & Related papers (2023-08-21T16:47:11Z) - Consensus Learning for Cooperative Multi-Agent Reinforcement Learning [12.74348597962689]
We propose consensus learning for cooperative multi-agent reinforcement learning.
We feed the inferred consensus as an explicit input to the network of agents.
Our proposed method can be extended to various multi-agent reinforcement learning algorithms.
arXiv Detail & Related papers (2022-06-06T12:43:07Z) - LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent
Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z) - Decentralized Cooperative Multi-Agent Reinforcement Learning with
Exploration [35.75029940279768]
We study multi-agent reinforcement learning in the most basic cooperative setting -- Markov teams.
We propose an algorithm in which each agent independently runs a stage-based V-learning style algorithm.
We show that the agents can learn an $\epsilon$-approximate Nash equilibrium policy in at most $\widetilde{O}(1/\epsilon^4)$ episodes.
arXiv Detail & Related papers (2021-10-12T02:45:12Z) - Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z) - Two-stage training algorithm for AI robot soccer [2.0757564643017092]
Two-stage heterogeneous centralized training is proposed to improve the learning performance of heterogeneous agents.
The proposed method is applied to 5 versus 5 AI robot soccer for validation.
arXiv Detail & Related papers (2021-04-13T04:24:13Z) - UneVEn: Universal Value Exploration for Multi-Agent Reinforcement
Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z) - Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
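As a rough illustration of the reward-giving mechanism summarized in "Learning to Incentivize Other Learning Agents" above, the sketch below shows only the reward bookkeeping: each agent owns an incentive function that hands extra reward to peers on top of the environment reward, at a small cost to itself. The incentive map is a fixed bilinear toy, the cost term is an assumption, and the paper's actual training step (differentiating through recipients' policy updates) is omitted; all names are illustrative.

```python
import numpy as np


def incentive(W_giver, obs_giver, recipient_action_onehot):
    """Extra reward the giver hands a recipient this step.
    In the paper this map is learned; here it is a fixed bilinear toy."""
    return float(obs_giver @ W_giver @ recipient_action_onehot)


def shaped_rewards(env_rewards, obs, actions_onehot, incentive_params, give_cost=0.1):
    """Environment reward + incentives received - a small cost for incentives given."""
    totals = list(env_rewards)
    n = len(env_rewards)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r_ij = incentive(incentive_params[i], obs[i], actions_onehot[j])
            totals[j] += r_ij                    # recipient j receives the incentive
            totals[i] -= give_cost * abs(r_ij)   # giver i pays for handing it out
    return totals


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    obs_dim, n_actions, n_agents = 3, 2, 2
    params = [rng.normal(scale=0.1, size=(obs_dim, n_actions)) for _ in range(n_agents)]
    obs = [rng.normal(size=obs_dim) for _ in range(n_agents)]
    acts = [np.eye(n_actions)[rng.integers(n_actions)] for _ in range(n_agents)]
    print(shaped_rewards([1.0, 0.0], obs, acts, params))
```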