Multi-Agent Collaboration via Reward Attribution Decomposition
- URL: http://arxiv.org/abs/2010.08531v1
- Date: Fri, 16 Oct 2020 17:42:11 GMT
- Title: Multi-Agent Collaboration via Reward Attribution Decomposition
- Authors: Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph
E. Gonzalez, Yuandong Tian
- Abstract summary: We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft maps and shows that it outperforms existing state-of-the-art techniques.
- Score: 75.36911959491228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in multi-agent reinforcement learning (MARL) have achieved
super-human performance in games like Quake 3 and Dota 2. Unfortunately, these
techniques require orders-of-magnitude more training rounds than humans and
don't generalize to new agent configurations even on the same game. In this
work, we propose Collaborative Q-learning (CollaQ) that achieves
state-of-the-art performance in the StarCraft multi-agent challenge and
supports ad hoc team play. We first formulate multi-agent collaboration as a
joint optimization on reward assignment and show that each agent has an
approximately optimal policy that decomposes into two parts: one part that only
relies on the agent's own state, and the other part that is related to states
of nearby agents. Following this novel finding, CollaQ decomposes the
Q-function of each agent into a self term and an interactive term, with a
Multi-Agent Reward Attribution (MARA) loss that regularizes the training.
CollaQ is evaluated on various StarCraft maps and shows that it outperforms
existing state-of-the-art techniques (i.e., QMIX, QTRAN, and VDN) by improving
the win rate by 40% with the same number of samples. In the more challenging ad
hoc team play setting (i.e., reweight/add/remove units without re-training or
finetuning), CollaQ outperforms previous SoTA by over 30%.
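To make the decomposition concrete, below is a minimal sketch of a per-agent Q-function split into a self term and an interactive term, with a MARA-style regularizer that drives the interactive term toward zero when nearby-agent information is masked out. This is an illustration under stated assumptions (network sizes, observation layout, the zero-masking scheme, and names such as CollaQAgent and mara_loss are hypothetical), not the authors' released implementation.

```python
# Sketch: per-agent Q decomposition with a MARA-style regularizer (illustrative only).
import torch
import torch.nn as nn

class CollaQAgent(nn.Module):
    def __init__(self, own_dim, others_dim, n_actions, hidden=64):
        super().__init__()
        # Self term: depends only on the agent's own state.
        self.q_self = nn.Sequential(
            nn.Linear(own_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )
        # Interactive term: depends on the agent's own state plus nearby agents' states.
        self.q_interactive = nn.Sequential(
            nn.Linear(own_dim + others_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, own_obs, others_obs):
        q_alone = self.q_self(own_obs)
        q_collab = self.q_interactive(torch.cat([own_obs, others_obs], dim=-1))
        # Total Q-value is the sum of the self and interactive terms.
        return q_alone + q_collab, q_collab

def mara_loss(agent, own_obs, others_obs):
    """MARA-style regularizer: with nearby-agent information masked out
    (zeroed here as an assumption), the interactive term should contribute nothing."""
    _, q_collab_masked = agent(own_obs, torch.zeros_like(others_obs))
    return q_collab_masked.pow(2).mean()

# Usage sketch: combine the regularizer with the usual TD loss on the total Q-value.
agent = CollaQAgent(own_dim=10, others_dim=20, n_actions=5)
own, others = torch.randn(32, 10), torch.randn(32, 20)
q_total, _ = agent(own, others)        # Q = self term + interactive term
reg = mara_loss(agent, own, others)    # penalize the interactive term under masking
loss = reg  # in practice: td_loss(q_total, targets) + lambda_mara * reg
```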
Related papers
- PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of Multi-Agent Reinforcement Learning [20.746383793882984]
Training for multi-agent reinforcement learning (MARL) is a time-consuming process.
One drawback is that each agent's strategy is trained independently even though the agents must act in cooperation.
We propose three simple approaches: Average Sharing (A-PPS), Reward-Scalability Periodically, and Partial Personalized Periodically.
arXiv Detail & Related papers (2024-03-05T03:59:01Z)
- Leading the Pack: N-player Opponent Shaping [52.682734939786464]
We extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents.
We find that when playing with a large number of co-players, the relative performance of OS methods degrades, suggesting that OS methods may not perform well in the limit.
arXiv Detail & Related papers (2023-12-19T20:01:42Z)
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easy to integrate into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- Exploring the Benefits of Teams in Multiagent Learning [5.334505575267924]
We propose a new model of multiagent teams for reinforcement learning (RL) agents inspired by organizational psychology (OP).
We find that agents divided into teams develop cooperative pro-social policies despite incentives to not cooperate.
Agents are better able to coordinate and learn emergent roles within their teams and achieve higher rewards compared to when the interests of all agents are aligned.
arXiv Detail & Related papers (2022-05-04T21:14:03Z)
- Reinforcement Learning Agents in Colonel Blotto [0.0]
We focus on a specific instance of agent-based models, which uses reinforcement learning (RL) to train the agent to act in its environment.
We find that the RL agent handily beats a single opponent and still performs quite well when the number of opponents is increased.
We also analyze the RL agent and examine the strategies it has arrived at by inspecting the actions to which it assigns the highest and lowest Q-values.
arXiv Detail & Related papers (2022-04-04T16:18:01Z)
- Distributed Reinforcement Learning for Cooperative Multi-Robot Object Manipulation [53.262360083572005]
We consider solving a cooperative multi-robot object manipulation task using reinforcement learning (RL).
We propose two distributed multi-agent RL approaches: distributed approximate RL (DA-RL) and game-theoretic RL (GT-RL).
Although we focus on a small system of two agents in this paper, both DA-RL and GT-RL apply to general multi-agent systems, and are expected to scale well to large systems.
arXiv Detail & Related papers (2020-03-21T00:43:54Z)
- "Other-Play" for Zero-Shot Coordination [21.607428852157273]
The Other-Play (OP) learning algorithm enhances self-play by looking for more robust strategies.
We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents.
arXiv Detail & Related papers (2020-03-06T00:39:37Z)
- On Emergent Communication in Competitive Multi-Agent Teams [116.95067289206919]
We investigate whether competition for performance from an external, similar agent team could act as a social influence.
Our results show that an external competitive influence leads to improved accuracy and generalization, as well as faster emergence of communicative languages.
arXiv Detail & Related papers (2020-03-04T01:14:27Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)