ACE: Cooperative Multi-agent Q-learning with Bidirectional
Action-Dependency
- URL: http://arxiv.org/abs/2211.16068v1
- Date: Tue, 29 Nov 2022 10:22:55 GMT
- Title: ACE: Cooperative Multi-agent Q-learning with Bidirectional
Action-Dependency
- Authors: Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong
Yang, Yu Liu, Wanli Ouyang
- Abstract summary: Multi-agent reinforcement learning (MARL) suffers from the non-stationarity problem.
In this paper, we propose bidirectional action-dependent Q-learning (ACE).
ACE outperforms the state-of-the-art algorithms on Google Research Football and StarCraft Multi-Agent Challenge.
- Score: 65.28061634546577
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent reinforcement learning (MARL) suffers from the non-stationarity
problem: each agent's learning target keeps changing at every iteration because
multiple agents update their policies simultaneously. Starting from first principles,
in this paper we address the non-stationarity problem by proposing
bidirectional action-dependent Q-learning (ACE). Central to the development of
ACE is the sequential decision-making process wherein only one agent is allowed
to take an action at a time. Within this process, each agent maximizes its value
function given the actions taken by the preceding agents at the inference
stage. In the learning phase, each agent minimizes a TD error that
depends on how the subsequent agents have reacted to its chosen action.
Given the design of bidirectional dependency, ACE effectively turns a
multi-agent MDP into a single-agent MDP. We implement the ACE framework by
identifying the proper network representation to formulate the action
dependency, so that the sequential decision process is computed implicitly in
one forward pass. To validate ACE, we compare it with strong baselines on two
MARL benchmarks. Empirical experiments demonstrate that ACE outperforms the
state-of-the-art algorithms on Google Research Football and StarCraft
Multi-Agent Challenge by a large margin. In particular, on SMAC tasks, ACE
achieves a 100% success rate on almost all the hard and super-hard maps. We
further study a range of research questions regarding ACE, including its extension,
generalization, and practicability. Code is made available to facilitate
further research.
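Since the abstract walks through both stages of the sequential decision process, a minimal sketch may help make the agent ordering concrete. The snippet below is not the authors' implementation (ACE computes the sequential process implicitly in one network forward pass); it is a toy NumPy illustration that assumes a hypothetical q_fn(state, prev_actions, agent_id) returning a vector of action values for one agent conditioned on the actions already chosen by the preceding agents, and it assumes the shared team reward is applied only at the last agent's step.

```python
import numpy as np

def sequential_greedy_actions(q_fn, state, n_agents):
    """Inference stage: agents act one at a time; each agent maximizes its
    value given the actions already taken by the preceding agents."""
    actions = []
    for i in range(n_agents):
        q_i = q_fn(state, tuple(actions), i)  # value vector over agent i's actions
        actions.append(int(np.argmax(q_i)))
    return actions

def td_targets(q_fn, state, actions, reward, next_state, n_agents, gamma=0.99):
    """Learning stage: each agent's TD target depends on how the subsequent
    agents react to its chosen action, so one multi-agent step unrolls into
    a single-agent MDP over the agent ordering (an assumption of this sketch:
    zero intermediate reward, team reward granted at the last agent's step)."""
    targets = []
    for i in range(n_agents):
        if i + 1 < n_agents:
            # Agent i bootstraps from agent i+1's value at the *same* time step,
            # conditioned on the actions a_1..a_i that have already been taken.
            q_next = q_fn(state, tuple(actions[: i + 1]), i + 1)
            targets.append(np.max(q_next))
        else:
            # The last agent bootstraps from the first agent at the next time step.
            q_next = q_fn(next_state, (), 0)
            targets.append(reward + gamma * np.max(q_next))
    return targets
```

Under this decomposition, each agent's target is defined with respect to preceding actions that are already fixed, which is how the ever-changing targets caused by simultaneous policy updates are avoided.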
Related papers
- Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space [22.535906675532196]
In a multi-agent system, action semantics indicate the different influences of agents' actions on other entities.
Previous multi-agent reinforcement learning (MARL) algorithms apply global parameter-sharing across different types of heterogeneous agents.
We introduce the Unified Action Space (UAS) to fulfill the requirement.
arXiv Detail & Related papers (2024-08-14T09:15:11Z)
- Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing [11.639503711252663]
We tackle the multi-agent active hypothesis testing (AHT) problem by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning.
We present a comprehensive set of experimental results that effectively showcase the agents' ability to learn collaborative strategies and enhance performance.
arXiv Detail & Related papers (2023-09-14T01:18:04Z)
- On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z)
- Asynchronous Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-Robot Cooperative Exploration [16.681164058779146]
We consider the problem of cooperative exploration where multiple robots need to cooperatively explore an unknown region as fast as possible.
Existing MARL-based methods adopt action-making steps as the metric for exploration efficiency, assuming all the agents act in a fully synchronous manner.
We propose an asynchronous MARL solution, Asynchronous Coordination Explorer (ACE), to tackle this real-world challenge.
arXiv Detail & Related papers (2023-01-09T14:53:38Z)
- Multi-agent Deep Covering Skill Discovery [50.812414209206054]
We propose Multi-agent Deep Covering Option Discovery, which constructs the multi-agent options through minimizing the expected cover time of the multiple agents' joint state space.
Also, we propose a novel framework to adopt the multi-agent options in the MARL process.
We show that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperforms prior works using single-agent options or no options.
arXiv Detail & Related papers (2022-10-07T00:40:59Z)
- Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under Partial Observability [4.111899441919164]
State-of-the-art multi-agent reinforcement learning (MARL) methods have provided promising solutions to a variety of complex problems.
We first propose a group of value-based RL approaches for MacDec-POMDPs.
We formulate a set of macro-action-based policy gradient algorithms under the three training paradigms.
arXiv Detail & Related papers (2022-09-20T21:13:51Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
- On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning [55.95253619768565]
Current MARL algorithms assume that the number of agents within a group remains fixed throughout an experiment.
In many practical problems, an agent may terminate before its teammates.
We present a novel architecture for an existing state-of-the-art MARL algorithm which uses attention instead of a fully connected layer with absorbing states.
arXiv Detail & Related papers (2021-11-10T23:45:08Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)