Heterogeneous-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2304.09870v2
- Date: Thu, 28 Dec 2023 10:18:54 GMT
- Title: Heterogeneous-Agent Reinforcement Learning
- Authors: Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji,
and Yaodong Yang
- Abstract summary: We propose Heterogeneous-Agent Reinforcement Learning (HARL) algorithms to achieve effective cooperation in the general heterogeneous-agent setting.
Central to our findings are the multi-agent advantage decomposition lemma and the sequential update scheme.
We prove that all algorithms derived from HAML inherently enjoy monotonic improvement of joint return and convergence to Nash Equilibrium.
- Score: 16.796016254366524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The necessity for cooperation among intelligent machines has popularised
cooperative multi-agent reinforcement learning (MARL) in AI research. However,
many research endeavours heavily rely on parameter sharing among agents, which
confines them to the homogeneous-agent setting and leads to training
instability and a lack of convergence guarantees. To achieve effective
cooperation in the general heterogeneous-agent setting, we propose
Heterogeneous-Agent Reinforcement Learning (HARL) algorithms that resolve the
aforementioned issues. Central to our findings are the multi-agent advantage
decomposition lemma and the sequential update scheme. Based on these, we
develop the provably correct Heterogeneous-Agent Trust Region Learning (HATRL),
and derive HATRPO and HAPPO by tractable approximations. Furthermore, we
discover a novel framework named Heterogeneous-Agent Mirror Learning (HAML),
which strengthens theoretical guarantees for HATRPO and HAPPO and provides a
general template for cooperative MARL algorithmic designs. We prove that all
algorithms derived from HAML inherently enjoy monotonic improvement of joint
return and convergence to Nash Equilibrium. As its natural outcome, HAML
validates more novel algorithms in addition to HATRPO and HAPPO, including
HAA2C, HADDPG, and HATD3, which generally outperform their existing
MA-counterparts. We comprehensively test HARL algorithms on six challenging
benchmarks and demonstrate their superior effectiveness and stability for
coordinating heterogeneous agents compared to strong baselines such as MAPPO
and QMIX.
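For readers who want a concrete picture, two of the abstract's central objects can be made explicit. In the paper's notation, the multi-agent advantage decomposition lemma states that the joint advantage of an ordered subset of agents splits into a sum of per-agent advantages, each conditioned on the actions of the agents preceding it (a sketch of the statement; notation may differ slightly from the paper):

```latex
A_{\pi}^{i_{1:m}}\!\left(s,\, a^{i_{1:m}}\right)
  \;=\; \sum_{j=1}^{m} A_{\pi}^{i_j}\!\left(s,\, a^{i_{1:j-1}},\, a^{i_j}\right)
```

Building on this, the sequential update scheme updates agents one at a time. Below is a minimal, hypothetical HAPPO-style sketch in PyTorch: each agent performs a clipped-surrogate update in a random order, and the joint advantage is progressively re-weighted by the probability ratios of the agents that have already updated. The class, shapes, and hyperparameters (TinyGaussianPolicy, clip_eps, the toy data in the usage stub) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a HAPPO-style sequential update round; not the
# authors' code. Policies, data shapes, and hyperparameters are made up.
import torch
import torch.nn as nn


class TinyGaussianPolicy(nn.Module):
    """Illustrative per-agent policy: unit-variance Gaussian over actions."""

    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.mean = nn.Linear(obs_dim, act_dim)

    def log_prob(self, obs, act):
        dist = torch.distributions.Normal(self.mean(obs), 1.0)
        return dist.log_prob(act).sum(-1)


def happo_update(policies, optimizers, obs, acts, old_logps, joint_adv, clip_eps=0.2):
    """One round of sequential per-agent clipped-surrogate updates.

    The joint advantage estimate is re-weighted by the probability ratios of
    agents that have already updated (the compounding factor in HAPPO).
    """
    m = joint_adv.clone()                                # compounding factor M
    for i in torch.randperm(len(policies)).tolist():     # random update order
        ratio = torch.exp(policies[i].log_prob(obs[i], acts[i]) - old_logps[i])
        surr = torch.min(ratio * m,
                         torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * m)
        optimizers[i].zero_grad()
        (-surr.mean()).backward()
        optimizers[i].step()
        with torch.no_grad():                            # fold updated ratio into M
            m = m * torch.exp(policies[i].log_prob(obs[i], acts[i]) - old_logps[i])


if __name__ == "__main__":
    # Toy usage: two heterogeneous agents with different obs/action sizes.
    dims = [(4, 2), (6, 3)]
    policies = [TinyGaussianPolicy(o, a) for o, a in dims]
    optimizers = [torch.optim.Adam(p.parameters(), lr=3e-4) for p in policies]
    obs = [torch.randn(32, o) for o, _ in dims]
    acts = [torch.randn(32, a) for _, a in dims]
    old_logps = [p.log_prob(o, a).detach() for p, o, a in zip(policies, obs, acts)]
    joint_adv = torch.randn(32)
    happo_update(policies, optimizers, obs, acts, old_logps, joint_adv)
```

Updating each agent against this compounded factor, rather than applying one simultaneous shared update, is what lets every agent account for its predecessors' new behaviour; this sequential structure underlies the monotonic-improvement and Nash-convergence guarantees claimed above.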
Related papers
- Sample-Efficient Multi-Agent RL: An Optimization Perspective [103.35353196535544]
We study multi-agent reinforcement learning (MARL) for general-sum Markov Games (MGs) under general function approximation.
We introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs.
We show that our algorithm achieves sublinear regret comparable to existing works.
arXiv Detail & Related papers (2023-10-10T01:39:04Z)
- Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing [11.639503711252663]
We tackle the multi-agent active hypothesis testing (AHT) problem by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning.
We present a comprehensive set of experimental results that effectively showcase the agents' ability to learn collaborative strategies and enhance performance.
arXiv Detail & Related papers (2023-09-14T01:18:04Z)
- Heterogeneous Multi-Agent Reinforcement Learning via Mirror Descent Policy Optimization [1.5501208213584152]
This paper presents an extension of the Mirror Descent method to overcome challenges in cooperative Multi-Agent Reinforcement Learning (MARL) settings.
The proposed Heterogeneous-Agent Mirror Descent Policy Optimization (HAMDPO) algorithm utilizes the multi-agent advantage decomposition lemma.
We evaluate HAMDPO on Multi-Agent MuJoCo and StarCraftII tasks, demonstrating its superiority over state-of-the-art algorithms.
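The summary does not spell out HAMDPO's update rule, but mirror-descent policy optimisation methods generally replace the hard trust-region constraint with a KL proximal term. A plausible per-agent form, combining that proximal term with the advantage from the decomposition lemma, is sketched below; the step size eta, state distribution rho, and agent ordering i_1, ..., i_m are illustrative assumptions, not HAMDPO's exact objective.

```latex
\pi_{k+1}^{i_m} \;=\; \arg\max_{\pi^{i_m}}\;
  \mathbb{E}_{s \sim \rho_{\pi_k},\; a^{i_m} \sim \pi^{i_m}}
    \!\left[ A_{\pi_k}^{i_m}\!\left(s,\, a^{i_{1:m-1}},\, a^{i_m}\right) \right]
  \;-\; \frac{1}{\eta}\,
  \mathbb{E}_{s \sim \rho_{\pi_k}}
    \!\left[ D_{\mathrm{KL}}\!\left( \pi^{i_m}(\cdot \mid s) \,\middle\|\, \pi_{k}^{i_m}(\cdot \mid s) \right) \right]
```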
arXiv Detail & Related papers (2023-08-13T10:18:10Z)
- Maximum Entropy Heterogeneous-Agent Reinforcement Learning [47.652866966384586]
Multi-agent reinforcement learning (MARL) has been shown to be effective for cooperative games in recent years.
We propose a unified framework for learning stochastic policies to resolve these issues.
Based on the MaxEnt framework, we propose Heterogeneous-Agent Soft Actor-Critic (HASAC) algorithm.
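The summary names the MaxEnt framework without stating the objective. For reference, the standard single-agent maximum-entropy objective that soft actor-critic methods optimise is reproduced below; HASAC presumably optimises a multi-agent analogue with per-agent stochastic policies, but the exact formulation is not given here. The temperature alpha and discount gamma are the usual symbols, assumed for illustration.

```latex
J(\pi) \;=\; \mathbb{E}_{\tau \sim \pi}
  \!\left[ \sum_{t} \gamma^{t} \Bigl( r(s_t, a_t)
    + \alpha\, \mathcal{H}\bigl( \pi(\cdot \mid s_t) \bigr) \Bigr) \right]
```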
arXiv Detail & Related papers (2023-06-19T06:22:02Z)
- Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL [154.13105285663656]
The cooperative Multi-Agent Reinforcement Learning (MARL) framework with permutation-invariant agents has achieved tremendous empirical success in real-world applications.
Unfortunately, the theoretical understanding of this MARL problem is lacking, due to the curse of many agents and the limited exploration of relational reasoning in existing works.
We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of, and logarithmic in, the number of agents, respectively, which mitigates the curse of many agents.
arXiv Detail & Related papers (2022-09-20T16:42:59Z)
- Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL [10.681450002239355]
Heterogeneous-Agent Mirror Learning (HAML) provides a general template for MARL algorithmic designs.
We prove that algorithms derived from the HAML template satisfy the desired property of monotonic improvement of the joint reward.
We propose HAML extensions of two well-known RL algorithms: HAA2C (for A2C) and HADDPG (for DDPG).
arXiv Detail & Related papers (2022-08-02T18:16:42Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning [25.027143431992755]
Trust region methods rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks.
Unfortunately, when it comes to multi-agent reinforcement learning (MARL), the property of monotonic improvement may not simply apply.
In this paper, we extend the theory of trust region learning to MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme.
Based on these, we develop Heterogeneous-Agent Trust Region Policy Optimisation (HATRPO) and Heterogeneous-Agent Proximal Policy Optimisation (HAPPO).
arXiv Detail & Related papers (2021-09-23T09:44:35Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
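The "linear decomposition of universal successor features" refers to the standard successor-feature identity: when rewards are linear in features phi with task weights w, the action value factorises as below. This is background the summary relies on; how UneVEn samples related tasks and schedules their policies for exploration is not described here.

```latex
r_{w}(s, a) \;=\; \phi(s, a)^{\top} w,
\qquad
\psi^{\pi}(s, a) \;=\; \mathbb{E}_{\pi}\!\left[ \sum_{t \ge 0} \gamma^{t}\, \phi(s_t, a_t) \,\middle|\, s_0 = s,\, a_0 = a \right],
\qquad
Q_{w}^{\pi}(s, a) \;=\; \psi^{\pi}(s, a)^{\top} w
```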
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
- QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning [70.382101956278]
QTRAN is a reinforcement learning algorithm capable of learning the largest class of joint-action value functions.
Despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments.
We propose a substantially improved version, coined QTRAN++.
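"The largest class of joint-action value functions" refers to factorisations constrained only by the Individual-Global-Max (IGM) consistency condition, reproduced below for n agents with local action-observation histories tau^i; the specific value transformation that QTRAN uses, and how QTRAN++ improves on it, is not detailed in this summary.

```latex
\arg\max_{\mathbf{u}}\, Q_{\mathrm{jt}}(\boldsymbol{\tau}, \mathbf{u})
  \;=\;
  \Bigl( \arg\max_{u^{1}} Q_{1}(\tau^{1}, u^{1}),\; \ldots,\; \arg\max_{u^{n}} Q_{n}(\tau^{n}, u^{n}) \Bigr)
```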
arXiv Detail & Related papers (2020-06-22T05:08:36Z)