Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to
Cooperative MARL
- URL: http://arxiv.org/abs/2208.01682v1
- Date: Tue, 2 Aug 2022 18:16:42 GMT
- Title: Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to
Cooperative MARL
- Authors: Jakub Grudzien Kuba, Xidong Feng, Shiyao Ding, Hao Dong, Jun Wang,
Yaodong Yang
- Abstract summary: Heterogeneous-Agent Mirror Learning (HAML) provides a general template for MARL algorithmic designs.
We prove that algorithms derived from the HAML template satisfy the desired properties of the monotonic improvement of the joint reward.
- We propose HAML extensions of two well-known RL algorithms, HAA2C (for A2C) and HADDPG (for DDPG).
- Score: 10.681450002239355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The necessity for cooperation among intelligent machines has popularised
cooperative multi-agent reinforcement learning (MARL) in the artificial
intelligence (AI) research community. However, many research endeavors have
been focused on developing practical MARL algorithms whose effectiveness has
been studied only empirically, thereby lacking theoretical guarantees. As
recent studies have revealed, MARL methods often achieve performance that is
unstable in terms of reward monotonicity or suboptimal at convergence. To
resolve these issues, in this paper, we introduce a novel framework named
Heterogeneous-Agent Mirror Learning (HAML) that provides a general template for
MARL algorithmic designs. We prove that algorithms derived from the HAML
template satisfy the desired properties of the monotonic improvement of the
joint reward and the convergence to Nash equilibrium. We verify the
practicality of HAML by proving that the current state-of-the-art cooperative
MARL algorithms, HATRPO and HAPPO, are in fact HAML instances. Next, as a
natural outcome of our theory, we propose HAML extensions of two well-known RL
algorithms, HAA2C (for A2C) and HADDPG (for DDPG), and demonstrate their
effectiveness against strong baselines on StarCraft II and Multi-Agent MuJoCo
tasks.
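As a rough, hypothetical illustration of the idea behind HAML-style methods such as HAA2C: agents are updated sequentially rather than simultaneously, each taking an advantage-weighted policy-gradient step. The Python sketch below is a deliberate simplification; it omits the drift functionals, neighbourhood operators, and multi-agent advantage corrections that the actual framework uses to guarantee monotonic improvement, and all names and shapes are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n_agents, n_states, n_actions = 3, 5, 4

    # One independent tabular softmax policy per heterogeneous agent.
    logits = [rng.normal(size=(n_states, n_actions)) for _ in range(n_agents)]

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def sequential_update(batch, lr=0.1):
        """batch: list of (state, joint_action, advantage_estimate) tuples."""
        for m in rng.permutation(n_agents):           # draw a random update order
            for s, a_joint, adv in batch:
                pi = softmax(logits[m][s])
                grad_logp = -pi
                grad_logp[a_joint[m]] += 1.0          # grad of log pi_m(a_m | s)
                logits[m][s] += lr * adv * grad_logp  # advantage-weighted ascent
            # HAML proper also corrects each agent's advantage for teammates
            # updated earlier in the order; that correction is omitted here.

    # Tiny usage example with fabricated data: (state, joint action, advantage).
    sequential_update([(0, (1, 2, 0), 0.5), (3, (0, 0, 1), -0.2)])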
Related papers
- Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample-efficient manner.
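For intuition on the IDS principle named above: actions are scored by the ratio of squared expected regret to expected information gain, so the learner trades off immediate performance against learning about the optimum. The sketch below is a generic single-decision version, not the paper's Reg-MAIDS algorithm, and it uses a deterministic argmin where full IDS optimizes over randomized action distributions; all numbers are fabricated.

    import numpy as np

    def ids_action(expected_regret, info_gain, eps=1e-8):
        """Information-directed sampling over a finite action set:
        minimize squared expected regret per unit of information gained."""
        ratio = expected_regret ** 2 / (info_gain + eps)
        return int(np.argmin(ratio))

    regret = np.array([0.30, 0.10, 0.25])  # expected shortfall vs. the optimum
    gain = np.array([0.02, 0.01, 0.20])    # expected info gain about the optimum
    print(ids_action(regret, gain))        # favours informative, low-regret actions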
arXiv Detail & Related papers (2024-04-30T06:48:56Z)
- Sample-Efficient Multi-Agent RL: An Optimization Perspective [103.35353196535544]
We study multi-agent reinforcement learning (MARL) for general-sum Markov games (MGs) under general function approximation.
We introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs.
We show that our algorithm achieves sublinear regret comparable to existing works.
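For readers unfamiliar with the yardstick: a sublinear regret guarantee of the generic form below (our gloss, not the paper's exact bound) means the average per-episode gap to the optimal value vanishes as the number of episodes K grows.

    \mathrm{Regret}(K) \;=\; \sum_{k=1}^{K}\bigl(V^{*}-V^{\pi_k}\bigr) \;\le\; \widetilde{O}\bigl(\sqrt{K}\bigr)
    \quad\Longrightarrow\quad
    \frac{\mathrm{Regret}(K)}{K} \;\longrightarrow\; 0 \ \text{ as } K\to\infty.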
arXiv Detail & Related papers (2023-10-10T01:39:04Z)
- Maximum Entropy Heterogeneous-Agent Reinforcement Learning [47.652866966384586]
Multi-agent reinforcement learning (MARL) has been shown effective for cooperative games in recent years.
We propose a unified framework for learning stochastic policies to resolve these issues.
Based on the MaxEnt framework, we propose Heterogeneous-Agent Soft Actor-Critic (HASAC) algorithm.
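The maximum-entropy objective behind HASAC is not written out in this summary. A standard multi-agent form (our reconstruction, with temperature \alpha, discount \gamma, and n agents) augments the joint reward with every agent's policy entropy, which is what makes the learned policies stochastic:

    J(\boldsymbol{\pi}) \;=\; \mathbb{E}_{\boldsymbol{\pi}}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\Bigl(r(s_t,\boldsymbol{a}_t)+\alpha\sum_{i=1}^{n}\mathcal{H}\bigl(\pi^{i}(\cdot\mid s_t)\bigr)\Bigr)\right].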
arXiv Detail & Related papers (2023-06-19T06:22:02Z)
- Heterogeneous-Agent Reinforcement Learning [16.796016254366524]
We propose Heterogeneous-Agent Reinforcement Learning (HARL) algorithms to achieve effective cooperation in the general heterogeneous-agent setting.
Central to our findings are the multi-agent advantage decomposition lemma and the sequential update scheme.
We prove that all algorithms derived from HAML inherently enjoy monotonic improvement of joint return and convergence to Nash equilibrium.
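The multi-agent advantage decomposition lemma, as stated in this line of work, splits the joint advantage into a telescoping sum of per-agent advantages, each conditioned on the actions of agents earlier in an arbitrary ordering i_{1:n}:

    A_{\boldsymbol{\pi}}\bigl(s,\boldsymbol{a}^{i_{1:n}}\bigr) \;=\; \sum_{m=1}^{n} A_{\boldsymbol{\pi}}^{i_m}\bigl(s,\boldsymbol{a}^{i_{1:m-1}},a^{i_m}\bigr),
    \qquad
    A_{\boldsymbol{\pi}}^{i_m}\bigl(s,\boldsymbol{a}^{i_{1:m-1}},a^{i_m}\bigr) \;=\; Q_{\boldsymbol{\pi}}\bigl(s,\boldsymbol{a}^{i_{1:m}}\bigr)-Q_{\boldsymbol{\pi}}\bigl(s,\boldsymbol{a}^{i_{1:m-1}}\bigr).

This is what licenses the sequential update scheme: each agent can improve its own summand given its predecessors' new actions, so the joint advantage improves agent by agent.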
arXiv Detail & Related papers (2023-04-19T05:08:02Z)
- Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement Learning [17.957644784944755]
We propose a novel certification method for c-MARLs to determine actions with guaranteed certified bounds.
We empirically show that our certification bounds are much tighter than state-of-the-art RL certification solutions.
Our method produces meaningful guaranteed robustness for all models and environments.
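The certification mechanism is not detailed in this summary. Work in this area commonly builds on randomized smoothing: perturb the observation with Gaussian noise, act by majority vote, and convert the vote margin into a radius within which the chosen action provably cannot change. The sketch below is a hypothetical simplification, not the paper's exact procedure; the threshold policy is fabricated for illustration.

    import numpy as np

    def smoothed_action(policy, obs, sigma=0.1, n_samples=1000, rng=None):
        """Majority-vote action of a policy under Gaussian observation noise.
        A large vote margin can be turned into a certified robustness radius."""
        rng = rng or np.random.default_rng(0)
        votes = {}
        for _ in range(n_samples):
            noisy = obs + rng.normal(scale=sigma, size=obs.shape)
            a = policy(noisy)
            votes[a] = votes.get(a, 0) + 1
        return max(votes, key=votes.get), votes

    # Hypothetical discrete policy: threshold on the first observation feature.
    policy = lambda o: int(o[0] > 0.0)
    action, votes = smoothed_action(policy, np.array([0.05, -0.3]))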
arXiv Detail & Related papers (2022-12-22T14:36:27Z)
- Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL [154.13105285663656]
Cooperative multi-agent reinforcement learning (MARL) with permutation-invariant agents has achieved tremendous empirical success in real-world applications.
Unfortunately, the theoretical understanding of this MARL problem is lacking due to the curse of many agents and the limited exploration of relational reasoning in existing works.
We prove that the suboptimality gap of the model-free algorithm is independent of the number of agents, while that of the model-based algorithm is logarithmic in it, which mitigates the curse of many agents.
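Permutation invariance is the structural property doing the work against the curse of many agents: the value estimate treats agents as a set, so it cannot depend on agent ordering. The paper achieves this with set transformers; the sketch below demonstrates the property itself with a plain mean-pooling encoder (our simplification, random untrained weights).

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hidden = 4, 8
    W1 = rng.normal(size=(d_in, d_hidden)) / np.sqrt(d_in)
    W2 = rng.normal(size=(d_hidden, 1)) / np.sqrt(d_hidden)

    def permutation_invariant_value(agent_feats):
        """agent_feats: (n_agents, d_in). Encoding each agent independently and
        mean-pooling makes the output invariant to agent ordering; a set
        transformer obtains the same property with attention instead."""
        h = np.tanh(agent_feats @ W1)  # per-agent encoding
        pooled = h.mean(axis=0)        # symmetric aggregation over the set
        return float(pooled @ W2)

    x = rng.normal(size=(5, d_in))
    assert np.isclose(permutation_invariant_value(x),
                      permutation_invariant_value(x[::-1]))  # order-free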
arXiv Detail & Related papers (2022-09-20T16:42:59Z)
- Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
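The contrastive component in such approaches is typically an InfoNCE-style loss that scores the true next-state embedding against negatives, recovering the low-rank representations that the UCB bonus is then built on. The sketch below is a generic version of that loss, not the paper's exact estimator; all embeddings are fabricated.

    import numpy as np

    def info_nce_loss(query, positive, negatives, temp=0.1):
        """InfoNCE: classify the true pair among negatives.
        query, positive: (d,); negatives: (k, d)."""
        logits = np.concatenate(([query @ positive], negatives @ query)) / temp
        logits -= logits.max()  # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum())
        return -log_probs[0]    # the true pair sits at index 0

    rng = np.random.default_rng(0)
    q, pos = rng.normal(size=8), rng.normal(size=8)
    negs = rng.normal(size=(16, 8))
    print(info_nce_loss(q, pos, negs))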
arXiv Detail & Related papers (2022-07-29T17:29:08Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
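"Hallucinated" here refers to optimism over epistemic model uncertainty: agents evaluate joint actions under the most favourable dynamics inside the model's confidence set, which drives directed exploration toward equilibria. The toy sketch below shows only that upper-confidence idea, with a fabricated posterior mean/std over joint actions rather than the paper's algorithm.

    import numpy as np

    def optimistic_value(mean, std, beta=2.0):
        """Upper-confidence value per joint action: hallucinate the
        best case inside a confidence set of width beta * std."""
        return mean + beta * std

    # Fabricated model posterior over 4 joint actions.
    mean = np.array([0.20, 0.50, 0.40, 0.10])
    std = np.array([0.30, 0.05, 0.20, 0.40])
    best = int(np.argmax(optimistic_value(mean, std)))  # act optimistically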
arXiv Detail & Related papers (2022-03-14T17:24:03Z)