Maximum Entropy Heterogeneous-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2306.10715v4
- Date: Fri, 8 Mar 2024 12:07:10 GMT
- Title: Maximum Entropy Heterogeneous-Agent Reinforcement Learning
- Authors: Jiarong Liu, Yifan Zhong, Siyi Hu, Haobo Fu, Qiang Fu, Xiaojun Chang,
Yaodong Yang
- Abstract summary: Multi-agent reinforcement learning (MARL) has been shown effective for cooperative games in recent years.
We propose a unified framework for learning stochastic policies to resolve these issues.
Based on the MaxEnt framework, we propose the Heterogeneous-Agent Soft Actor-Critic (HASAC) algorithm.
- Score: 47.652866966384586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent reinforcement learning (MARL) has been shown effective for
cooperative games in recent years. However, existing state-of-the-art methods
face challenges related to sample complexity, training instability, and the
risk of converging to a suboptimal Nash Equilibrium. In this paper, we propose
a unified framework for learning \emph{stochastic} policies to resolve these
issues. We embed cooperative MARL problems into probabilistic graphical models,
from which we derive the maximum entropy (MaxEnt) objective for MARL. Based on
the MaxEnt framework, we propose the Heterogeneous-Agent Soft Actor-Critic (HASAC)
algorithm. Theoretically, we prove the monotonic improvement and convergence to
quantal response equilibrium (QRE) properties of HASAC. Furthermore, we
generalize HASAC into a unified template for MaxEnt algorithmic design named Maximum
Entropy Heterogeneous-Agent Mirror Learning (MEHAML), which provides any
induced method with the same guarantees as HASAC. We evaluate HASAC on six
benchmarks: Bi-DexHands, Multi-Agent MuJoCo, StarCraft Multi-Agent Challenge,
Google Research Football, Multi-Agent Particle Environment, and Light Aircraft
Game. Results show that HASAC consistently outperforms strong baselines,
exhibiting better sample efficiency, robustness, and sufficient exploration.
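For reference, the kind of maximum entropy objective referred to above can be sketched as follows (a minimal illustration in our own notation, not quoted from the paper: $r$ is the shared reward, $\gamma$ the discount factor, $\alpha$ a temperature coefficient, $\mathcal{H}$ the policy entropy, and $\pi^{i}$ agent $i$'s policy):

$$J(\boldsymbol{\pi}) \;=\; \mathbb{E}_{\boldsymbol{\pi}}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\left(r(s_t, \boldsymbol{a}_t) \;+\; \alpha \sum_{i=1}^{n} \mathcal{H}\!\left(\pi^{i}(\cdot \mid s_t)\right)\right)\right]$$

Maximizing the joint return together with each agent's policy entropy is what yields the stochastic, exploration-friendly policies described in the abstract; the exact objective and its derivation from the probabilistic graphical model are given in the paper.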
Related papers
- LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z)
- Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning [37.80275600302316]
Distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL.
A notorious yet open challenge is whether RMGs can escape the curse of multiagency.
The proposed algorithm is the first to break the curse of multiagency for RMGs.
arXiv Detail & Related papers (2024-09-30T08:09:41Z)
- Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation.
We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z)
- Sample-Efficient Multi-Agent RL: An Optimization Perspective [103.35353196535544]
We study multi-agent reinforcement learning (MARL) for the general-sum Markov Games (MGs) under the general function approximation.
We introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs.
We show that our algorithm achieves sublinear regret comparable to existing works.
arXiv Detail & Related papers (2023-10-10T01:39:04Z)
- Heterogeneous-Agent Reinforcement Learning [16.796016254366524]
We propose Heterogeneous-Agent Reinforcement Learning (HARL) algorithms to achieve effective cooperation in the general heterogeneous-agent setting.
Central to our findings are the multi-agent advantage decomposition lemma and the sequential update scheme.
We prove that all algorithms derived from HAML inherently enjoy monotonic improvement of joint return and convergence to Nash Equilibrium.
arXiv Detail & Related papers (2023-04-19T05:08:02Z)
- Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL [10.681450002239355]
Heterogeneous-Agent Mirror Learning (HAML) provides a general template for MARL algorithmic designs.
We prove that algorithms derived from the HAML template satisfy the desired property of monotonic improvement of the joint reward.
We propose HAML extensions of two well-known RL algorithms: HAA2C (for A2C) and HADDPG (for DDPG).
arXiv Detail & Related papers (2022-08-02T18:16:42Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)