Robust Multi-Agent Control via Maximum Entropy Heterogeneous-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2306.10715v5
- Date: Wed, 11 Dec 2024 16:59:50 GMT
- Title: Robust Multi-Agent Control via Maximum Entropy Heterogeneous-Agent Reinforcement Learning
- Authors: Simin Li, Yifan Zhong, Jiarong Liu, Jianing Guo, Siyuan Qi, Ruixiao Xu, Xin Yu, Siyi Hu, Haobo Fu, Qiang Fu, Xiaojun Chang, Yujing Hu, Bo An, Xianglong Liu, Yaodong Yang,
- Abstract summary: We propose a unified framework for learning emphstochastic policies to resolve issues in multi-agent reinforcement learning.<n>Based on the MaxEnt framework, we propose emphHeterogeneous-Agent Soft Actor-Critic (HASAC) algorithm.<n>We evaluate HASAC on seven benchmarks: Bi-DexHands, Multi-Agent MuJoCo, Pursuit-Evade, StarCraft Multi-Agent Challenge, Google Research Football, Multi-Agent Particle Environment, Light Aircraft Game.
- Score: 65.60470000696944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-agent reinforcement learning, optimal control with robustness guarantees are critical for its deployment in real world. However, existing methods face challenges related to sample complexity, training instability, potential suboptimal Nash Equilibrium convergence and non-robustness to multiple perturbations. In this paper, we propose a unified framework for learning \emph{stochastic} policies to resolve these issues. We embed cooperative MARL problems into probabilistic graphical models, from which we derive the maximum entropy (MaxEnt) objective optimal for MARL. Based on the MaxEnt framework, we propose \emph{Heterogeneous-Agent Soft Actor-Critic} (HASAC) algorithm. Theoretically, we prove the monotonic improvement and convergence to \emph{quantal response equilibrium} (QRE) properties of HASAC. Furthermore, HASAC is provably robust against a wide range of real-world uncertainties, including perturbations in rewards, environment dynamics, states, and actions. Finally, we generalize a unified template for MaxEnt algorithmic design named \emph{Maximum Entropy Heterogeneous-Agent Mirror Learning} (MEHAML), which provides any induced method with the same guarantees as HASAC. We evaluate HASAC on seven benchmarks: Bi-DexHands, Multi-Agent MuJoCo, Pursuit-Evade, StarCraft Multi-Agent Challenge, Google Research Football, Multi-Agent Particle Environment, Light Aircraft Game. Results show that HASAC consistently outperforms strong baselines in 34 out of 38 tasks, exhibiting improved training stability, better sample efficiency and sufficient exploration. The robustness of HASAC was further validated when encountering uncertainties in rewards, dynamics, states, and actions of 14 magnitudes, and real-world deployment in a multi-robot arena against these four types of uncertainties. See our page at \url{https://sites.google.com/view/meharl}.
Related papers
- Multi-Agent Inverse Q-Learning from Demonstrations [3.4136908117644698]
Multi-Agent Marginal Q-Learning from Demonstrations (MAMQL) is a novel sample-efficient framework for multi-agent IRL.
We show MAMQL significantly outperforms previous multi-agent methods in average reward, sample efficiency, and reward recovery by often more than 2-5x.
arXiv Detail & Related papers (2025-03-06T18:22:29Z) - Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration [81.45763823762682]
This work aims to bridge the gap by investigating the problem of data synthesis through multi-agent sampling.
We introduce Tree Search-based Orchestrated Agents(TOA), where the workflow evolves iteratively during the sequential sampling process.
Our experiments on alignment, machine translation, and mathematical reasoning demonstrate that multi-agent sampling significantly outperforms single-agent sampling as inference compute scales.
arXiv Detail & Related papers (2024-12-22T15:16:44Z) - LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z) - Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning [37.80275600302316]
distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL.
A notorious yet open challenge is if RMGs can escape the curse of multiagency.
This is the first algorithm to break the curse of multiagency for RMGs.
arXiv Detail & Related papers (2024-09-30T08:09:41Z) - Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS)
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z) - Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation.
We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z) - Sample-Efficient Multi-Agent RL: An Optimization Perspective [103.35353196535544]
We study multi-agent reinforcement learning (MARL) for the general-sum Markov Games (MGs) under the general function approximation.
We introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs.
We show that our algorithm provides comparable sublinear regret to the existing works.
arXiv Detail & Related papers (2023-10-10T01:39:04Z) - Heterogeneous-Agent Reinforcement Learning [16.796016254366524]
We propose Heterogeneous-Agent Reinforcement Learning (HARL) algorithms to achieve effective cooperation in the general heterogeneous-agent setting.
Central to our findings are the multi-agent advantage decomposition lemma and the sequential update scheme.
We prove that all algorithms derived from HAML inherently enjoy monotonic improvement of joint return and convergence to Nash Equilibrium.
arXiv Detail & Related papers (2023-04-19T05:08:02Z) - Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to
Cooperative MARL [10.681450002239355]
Heterogeneous-Agent Mirror Learning (HAML) provides a general template for MARL algorithmic designs.
We prove that algorithms derived from the HAML template satisfy the desired properties of the monotonic improvement of the joint reward.
We propose HAML extensions of two well-known RL algorithms, HAA2C (for A2C) and HADDPG (for DDPG)
arXiv Detail & Related papers (2022-08-02T18:16:42Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $tilde O(|S||B|(gamma)-3epsilon-2)$ for finding the Nash equilibrium (NE) value up to some $epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.