Robust Multi-Agent Control via Maximum Entropy Heterogeneous-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2306.10715v5
- Date: Wed, 11 Dec 2024 16:59:50 GMT
- Title: Robust Multi-Agent Control via Maximum Entropy Heterogeneous-Agent Reinforcement Learning
- Authors: Simin Li, Yifan Zhong, Jiarong Liu, Jianing Guo, Siyuan Qi, Ruixiao Xu, Xin Yu, Siyi Hu, Haobo Fu, Qiang Fu, Xiaojun Chang, Yujing Hu, Bo An, Xianglong Liu, Yaodong Yang
- Abstract summary: We propose a unified framework for learning \emph{stochastic} policies to resolve issues in multi-agent reinforcement learning.
Based on the MaxEnt framework, we propose the \emph{Heterogeneous-Agent Soft Actor-Critic} (HASAC) algorithm.
We evaluate HASAC on seven benchmarks: Bi-DexHands, Multi-Agent MuJoCo, Pursuit-Evade, StarCraft Multi-Agent Challenge, Google Research Football, Multi-Agent Particle Environment, Light Aircraft Game.
- Score: 65.60470000696944
- Abstract: In multi-agent reinforcement learning, optimal control with robustness guarantees is critical for deployment in the real world. However, existing methods face challenges related to sample complexity, training instability, potential convergence to suboptimal Nash equilibria, and non-robustness to multiple perturbations. In this paper, we propose a unified framework for learning \emph{stochastic} policies to resolve these issues. We embed cooperative MARL problems into probabilistic graphical models, from which we derive the maximum entropy (MaxEnt) objective optimal for MARL. Based on the MaxEnt framework, we propose the \emph{Heterogeneous-Agent Soft Actor-Critic} (HASAC) algorithm. Theoretically, we prove the monotonic improvement and convergence to \emph{quantal response equilibrium} (QRE) properties of HASAC. Furthermore, HASAC is provably robust against a wide range of real-world uncertainties, including perturbations in rewards, environment dynamics, states, and actions. Finally, we generalize a unified template for MaxEnt algorithmic design named \emph{Maximum Entropy Heterogeneous-Agent Mirror Learning} (MEHAML), which provides any induced method with the same guarantees as HASAC. We evaluate HASAC on seven benchmarks: Bi-DexHands, Multi-Agent MuJoCo, Pursuit-Evade, StarCraft Multi-Agent Challenge, Google Research Football, Multi-Agent Particle Environment, and Light Aircraft Game. Results show that HASAC consistently outperforms strong baselines in 34 out of 38 tasks, exhibiting improved training stability, better sample efficiency, and sufficient exploration. The robustness of HASAC is further validated under uncertainties in rewards, dynamics, states, and actions across 14 perturbation magnitudes, and by real-world deployment in a multi-robot arena against these four types of uncertainties. See our page at \url{https://sites.google.com/view/meharl}.
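To make the MaxEnt objective concrete, below is a minimal sketch (assumptions: a toy two-agent shared-reward matrix game, a fixed temperature `alpha`, and exact expectations rather than sampled data; this is not the HASAC implementation) of entropy-regularized sequential policy updates. Each agent's softmax (logit) response plays the role of a per-agent soft policy improvement step, and the fixed point of such updates is a quantal response equilibrium rather than a deterministic Nash equilibrium, mirroring the paper's QRE convergence result.

```python
import numpy as np

# Toy 2-agent cooperative matrix game: both agents receive R[a1, a2].
R = np.array([[3.0, 0.0],
              [0.0, 2.0]])
alpha = 0.5              # entropy temperature of the MaxEnt objective
pi1 = np.full(2, 0.5)    # agent 1's stochastic policy
pi2 = np.full(2, 0.5)    # agent 2's stochastic policy

def soft_response(q):
    """Entropy-regularized (logit) response: softmax of the soft Q-values."""
    z = (q - q.max()) / alpha
    e = np.exp(z)
    return e / e.sum()

# Sequential soft updates: each agent improves against the other's latest policy.
for _ in range(200):
    pi1 = soft_response(R @ pi2)     # agent 1's expected payoff per action, then softmax
    pi2 = soft_response(R.T @ pi1)   # agent 2 responds to agent 1's updated policy
print("pi1 =", pi1.round(3), "pi2 =", pi2.round(3))
```

With this payoff matrix both agents concentrate on, but never fully commit to, the higher-reward joint action; lowering `alpha` sharpens the stochastic policies toward a deterministic Nash equilibrium.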
Related papers
- Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees [91.88803125231189]
Multi-step Preference Optimization (MPO) is built upon the natural actor-critic framework \citep{rakhlin2013online,joulani17a}.
We show that OMPO requires $\mathcal{O}(\epsilon^{-1})$ policy updates to converge to an $\epsilon$-approximate Nash equilibrium.
We also validate the effectiveness of our method on a multi-turn conversation dataset and a math reasoning dataset.
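For intuition about the optimistic gradient component, here is a small sketch (the saddle-point matrix, step size, and iteration count are arbitrary illustrative choices, and this is not the paper's OMPO update) of optimistic gradient descent-ascent on a bilinear game, where plain simultaneous gradient steps would cycle but the optimistic correction makes the last iterate converge.

```python
import numpy as np

# Optimistic gradient descent-ascent (OGDA) on min_x max_y  x @ A @ y.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eta = 0.05
x, y = np.array([1.0, -0.5]), np.array([-0.8, 0.3])
gx_prev, gy_prev = A @ y, A.T @ x        # previous gradients (first step is a plain step)

for _ in range(5000):
    gx, gy = A @ y, A.T @ x              # gradients at the current iterate
    x = x - eta * (2.0 * gx - gx_prev)   # optimistic descent step for the min player
    y = y + eta * (2.0 * gy - gy_prev)   # optimistic ascent step for the max player
    gx_prev, gy_prev = gx, gy

print("x =", x.round(4), "y =", y.round(4))  # both approach the saddle point (0, 0)
```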
arXiv Detail & Related papers (2025-02-18T09:33:48Z) - Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning [37.80275600302316]
Distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL.
Two notorious and open challenges are the formulation of the uncertainty set and whether the corresponding RMGs can overcome the curse of multiagency.
In this work, we propose a natural class of RMGs inspired by behavioral economics, where each agent's uncertainty set is shaped by both the environment and the integrated behavior of other agents.
arXiv Detail & Related papers (2024-09-30T08:09:41Z) - Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty [40.55653383218379]
This work focuses on learning in distributionally robust Markov games (RMGs).
We propose a sample-efficient model-based algorithm (DRNVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria.
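As a rough single-agent illustration (not DRNVI itself; the state and action counts, the total-variation-style radius, and the greedy mass-shifting construction of the worst case are all assumptions), the sketch below runs value iteration with Bellman backups that are pessimistic over an uncertainty ball around a nominal transition model.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, radius = 4, 2, 0.9, 0.2          # states, actions, discount, uncertainty radius

P = rng.dirichlet(np.ones(S), size=(S, A))    # nominal transition model, shape (S, A, S)
R = rng.uniform(0.0, 1.0, size=(S, A))        # rewards

def worst_case_value(p_nominal, V, eps):
    """Minimize p @ V while shifting at most eps probability mass from p_nominal."""
    order = np.argsort(V)                     # states sorted from lowest to highest value
    p, budget = p_nominal.copy(), eps
    for i in order[::-1]:                     # remove mass from the highest-value states
        take = min(p[i], budget)
        p[i] -= take
        budget -= take
        if budget <= 0:
            break
    p[order[0]] += eps - budget               # reassign it to the lowest-value state
    return p @ V

V = np.zeros(S)
for _ in range(500):                          # robust value iteration
    Q = np.array([[R[s, a] + gamma * worst_case_value(P[s, a], V, radius)
                   for a in range(A)] for s in range(S)])
    V = Q.max(axis=1)
print("robust V ≈", V.round(3))
```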
arXiv Detail & Related papers (2024-04-29T17:51:47Z) - Robust Multi-Agent Reinforcement Learning via Adversarial
Regularization: Theoretical Foundation and Stable Algorithms [79.61176746380718]
Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains.
MARL policies often lack robustness and are sensitive to small changes in their environment.
We show that we can gain robustness by controlling a policy's Lipschitz constant.
We propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies.
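A minimal sketch of the adversarial-regularization idea, assuming a small PyTorch policy network, a KL-based smoothness penalty, and a single FGSM-style inner step; the network, `eps`, and `reg_weight` are illustrative and this is not ERNIE's actual procedure. Penalizing how far the action distribution moves under small observation perturbations encourages a small Lipschitz constant for the policy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))  # toy policy net
obs = torch.randn(32, 8)                                               # batch of observations
eps, reg_weight = 0.05, 1.0

ref_probs = torch.softmax(policy(obs), dim=-1).detach()   # action distribution on clean obs

# One FGSM-style step to find an approximately worst-case perturbation in an L-inf ball.
delta = (0.5 * eps * torch.empty_like(obs).uniform_(-1, 1)).requires_grad_(True)
inner_kl = F.kl_div(torch.log_softmax(policy(obs + delta), dim=-1), ref_probs,
                    reduction="batchmean")
inner_kl.backward()
delta = (delta + eps * delta.grad.sign()).clamp(-eps, eps).detach()
policy.zero_grad()                                         # discard gradients from the inner step

# Smoothness penalty that would be added to the usual actor loss (omitted here).
smoothness_penalty = F.kl_div(torch.log_softmax(policy(obs + delta), dim=-1), ref_probs,
                              reduction="batchmean")
(reg_weight * smoothness_penalty).backward()
print("smoothness penalty:", float(smoothness_penalty))
```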
arXiv Detail & Related papers (2023-10-16T20:14:06Z) - Robust Multi-Agent Reinforcement Learning with State Uncertainty [17.916400875478377]
We study the problem of MARL with state uncertainty in this work.
We propose a robust multi-agent Q-learning algorithm to find such an equilibrium.
Our experiments show that the proposed RMAQ algorithm converges to the optimal value function.
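As a tabular caricature of learning under state uncertainty (a sketch under assumptions, not the paper's RMAQ algorithm: the single-agent setting, the neighbourhood structure, and all hyper-parameters are invented for illustration), the snippet below runs Q-learning while an adversary swaps each observed state for the worst-looking neighbouring state.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma, lr, explore = 6, 2, 0.9, 0.1, 0.1
P = rng.dirichlet(np.ones(S), size=(S, A))       # transition kernel, shape (S, A, S)
R = rng.uniform(0.0, 1.0, size=(S, A))
Q = np.zeros((S, A))

def perturb(s, Q):
    """State adversary: show the agent the neighbouring state that looks worst to it."""
    candidates = [s, (s + 1) % S, (s - 1) % S]
    return min(candidates, key=lambda x: Q[x].max())

s = 0
for _ in range(50000):
    s_obs = perturb(s, Q)                        # the agent only sees the perturbed state
    a = rng.integers(A) if rng.random() < explore else int(Q[s_obs].argmax())
    s_next = rng.choice(S, p=P[s, a])            # the environment evolves from the true state
    s_obs_next = perturb(s_next, Q)
    td_target = R[s, a] + gamma * Q[s_obs_next].max()
    Q[s_obs, a] += lr * (td_target - Q[s_obs, a])
    s = s_next
print(Q.round(2))
```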
arXiv Detail & Related papers (2023-07-30T12:31:42Z) - Heterogeneous-Agent Reinforcement Learning [16.796016254366524]
We propose Heterogeneous-Agent Reinforcement Learning (HARL) algorithms to achieve effective cooperation in the general heterogeneous-agent setting.
Central to our findings are the multi-agent advantage decomposition lemma and the sequential update scheme.
We prove that all algorithms derived from HAML inherently enjoy monotonic improvement of joint return and convergence to Nash Equilibrium.
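The sequential update scheme is easy to illustrate in a toy shared-reward game: agents update one at a time, each best-responding to the others' current policies, so the joint return can only go up. The sketch below uses exact best responses on a random payoff tensor; it is an illustrative simplification, not the HARL algorithms themselves.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4
payoff = rng.uniform(0, 1, size=(n_actions,) * n_agents)   # shared reward tensor
policies = [rng.dirichlet(np.ones(n_actions)) for _ in range(n_agents)]

def joint_return(policies):
    v = payoff
    for pi in reversed(policies):        # contract one agent's action axis at a time
        v = v @ pi
    return float(v)

for it in range(5):
    for i in rng.permutation(n_agents):  # sequential, agent-by-agent updates
        best, best_val = policies[i], joint_return(policies)
        for a in range(n_actions):       # exact best response for agent i
            trial = list(policies)
            trial[i] = np.eye(n_actions)[a]
            val = joint_return(trial)
            if val > best_val:
                best, best_val = trial[i], val
        policies[i] = best
    print(f"iteration {it}: joint return = {joint_return(policies):.4f}")  # non-decreasing
```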
arXiv Detail & Related papers (2023-04-19T05:08:02Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in
Multi-Agent Simulations [110.72725220033983]
Epsilon-Robust Multi-Agent Simulation (ERMAS) is a framework for learning AI policies that are robust to such multi-agent sim-to-real gaps.
In particular, ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complex spatiotemporal simulations.
arXiv Detail & Related papers (2021-06-10T04:32:20Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
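The overestimation phenomenon itself is easy to reproduce numerically. The snippet below contrasts the hard max backup with a softmax-weighted backup on noisy Q-estimates; it illustrates only the motivation for a softmax operator, not the paper's regularized update scheme, and the noise level, temperature `tau`, and action count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(8)                 # every action is actually equally good (value 0)
noise_std, tau, n_trials = 1.0, 2.0, 20000

hard, soft = [], []
for _ in range(n_trials):
    q_hat = true_q + noise_std * rng.standard_normal(true_q.shape)   # noisy estimates
    hard.append(q_hat.max())                         # standard max backup
    w = np.exp(q_hat / tau)
    soft.append(float(w @ q_hat / w.sum()))          # softmax-weighted backup

print("true max value:       0.000")
print(f"mean hard-max backup: {np.mean(hard):.3f}")  # biased well above the true value
print(f"mean softmax backup:  {np.mean(soft):.3f}")  # bias is noticeably reduced
```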
arXiv Detail & Related papers (2021-03-22T14:18:39Z)