Robust Multi-Agent Reinforcement Learning via Adversarial
Regularization: Theoretical Foundation and Stable Algorithms
- URL: http://arxiv.org/abs/2310.10810v1
- Date: Mon, 16 Oct 2023 20:14:06 GMT
- Title: Robust Multi-Agent Reinforcement Learning via Adversarial
Regularization: Theoretical Foundation and Stable Algorithms
- Authors: Alexander Bukharin, Yan Li, Yue Yu, Qingru Zhang, Zhehui Chen, Simiao
Zuo, Chao Zhang, Songan Zhang, and Tuo Zhao
- Abstract summary: Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains.
MARL policies often lack robustness and are sensitive to small changes in their environment.
We show that we can gain robustness by controlling a policy's Lipschitz constant.
We propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies.
- Score: 79.61176746380718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-Agent Reinforcement Learning (MARL) has shown promising results across
several domains. Despite this promise, MARL policies often lack robustness and
are therefore sensitive to small changes in their environment. This presents a
serious concern for the real-world deployment of MARL algorithms, where the
testing environment may slightly differ from the training environment. In this
work we show that we can gain robustness by controlling a policy's Lipschitz
constant, and under mild conditions, establish the existence of a Lipschitz and
close-to-optimal policy. Based on these insights, we propose a new robust MARL
framework, ERNIE, that promotes the Lipschitz continuity of the policies with
respect to the state observations and actions by adversarial regularization.
The ERNIE framework provides robustness against noisy observations, changing
transition dynamics, and malicious actions of agents. However, ERNIE's
adversarial regularization may introduce some training instability. To reduce
this instability, we reformulate adversarial regularization as a Stackelberg
game. We demonstrate the effectiveness of the proposed framework with extensive
experiments in traffic light control and particle environments. In addition, we
extend ERNIE to mean-field MARL with a formulation based on distributionally
robust optimization that outperforms its non-robust counterpart and is of
independent interest. Our code is available at
https://github.com/abukharin3/ERNIE.
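The core mechanism described above, penalizing how much the policy's output can change under small, adversarially chosen observation perturbations, can be illustrated with a short sketch. The snippet below is a minimal, hedged PyTorch illustration rather than the ERNIE implementation from the linked repository: `policy`, `epsilon`, `pgd_steps`, and `step_size` are assumed, illustrative names and values, and the inner maximization is only approximated with a few projected-gradient steps.
```python
# Minimal sketch (assumption: a PyTorch policy network mapping observations to
# action logits); `policy`, `epsilon`, `pgd_steps`, `step_size` are illustrative.
import torch
import torch.nn.functional as F

def adversarial_regularizer(policy, obs, epsilon=0.05, pgd_steps=3, step_size=0.02):
    """Approximate max_{||delta||_inf <= epsilon} KL(pi(.|obs) || pi(.|obs + delta))
    with a few projected gradient ascent steps. Penalizing this quantity encourages
    the policy to be smooth (Lipschitz) with respect to its observations."""
    with torch.no_grad():
        clean_probs = F.softmax(policy(obs), dim=-1)
    # Random start inside the perturbation ball.
    delta = torch.empty_like(obs).uniform_(-epsilon, epsilon).requires_grad_(True)
    for _ in range(pgd_steps):
        div = F.kl_div(F.log_softmax(policy(obs + delta), dim=-1),
                       clean_probs, reduction="batchmean")
        grad, = torch.autograd.grad(div, delta)
        # Signed gradient ascent step, projected back into the l_inf ball.
        delta = (delta + step_size * grad.sign()).clamp(-epsilon, epsilon)
        delta = delta.detach().requires_grad_(True)
    return F.kl_div(F.log_softmax(policy(obs + delta.detach()), dim=-1),
                    clean_probs, reduction="batchmean")

# Usage sketch: add the penalty to each agent's ordinary policy loss, e.g.
# loss = policy_loss + reg_weight * adversarial_regularizer(policy, obs_batch)
```
The abstract states that ERNIE regularizes with respect to both state observations and other agents' actions; only the observation case is sketched here.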
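The abstract also notes that plain adversarial regularization can destabilize training and that ERNIE addresses this by recasting the regularizer as a Stackelberg game between the policy (leader) and the perturbation (follower). One generic way to realize a leader-follower update, shown below only as an assumed illustration and not as the paper's exact scheme, is to unroll a single follower step and let the leader differentiate through it, so the policy's gradient anticipates the adversary's response.
```python
# Hedged sketch of a leader-follower (Stackelberg) treatment of the regularizer;
# `policy`, `epsilon`, and `follower_lr` are illustrative, not ERNIE's API.
import torch
import torch.nn.functional as F

def stackelberg_regularizer(policy, obs, epsilon=0.05, follower_lr=0.02):
    clean_probs = F.softmax(policy(obs), dim=-1).detach()
    # Follower starts from a random perturbation inside the l_inf ball.
    delta = torch.empty_like(obs).uniform_(-epsilon, epsilon).requires_grad_(True)

    # One follower ascent step, kept in the computation graph (create_graph=True)
    # so the leader's gradient flows through the follower's response.
    inner = F.kl_div(F.log_softmax(policy(obs + delta), dim=-1),
                     clean_probs, reduction="batchmean")
    grad_delta, = torch.autograd.grad(inner, delta, create_graph=True)
    delta_star = (delta + follower_lr * grad_delta).clamp(-epsilon, epsilon)

    # Leader objective evaluated at the follower's differentiable response.
    return F.kl_div(F.log_softmax(policy(obs + delta_star), dim=-1),
                    clean_probs, reduction="batchmean")
```
Differentiating through the follower's step is only one way to impose the leader-follower structure; two-timescale updates are a common alternative, and which variant ERNIE uses is not specified in the abstract.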
Related papers
- Belief-Enriched Pessimistic Q-Learning against Adversarial State
Perturbations [5.076419064097735]
Recent work shows that a well-trained RL agent can be easily manipulated by strategically perturbing its state observations at the test stage.
Existing solutions either introduce a regularization term to improve the smoothness of the trained policy against perturbations or alternately train the agent's policy and the attacker's policy.
We propose a new robust RL algorithm that derives a pessimistic policy to safeguard against the agent's uncertainty about the true state.
arXiv Detail & Related papers (2024-03-06T20:52:49Z) - AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline
Multi-Agent RL via Alternating Stationary Distribution Correction Estimation [65.4532392602682]
One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy.
This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation.
We introduce AlberDICE, an offline MARL algorithm that performs centralized training of individual agents based on stationary distribution optimization.
arXiv Detail & Related papers (2023-11-03T18:56:48Z) - Robust Multi-Agent Reinforcement Learning with State Uncertainty [17.916400875478377]
We study the problem of MARL with state uncertainty in this work.
We propose a robust multi-agent Q-learning (RMAQ) algorithm to find a robust equilibrium under such uncertainty.
Our experiments show that the proposed RMAQ algorithm converges to the optimal value function.
arXiv Detail & Related papers (2023-07-30T12:31:42Z) - What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning? [22.863241480702012]
Policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks.
We propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties.
arXiv Detail & Related papers (2022-12-06T01:57:33Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Multi-Agent Constrained Policy Optimisation [17.772811770726296]
We formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods.
Our solutions -- Multi-Agent Constrained Policy optimisation (MACPO) and MAPPO-Lagrangian -- leverage the theories from both constrained policy optimisation and multi-agent trust region learning.
We develop the benchmark suite of Safe Multi-Agent MuJoCo that involves a variety of MARL baselines.
arXiv Detail & Related papers (2021-10-06T14:17:09Z) - Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning [25.027143431992755]
Trust region methods rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks.
Unfortunately, when it comes to multi-agent reinforcement learning (MARL), the property of monotonic improvement may not simply apply.
In this paper, we extend the theory of trust region learning to MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme.
Based on these, we develop Heterogeneous-Agent Trust Region Policy Optimisation (HATRPO) and Heterogeneous-Agent Proximal Policy Optimisation (HAPPO) algorithms.
arXiv Detail & Related papers (2021-09-23T09:44:35Z) - Dealing with Non-Stationarity in Multi-Agent Reinforcement Learning via
Trust Region Decomposition [52.06086375833474]
Non-stationarity is one thorny issue in multi-agent reinforcement learning.
We introduce a $\delta$-stationarity measurement to explicitly model the stationarity of a policy sequence.
We propose a trust region decomposition network based on message passing to estimate the joint policy divergence.
arXiv Detail & Related papers (2021-02-21T14:46:50Z) - Deep Reinforcement Learning with Robust and Smooth Policy [90.78795857181727]
We propose to learn a smooth policy that behaves smoothly with respect to states.
We develop a new framework -- Smooth Regularized Reinforcement Learning (SR$^2$L), where the policy is trained with smoothness-inducing regularization.
Such regularization effectively constrains the search space, and enforces smoothness in the learned policy.
arXiv Detail & Related papers (2020-03-21T00:10:29Z) - Robust Deep Reinforcement Learning against Adversarial Perturbations on
State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)