What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?
- URL: http://arxiv.org/abs/2212.02705v5
- Date: Fri, 12 Apr 2024 17:58:52 GMT
- Title: What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?
- Authors: Songyang Han, Sanbao Su, Sihong He, Shuo Han, Haizhao Yang, Shaofeng Zou, Fei Miao
- Abstract summary: Policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks.
We propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties.
- Score: 22.863241480702012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of robust agent policy for finite state and finite action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is public on https://songyanghan.github.io/what_is_solution/.
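As a rough sketch of the robust agent policy concept (the notation below is ours, not necessarily the paper's exact formulation): the agents seek a joint policy that maximizes the worst-case expected state value over an admissible set of state-perturbation policies available to the adversary.

```latex
% Robust agent policy objective (illustrative; notation assumed).
% \pi: joint agent policy acting on perturbed observations
% \chi \in \mathcal{X}: adversary's state-perturbation policy from an admissible set
% \rho_0: initial state distribution; V^{\pi,\chi}: state value under \pi and \chi
\pi^{*} \in \arg\max_{\pi} \, \min_{\chi \in \mathcal{X}} \,
    \mathbb{E}_{s_0 \sim \rho_0}\!\left[ V^{\pi,\chi}(s_0) \right]
```

The existence result mentioned in the abstract applies to this maximin objective for finite state and finite action SAMGs.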
Related papers
- Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium [6.169364905804677]
Multi-agent reinforcement learning (MARL) has achieved notable success in cooperative tasks.
However, deploying MARL agents in real-world applications presents critical safety challenges.
We propose a novel theoretical framework for safe MARL with $\textit{state-wise}$ constraints, where safety requirements are enforced at every state the agents visit.
For practical deployment in complex high-dimensional systems, we propose $\textit{Multi-Agent Dual Actor-Critic}$ (MADAC).
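As a loose illustration of the state-wise constraint idea (a toy check of our own, not MADAC itself): safety must hold at every visited state, rather than only for the expected cumulative cost.

```python
from typing import Iterable

def statewise_safe(per_state_costs: Iterable[float], threshold: float) -> bool:
    """State-wise safety check (illustrative): the constraint must hold at every
    visited state, not merely in expectation over a trajectory."""
    return all(cost <= threshold for cost in per_state_costs)

# Toy usage: per-state safety costs along one trajectory, threshold 1.0.
print(statewise_safe([0.2, 0.9, 0.4], threshold=1.0))  # True: safe everywhere
print(statewise_safe([0.2, 1.3, 0.4], threshold=1.0))  # False: violated at one state
```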
arXiv Detail & Related papers (2024-11-22T16:08:42Z)
- Belief-Enriched Pessimistic Q-Learning against Adversarial State Perturbations [5.076419064097735]
Recent work shows that a well-trained RL agent can be easily manipulated by strategically perturbing its state observations at the test stage.
Existing solutions either introduce a regularization term to improve the smoothness of the trained policy against perturbations or alternately train the agent's policy and the attacker's policy.
We propose a new robust RL algorithm for deriving a pessimistic policy to safeguard against an agent's uncertainty about true states.
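A minimal sketch of what pessimistic action selection under state uncertainty can look like (our own illustration with assumed names, not the paper's exact algorithm): pick the action whose worst-case Q-value over a belief set of plausible true states is highest.

```python
import numpy as np

def pessimistic_action(q_table: np.ndarray, belief_states: list[int]) -> int:
    """Pick the action maximizing the worst-case Q-value over a belief set.

    q_table:       array of shape (n_states, n_actions) with learned Q-values
    belief_states: indices of states considered plausible given the
                   (possibly perturbed) observation
    """
    # For each action, take the minimum Q-value across the belief set (pessimism),
    # then choose the action with the best worst case (maximin).
    worst_case_q = q_table[belief_states].min(axis=0)   # shape: (n_actions,)
    return int(worst_case_q.argmax())

# Toy usage: 4 states, 3 actions; the observation is ambiguous between states 1 and 2.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 3))
print(pessimistic_action(q, belief_states=[1, 2]))
```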
arXiv Detail & Related papers (2024-03-06T20:52:49Z)
- Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms [79.61176746380718]
Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains.
MARL policies often lack robustness and are sensitive to small changes in their environment.
We show that we can gain robustness by controlling a policy's Lipschitz constant.
We propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies.
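For intuition, a generic smoothness penalty of this flavor could be sketched as follows (an assumption-laden illustration, not ERNIE's exact regularizer): measure how much the policy's action distribution can change when the observed state is perturbed within a small ball.

```python
import numpy as np

def smoothness_penalty(policy, state: np.ndarray, epsilon: float = 0.05,
                       n_samples: int = 16, seed: int = 0) -> float:
    """Adversarial-style smoothness regularizer (illustrative only).

    policy(state) -> action-probability vector. We approximate the worst-case
    divergence between policy(state) and policy(state + delta) over random
    perturbations with ||delta||_inf <= epsilon; a small value indicates a
    locally Lipschitz (smooth) policy.
    """
    rng = np.random.default_rng(seed)
    base = policy(state)
    worst = 0.0
    for _ in range(n_samples):
        delta = rng.uniform(-epsilon, epsilon, size=state.shape)
        perturbed = policy(state + delta)
        # Total-variation-style distance between the two action distributions.
        worst = max(worst, 0.5 * np.abs(base - perturbed).sum())
    return worst

# Toy usage with a softmax policy over 3 actions.
def toy_policy(s):
    logits = np.array([s.sum(), s.mean(), -s.sum()])
    e = np.exp(logits - logits.max())
    return e / e.sum()

print(smoothness_penalty(toy_policy, np.array([0.3, -0.1, 0.7])))
```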
arXiv Detail & Related papers (2023-10-16T20:14:06Z)
- Robust Multi-Agent Reinforcement Learning with State Uncertainty [17.916400875478377]
We study the problem of MARL with state uncertainty in this work.
We propose a robust multi-agent Q-learning (RMAQ) algorithm to find a robust equilibrium under state uncertainty.
Our experiments show that the proposed RMAQ algorithm converges to the optimal value function.
arXiv Detail & Related papers (2023-07-30T12:31:42Z)
- Provably Learning Nash Policies in Constrained Markov Potential Games [90.87573337770293]
Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents.
Constrained Markov Games (CMGs) are a natural formalism for safe MARL problems, though generally intractable.
arXiv Detail & Related papers (2023-06-13T13:08:31Z)
- A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning [61.406020873047794]
A major hurdle to real-world application is that most algorithms are developed in an episodic setting, where the environment is reset after each trial.
We propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations.
Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks.
arXiv Detail & Related papers (2022-05-11T00:06:29Z)
- On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning [55.95253619768565]
Current MARL algorithms assume that the number of agents within a group remains fixed throughout an experiment.
In many practical problems, an agent may terminate before its teammates.
We present a novel architecture for an existing state-of-the-art MARL algorithm, using attention in place of a fully connected layer over absorbing states.
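A rough sketch of the underlying idea (our own illustration, not the paper's architecture): rather than padding terminated agents with absorbing states, attention can simply mask them out so the critic aggregates only over active agents.

```python
import numpy as np

def masked_attention(queries, keys, values, active_mask):
    """Scaled dot-product attention that ignores terminated agents.

    queries, keys, values: arrays of shape (n_agents, d)
    active_mask:           boolean array of shape (n_agents,); False = terminated
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # (n_agents, n_agents)
    scores[:, ~active_mask] = -1e9                   # mask out terminated agents
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values                          # aggregated per-agent features

# Toy usage: 3 agents, agent 2 has terminated.
rng = np.random.default_rng(1)
q, k, v = (rng.normal(size=(3, 4)) for _ in range(3))
print(masked_attention(q, k, v, np.array([True, True, False])))
```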
arXiv Detail & Related papers (2021-11-10T23:45:08Z)
- Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning [25.027143431992755]
Trust region methods rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks.
Unfortunately, when it comes to multi-agent reinforcement learning (MARL), the property of monotonic improvement may not carry over directly.
In this paper, we extend the theory of trust region learning to MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme.
Based on these, we develop Heterogeneous-Agent Trust Region Policy Optimisation (HATRPO) and Heterogeneous-Agent Proximal Policy Optimisation (HAPPO).
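For context, the multi-agent advantage decomposition referred to above is roughly of the following form (paraphrased; see the paper for the precise statement):

```latex
% Multi-agent advantage decomposition (paraphrased; notation assumed).
% i_{1:m} is an ordered subset of agents, a^{i_{1:k}} their joint action,
% and A_\pi^{i_k}(s, a^{i_{1:k-1}}, a^{i_k}) the advantage of agent i_k
% given the actions of the preceding agents.
A_\pi^{i_{1:m}}\!\left(s, a^{i_{1:m}}\right)
  = \sum_{k=1}^{m} A_\pi^{i_k}\!\left(s, a^{i_{1:k-1}}, a^{i_k}\right)
```

This decomposition is what licenses the sequential, agent-by-agent update scheme: each agent improves its own conditional advantage in turn while holding the preceding agents' updates fixed.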
arXiv Detail & Related papers (2021-09-23T09:44:35Z)
- Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
- Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques for improving robustness in classification tasks, such as adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)
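To make the threat model concrete, here is a minimal, hedged sketch (our illustration, not the paper's attack) of an observation perturbation found by random search within a small l-infinity ball, chosen to most degrade the value of the agent's induced greedy action:

```python
import numpy as np

def worst_observation(q_fn, obs: np.ndarray, epsilon: float = 0.05,
                      n_candidates: int = 64, seed: int = 0) -> np.ndarray:
    """Random-search attack on state observations (illustrative only).

    q_fn(obs) -> vector of Q-values per action. The agent acts greedily on the
    perturbed observation; we return the perturbation (within an l_inf ball of
    radius epsilon) that gives the lowest true value for the induced action.
    """
    rng = np.random.default_rng(seed)
    true_q = q_fn(obs)                        # values of actions in the true state
    best_obs, best_value = obs, true_q.max()  # unperturbed greedy action value
    for _ in range(n_candidates):
        cand = obs + rng.uniform(-epsilon, epsilon, size=obs.shape)
        induced_action = int(q_fn(cand).argmax())   # action the misled agent takes
        if true_q[induced_action] < best_value:
            best_obs, best_value = cand, true_q[induced_action]
    return best_obs

# Toy usage: a linear Q-function over 2 actions.
q_fn = lambda o: np.array([o @ np.array([1.0, -1.0]), o @ np.array([-1.0, 1.0])])
print(worst_observation(q_fn, np.array([0.02, 0.0])))
```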
This list is automatically generated from the titles and abstracts of the papers in this site.