Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement
Learning
- URL: http://arxiv.org/abs/2212.11746v1
- Date: Thu, 22 Dec 2022 14:36:27 GMT
- Title: Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement
Learning
- Authors: Ronghui Mu, Wenjie Ruan, Leandro Soriano Marcolino, Gaojie Jin, Qiang
Ni
- Abstract summary: We propose a novel certification method for c-MARLs to determine actions with guaranteed certified bounds.
We empirically show that our certification bounds are much tighter than state-of-the-art RL certification solutions.
Our method produces meaningful guaranteed robustness for all models and environments.
- Score: 17.957644784944755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cooperative multi-agent reinforcement learning (c-MARL) is widely applied in
safety-critical scenarios, thus the analysis of robustness for c-MARL models is
profoundly important. However, robustness certification for c-MARLs has not yet
been explored in the community. In this paper, we propose a novel certification
method, which is the first work to leverage a scalable approach for c-MARLs to
determine actions with guaranteed certified bounds. c-MARL certification poses
two key challenges compared with single-agent systems: (i) the accumulated
uncertainty as the number of agents increases; (ii) the potentially negligible
effect that changing a single agent's action has on the global team reward.
These challenges prevent us from directly using existing algorithms. Hence, we
employ the false discovery rate (FDR) controlling procedure considering the
importance of each agent to certify per-state robustness and propose a
tree-search-based algorithm to find a lower bound of the global reward under
the minimal certified perturbation. As our method is general, it can also be
applied in single-agent environments. We empirically show that our
certification bounds are much tighter than state-of-the-art RL certification
solutions. We also run experiments on two popular c-MARL algorithms: QMIX and
VDN, in two different environments, with two and four agents. The experimental
results show that our method produces meaningful guaranteed robustness for all
models and environments. Our tool CertifyCMARL is available at
https://github.com/TrustAI/CertifyCMA
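The abstract mentions a false discovery rate (FDR) controlling procedure but does not say which one, or how agent importance enters it. As a rough illustration only, the sketch below implements the standard Benjamini-Hochberg step-up procedure, which is one common way to control the FDR across several hypothesis tests (here, hypothetically, one test per agent). The function name, the per-agent p-values, and the omission of any importance weighting are assumptions for illustration, not the paper's method.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Standard Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns one boolean per hypothesis: True means "reject the null".
    Agent-importance weighting, mentioned in the abstract, is NOT modeled here.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical per-agent p-values for a four-agent team.
agent_p_values = [0.02, 0.03, 0.035, 0.9]
decisions = benjamini_hochberg(agent_p_values, alpha=0.05)
```

Because the procedure is step-up, the first two p-values in this example are rejected even though they miss their own rank thresholds: the third p-value meets its threshold, which carries every smaller p-value with it.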
Related papers
- Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium [6.169364905804677]
Multi-agent reinforcement learning (MARL) has achieved notable success in cooperative tasks.
However, deploying MARL agents in real-world applications presents critical safety challenges.
We propose a novel theoretical framework for safe MARL with state-wise constraints, where safety requirements are enforced at every state the agents visit.
For practical deployment in complex high-dimensional systems, we propose Multi-Agent Dual Actor-Critic (MADAC).
arXiv Detail & Related papers (2024-11-22T16:08:42Z)
- Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing [11.639503711252663]
We tackle the multi-agent active hypothesis testing (AHT) problem by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning.
We present a comprehensive set of experimental results that effectively showcase the agents' ability to learn collaborative strategies and enhance performance.
arXiv Detail & Related papers (2023-09-14T01:18:04Z)
- Maximum Entropy Heterogeneous-Agent Reinforcement Learning [47.652866966384586]
Multi-agent reinforcement learning (MARL) has been shown effective for cooperative games in recent years.
We propose a unified framework for learning stochastic policies to resolve these issues.
Based on the MaxEnt framework, we propose Heterogeneous-Agent Soft Actor-Critic (HASAC) algorithm.
arXiv Detail & Related papers (2023-06-19T06:22:02Z)
- Heterogeneous-Agent Reinforcement Learning [16.796016254366524]
We propose Heterogeneous-Agent Reinforcement Learning (HARL) algorithms to achieve effective cooperation in the general heterogeneous-agent setting.
Central to our findings are the multi-agent advantage decomposition lemma and the sequential update scheme.
We prove that all algorithms derived from HAML inherently enjoy monotonic improvement of joint return and convergence to Nash Equilibrium.
arXiv Detail & Related papers (2023-04-19T05:08:02Z)
- Towards Comprehensive Testing on the Robustness of Cooperative Multi-agent Reinforcement Learning [10.132303690998523]
It is crucial to test the robustness of a c-MARL algorithm before it is deployed in reality.
Existing adversarial attacks for MARL could be used for testing, but they are limited to a single robustness aspect.
We propose MARLSafe, the first robustness testing framework for c-MARL algorithms.
arXiv Detail & Related papers (2022-04-17T05:15:51Z)
- COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks [49.15885037760725]
We focus on certifying the robustness of offline reinforcement learning (RL) in the presence of poisoning attacks.
We propose the first certification framework, COPA, to certify the number of poisoning trajectories that can be tolerated.
We prove that some of the proposed certification methods are theoretically tight and some are NP-Complete problems.
arXiv Detail & Related papers (2022-03-16T05:02:47Z)
- Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
- Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning [34.856522993714535]
We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for the coalition of agents.
Our method outperforms existing cooperative MARL algorithms significantly and achieves the state-of-the-art, with especially large margins on tasks with more severe difficulties.
arXiv Detail & Related papers (2021-06-01T07:38:34Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)