Rethinking Adversarial Policies: A Generalized Attack Formulation and
Provable Defense in RL
- URL: http://arxiv.org/abs/2305.17342v3
- Date: Tue, 20 Feb 2024 16:05:36 GMT
- Title: Rethinking Adversarial Policies: A Generalized Attack Formulation and
Provable Defense in RL
- Authors: Xiangyu Liu, Souradip Chakraborty, Yanchao Sun, Furong Huang
- Abstract summary: In this paper, we consider a multi-agent setting where a well-trained victim agent is exploited by an attacker controlling another agent.
Previous models do not account for the possibility that the attacker may only have partial control over $\alpha$ or that the attack may produce easily detectable "abnormal" behaviors.
We introduce a generalized attack framework that has the flexibility to model to what extent the adversary is able to control the agent.
We offer a provably efficient defense with convergence to the most robust victim policy through adversarial training with timescale separation.
- Score: 46.32591437241358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing works focus on direct perturbations to the victim's
state/action or the underlying transition dynamics to demonstrate the
vulnerability of reinforcement learning agents to adversarial attacks. However,
such direct manipulations may not be always realizable. In this paper, we
consider a multi-agent setting where a well-trained victim agent $\nu$ is
exploited by an attacker controlling another agent $\alpha$ with an
\textit{adversarial policy}. Previous models do not account for the possibility
that the attacker may only have partial control over $\alpha$ or that the
attack may produce easily detectable "abnormal" behaviors. Furthermore, there
is a lack of provably efficient defenses against these adversarial policies. To
address these limitations, we introduce a generalized attack framework that has
the flexibility to model to what extent the adversary is able to control the
agent, and allows the attacker to regulate the state distribution shift and
produce stealthier adversarial policies. Moreover, we offer a provably
efficient defense with polynomial convergence to the most robust victim policy
through adversarial training with timescale separation. This stands in sharp
contrast to supervised learning, where adversarial training typically provides
only \textit{empirical} defenses. Using the Robosumo competition experiments,
we show that our generalized attack formulation results in much stealthier
adversarial policies while maintaining the same winning rate as baselines.
Additionally, our adversarial training approach yields stable learning dynamics
and less exploitable victim policies.
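The defense above relies on adversarial training in which the victim and the attacker are updated on separated timescales. Below is a minimal sketch of that two-timescale idea on a toy two-player zero-sum matrix game; the payoff matrix, softmax policy parameterization, and step sizes are illustrative assumptions, not the paper's actual algorithm or Robosumo setup.
```python
# Minimal sketch of adversarial training with timescale separation on a toy
# two-player zero-sum matrix game: the attacker's policy is updated with a
# larger step size than the victim's. Payoffs, policies, and step sizes are
# illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))            # victim's payoff matrix; attacker gets -A

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

theta_v = np.zeros(3)                  # victim (row player) logits
theta_a = np.zeros(3)                  # attacker (column player) logits
eta_slow, eta_fast = 0.01, 0.1         # timescale separation: attacker adapts faster

for _ in range(5000):
    pi_v, pi_a = softmax(theta_v), softmax(theta_a)
    payoff = pi_v @ A @ pi_a
    # Exact policy gradients of the expected payoff w.r.t. the softmax logits.
    grad_v = pi_v * (A @ pi_a - payoff)
    grad_a = pi_a * (A.T @ pi_v - payoff)
    theta_v += eta_slow * grad_v       # victim slowly ascends its payoff
    theta_a -= eta_fast * grad_a       # attacker quickly descends it (best-responds)

print("victim policy:", np.round(softmax(theta_v), 3))
print("victim worst-case payoff:", round(float((softmax(theta_v) @ A).min()), 4))
```
The point of the separation is that the fast attacker approximately best-responds to the current victim, so the slow victim effectively ascends its worst-case payoff, which is what pushes it toward a less exploitable policy.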
Related papers
- CuDA2: An approach for Incorporating Traitor Agents into Cooperative Multi-Agent Systems [13.776447110639193]
We introduce a novel method that involves injecting traitor agents into the CMARL system.
In the Traitor Markov Decision Process (TMDP), traitors are trained using the same MARL algorithm as the victim agents, with their reward function set to the negative of the victim agents' reward.
CuDA2 enhances the efficiency and aggressiveness of attacks on the specified victim agents' policies.
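A minimal sketch of the reward flip described above, where traitors are trained on the negated victim reward; the dict-based per-agent reward interface and agent ids are hypothetical, used only for illustration.
```python
# Sketch of the traitor reward flip: each injected traitor is trained with the
# negative of the victim team's reward, using the victims' own MARL algorithm.
# The dict-based per-agent reward interface and agent ids are hypothetical.
def traitor_rewards(team_rewards: dict, traitor_ids: set) -> dict:
    """Return per-agent rewards where every traitor receives -sum(victim rewards)."""
    victim_total = sum(r for agent, r in team_rewards.items() if agent not in traitor_ids)
    return {agent: (-victim_total if agent in traitor_ids else r)
            for agent, r in team_rewards.items()}

# Example: agents 0-2 are victims, agent 3 is an injected traitor.
step_rewards = {0: 1.0, 1: 0.5, 2: 0.5, 3: 0.0}
print(traitor_rewards(step_rewards, traitor_ids={3}))
# -> {0: 1.0, 1: 0.5, 2: 0.5, 3: -2.0}
```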
arXiv Detail & Related papers (2024-06-25T09:59:31Z) - Optimal Cost Constrained Adversarial Attacks For Multiple Agent Systems [6.69087470775851]
We formulate the problem of performing optimal adversarial agent-to-agent attacks using distributed attack agents.
We propose an optimal method integrating within-step static constrained attack-resource allocation optimization and between-step dynamic programming.
Our numerical results show that the proposed attacks can significantly reduce the rewards received by the attacked agents.
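The two-level structure mentioned above (a static within-step allocation plus dynamic programming across steps) can be illustrated with a toy sketch. The per-agent attack costs, damage estimates, brute-force knapsack, and integer budget grid below are all assumptions made for demonstration, not the paper's formulation.
```python
# Toy sketch of the two-level structure: a per-step knapsack picks which agents
# to attack under a step budget, and a dynamic program spreads a total budget
# across steps. Costs, damage estimates, brute-force search, and the integer
# budget grid are all illustrative assumptions.
from itertools import combinations

def best_step_attack(costs, damages, step_budget):
    """Brute-force knapsack: maximal damage over agent subsets within the budget."""
    best = 0.0
    for k in range(len(costs) + 1):
        for subset in combinations(range(len(costs)), k):
            if sum(costs[i] for i in subset) <= step_budget:
                best = max(best, sum(damages[i] for i in subset))
    return best

def allocate_budget(step_profiles, total_budget):
    """DP over steps: value[b] = best total damage with b budget units remaining."""
    value = [0.0] * (total_budget + 1)
    for costs, damages in reversed(step_profiles):          # backward over steps
        value = [max(best_step_attack(costs, damages, spend) + value[b - spend]
                     for spend in range(b + 1))
                 for b in range(total_budget + 1)]
    return value[total_budget]

# Two steps, two attackable agents per step: (costs, estimated damages).
profiles = [([1, 2], [0.5, 1.2]), ([1, 2], [0.9, 1.0])]
print(round(allocate_budget(profiles, total_budget=3), 3))   # -> 2.1
```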
arXiv Detail & Related papers (2023-11-01T21:28:02Z) - Toward Evaluating Robustness of Reinforcement Learning with Adversarial Policy [32.1138935956272]
Reinforcement learning agents are susceptible to evasion attacks during deployment.
In this paper, we propose Intrinsically Motivated Adversarial Policy (IMAP) for efficient black-box adversarial policy learning.
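A hedged illustration of the intrinsic-motivation idea: the adversary's sparse extrinsic reward is augmented with an exploration bonus so that black-box adversarial policy learning needs fewer interactions. The count-based bonus below is a generic stand-in, not IMAP's actual intrinsic reward design.
```python
# Hedged sketch of the intrinsic-motivation idea: augment the adversary's sparse
# extrinsic reward with an exploration bonus. The count-based novelty bonus over
# discretized states is a generic stand-in, not IMAP's actual intrinsic reward.
from collections import defaultdict
import numpy as np

class CountBonus:
    def __init__(self, beta=0.1, bins=10):
        self.counts = defaultdict(int)
        self.beta, self.bins = beta, bins

    def __call__(self, state):
        key = tuple(np.floor(np.asarray(state) * self.bins).astype(int).tolist())
        self.counts[key] += 1
        return self.beta / np.sqrt(self.counts[key])   # rarer states earn a larger bonus

bonus = CountBonus()
extrinsic = 0.0                                        # sparse reward mid-episode
state = np.array([0.31, -0.12])
shaped = extrinsic + bonus(state)                      # reward the adversary is trained on
print(round(shaped, 4))                                # -> 0.1 on the first visit
```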
arXiv Detail & Related papers (2023-05-04T07:24:12Z) - Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack [53.032801921915436]
Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars.
Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks.
We show such threats exist, even when the attacker only has access to the input/output of the model.
We propose the very first black-box adversarial attack approach in skeleton-based HAR called BASAR.
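To make the threat model concrete, here is a minimal sketch of a hard-label black-box setting in which the attacker can only query the model's predicted label. The random-search attack and toy classifier are illustrative stand-ins; BASAR itself searches on the skeletal-motion manifold rather than by unconstrained random perturbation.
```python
# Minimal sketch of the hard-label black-box threat model: the attacker can only
# query model(x) -> label and searches for a small perturbation that changes the
# prediction. Random search and the toy classifier are illustrative stand-ins;
# BASAR itself constrains its search to the skeletal-motion manifold.
import numpy as np

def black_box_attack(model, x, eps=0.1, max_queries=1000, seed=0):
    rng = np.random.default_rng(seed)
    original_label = model(x)
    for _ in range(max_queries):
        delta = rng.uniform(-eps, eps, size=x.shape)
        if model(x + delta) != original_label:     # success: label flipped
            return x + delta
    return None                                    # gave up within the query budget

# Hypothetical stand-in classifier over 2-D inputs, used only for demonstration.
toy_model = lambda x: int(x.sum() > 0)
adv = black_box_attack(toy_model, np.array([0.02, 0.03]))
print("found adversarial example:", adv is not None)
```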
arXiv Detail & Related papers (2022-11-21T09:51:28Z) - Imitating Opponent to Win: Adversarial Policy Imitation Learning in
Two-player Competitive Games [0.0]
Adversarial policies adopted by an adversary agent can influence a target RL agent to perform poorly in a multi-agent environment.
In existing studies, adversarial policies are directly trained based on experiences of interacting with the victim agent.
We design a new effective adversarial policy learning algorithm that overcomes this shortcoming.
arXiv Detail & Related papers (2022-10-30T18:32:02Z) - Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
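A small sketch of the smoothing mechanism itself: the agent acts on Gaussian-noised observations at every step, and the distribution of the resulting returns is what the certified threshold is derived from. The toy chain environment and the plain empirical quantile below are assumptions for illustration; the paper's certificate is derived analytically rather than from an empirical percentile.
```python
# Sketch of the smoothing mechanism: the agent acts on Gaussian-noised
# observations at every step, and the distribution of the resulting returns is
# what the certified threshold is derived from. The toy chain environment and
# the empirical quantile are assumptions; the paper's certificate is analytical.
import numpy as np

def rollout_smoothed(policy, env_step, s0, sigma=0.2, horizon=20, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    s, total = s0, 0.0
    for _ in range(horizon):
        obs = s + rng.normal(0.0, sigma)      # Gaussian observation smoothing
        s, r = env_step(s, policy(obs))
        total += r
    return total

# Toy 1-D chain: step +/-1 toward the goal state 5; reward 1 for reaching it.
env_step = lambda s, a: (s + a, 1.0 if s + a == 5 else 0.0)
policy = lambda obs: 1 if obs < 5 else -1

rng = np.random.default_rng(0)
returns = [rollout_smoothed(policy, env_step, s0=0, rng=rng) for _ in range(500)]
print("empirical 5th-percentile return:", np.percentile(returns, 5))
```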
arXiv Detail & Related papers (2021-06-21T21:42:08Z) - Adversarial Visual Robustness by Causal Intervention [56.766342028800445]
Adversarial training is the de facto most promising defense against adversarial examples.
Yet, its passive nature inevitably prevents it from being immune to unknown attackers.
We provide a causal viewpoint of adversarial vulnerability: the cause is the confounder ubiquitously existing in learning.
arXiv Detail & Related papers (2021-06-17T14:23:54Z) - Provable Defense Against Delusive Poisoning [64.69220849669948]
We show that adversarial training can be a principled defense method against delusive poisoning.
arXiv Detail & Related papers (2021-02-09T09:19:47Z) - Robust Reinforcement Learning on State Observations with Learned Optimal
Adversary [86.0846119254031]
We study the robustness of reinforcement learning with adversarially perturbed state observations.
With a fixed agent policy, we demonstrate that an optimal adversary to perturb state observations can be found.
For DRL settings, this leads to a novel empirical adversarial attack on RL agents via a learned adversary that is much stronger than previous ones.
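With the victim's policy held fixed, choosing which perturbed observation to show is itself a sequential decision problem, so an optimal observation adversary can be computed. The sketch below does this by value iteration on a tiny tabular MDP; the 3-state dynamics, perturbation neighborhoods, and victim policy are made-up assumptions used only to make the computation concrete.
```python
# With the victim's policy fixed, picking which perturbed observation to show is
# itself a sequential decision problem, so the optimal observation adversary can
# be computed exactly on a small tabular MDP via value iteration. The 3-state
# dynamics, perturbation neighborhoods, and victim policy are made-up assumptions.
import numpy as np

nS, nA, gamma = 3, 2, 0.9
P = np.zeros((nS, nA, nS))                    # P[s, a, s'] transition probabilities
P[0, 0, 1] = P[0, 1, 0] = 1.0
P[1, 0, 2] = P[1, 1, 0] = 1.0
P[2, 0, 2] = P[2, 1, 1] = 1.0
R = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])   # victim reward R[s, a]
pi = np.array([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]])  # fixed victim policy pi[s, a]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}      # observations the adversary may show

V = np.zeros(nS)                              # victim's value under worst-case perturbation
for _ in range(500):
    V_new = np.empty(nS)
    for s in range(nS):
        # Adversary shows s_hat; the victim then acts with pi(.|s_hat) in the true state s.
        vals = [sum(pi[s_hat, a] * (R[s, a] + gamma * P[s, a] @ V) for a in range(nA))
                for s_hat in neighbors[s]]
        V_new[s] = min(vals)                  # adversary minimizes the victim's return
    V = V_new
print("victim values under the optimal observation adversary:", np.round(V, 3))
```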
arXiv Detail & Related papers (2021-01-21T05:38:52Z) - Policy Teaching via Environment Poisoning: Training-time Adversarial
Attacks against Reinforcement Learning [33.41280432984183]
We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy.
As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings.
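A hedged sketch of the kind of optimization this reward-poisoning threat model typically reduces to: modify the reward function as little as possible while making the target policy strictly better than every alternative. The notation, margin constraint, and norm below are assumptions, not quoted from the paper.
```latex
% Illustrative reward-poisoning objective (average-reward setting): minimally
% alter the reward so the target policy beats every alternative by a margin.
% Notation and the constraint set are assumptions, not quoted from the paper.
\begin{aligned}
\min_{\widehat{R}} \quad & \lVert \widehat{R} - R \rVert_{p} \\
\text{s.t.} \quad & \rho^{\pi^{\dagger}}\!\bigl(\widehat{R}\bigr) \;\ge\; \rho^{\pi}\!\bigl(\widehat{R}\bigr) + \epsilon
\qquad \forall\, \pi \neq \pi^{\dagger},
\end{aligned}
```
Here $\rho^{\pi}(\widehat{R})$ denotes the average reward of policy $\pi$ under the poisoned reward function $\widehat{R}$, and $\pi^{\dagger}$ is the attacker's target policy.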
arXiv Detail & Related papers (2020-03-28T23:22:28Z)