Behavior-Targeted Attack on Reinforcement Learning with Limited Access to Victim's Policy
- URL: http://arxiv.org/abs/2406.03862v1
- Date: Thu, 6 Jun 2024 08:49:51 GMT
- Title: Behavior-Targeted Attack on Reinforcement Learning with Limited Access to Victim's Policy
- Authors: Shojiro Yamabe, Kazuto Fukuchi, Ryoma Senda, Jun Sakuma
- Abstract summary: We propose a novel method for manipulating the victim agent in the black-box and no-box settings.
Our attack method is formulated as a bi-level optimization problem that is reduced to a distribution matching problem.
Empirical evaluations on several reinforcement learning benchmarks show that our proposed method has superior attack performance to baselines.
- Score: 9.530897053573186
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This study considers the attack on reinforcement learning agents where the adversary aims to control the victim's behavior as specified by the adversary by adding adversarial modifications to the victim's state observation. While some attack methods reported success in manipulating the victim agent's behavior, these methods often rely on environment-specific heuristics. In addition, all existing attack methods require white-box access to the victim's policy. In this study, we propose a novel method for manipulating the victim agent in the black-box (i.e., the adversary is allowed to observe the victim's state and action only) and no-box (i.e., the adversary is allowed to observe the victim's state only) setting without requiring environment-specific heuristics. Our attack method is formulated as a bi-level optimization problem that is reduced to a distribution matching problem and can be solved by an existing imitation learning algorithm in the black-box and no-box settings. Empirical evaluations on several reinforcement learning benchmarks show that our proposed method has superior attack performance to baselines.
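As a rough illustration of the reduction described in the abstract, the toy sketch below (all sizes, names, and distributions are hypothetical, not taken from the paper) computes empirical state-action occupancy measures from black-box observations of the victim and evaluates a KL-style distribution matching loss between them; an imitation learning algorithm would then minimize such a loss by adjusting the observation perturbation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete setting: 4 states, 2 actions (hypothetical sizes).
n_states, n_actions = 4, 2

def occupancy(states, actions):
    """Empirical state-action visitation distribution."""
    counts = np.zeros((n_states, n_actions))
    for s, a in zip(states, actions):
        counts[s, a] += 1
    return counts / counts.sum()

# Target behavior the adversary wants the victim to exhibit.
target_states = rng.integers(0, n_states, 1000)
target_actions = np.zeros(1000, dtype=int)  # e.g. always take action 0
rho_target = occupancy(target_states, target_actions)

# Victim rollouts under the current observation perturbation
# (black-box: only states and actions are observed, not the policy).
victim_states = rng.integers(0, n_states, 1000)
victim_actions = rng.integers(0, n_actions, 1000)
rho_victim = occupancy(victim_states, victim_actions)

def matching_loss(p, q, eps=1e-8):
    """KL(p || q): the distribution matching objective that the
    bi-level attack problem is reduced to."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

loss = matching_loss(rho_target, rho_victim)
print(loss > 0.0)
```

In the no-box setting only states are observed, so the matching would be over state occupancies alone; the inner loop of the bi-level problem (the victim acting on perturbed observations) is what the imitation learner implicitly solves.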
Related papers
- Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL [46.32591437241358]
In this paper, we consider a multi-agent setting where a well-trained victim agent is exploited by an attacker controlling another agent.
Previous models do not account for the possibility that the attacker may only have partial control over $\alpha$ or that the attack may produce easily detectable "abnormal" behaviors.
We introduce a generalized attack framework that has the flexibility to model to what extent the adversary is able to control the agent.
We offer a provably efficient defense with convergence to the most robust victim policy through adversarial training with timescale separation.
arXiv Detail & Related papers (2023-05-27T02:54:07Z) - Toward Evaluating Robustness of Reinforcement Learning with Adversarial Policy [32.1138935956272]
Reinforcement learning agents are susceptible to evasion attacks during deployment.
In this paper, we propose Intrinsically Motivated Adversarial Policy (IMAP) for efficient black-box adversarial policy learning.
arXiv Detail & Related papers (2023-05-04T07:24:12Z) - Generalizable Black-Box Adversarial Attack with Meta Learning [54.196613395045595]
In black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful perturbation based on query feedback under a query budget.
We propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability.
The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance.
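A minimal sketch of the query-feedback setting this entry describes, with every model and parameter hypothetical: the attacker sees only a scalar score per query, refines a perturbation by random search under a query budget, and can warm-start from a perturbation transferred from a past attack (the example-level transferability idea).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical black-box "model": returns a scalar score only;
# the attacker never sees parameters or gradients.
w = rng.normal(size=8)
def query(x):
    return float(w @ x)

def random_search_attack(x0, budget=200, step=0.1, prior=None):
    """Query-feedback attack: greedily lower the model's score.
    `prior` is an optional perturbation transferred from historical
    attacks, used as a warm start."""
    x = x0 + (prior if prior is not None else 0.0)
    best = query(x)
    for _ in range(budget):
        cand = x + step * rng.normal(size=x.shape)
        s = query(cand)
        if s < best:  # keep only moves the feedback rewards
            x, best = cand, s
    return x, best

x0 = rng.normal(size=8)
_, score_no_prior = random_search_attack(x0)
# Pretend a past attack against a similar example found this direction.
prior = -0.5 * w / np.linalg.norm(w)
_, score_prior = random_search_attack(x0, prior=prior)
```

The warm start simply lowers the starting score before any queries are spent, which is why transferability priors combine naturally with any off-the-shelf query-based attack.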
arXiv Detail & Related papers (2023-01-01T07:24:12Z) - Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack [53.032801921915436]
Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars.
Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks.
We show such threats exist, even when the attacker only has access to the input/output of the model.
We propose the very first black-box adversarial attack approach in skeleton-based HAR called BASAR.
arXiv Detail & Related papers (2022-11-21T09:51:28Z) - Imitating Opponent to Win: Adversarial Policy Imitation Learning in Two-player Competitive Games [0.0]
Adversarial policies adopted by an adversary agent can influence a target RL agent to perform poorly in a multi-agent environment.
In existing studies, adversarial policies are directly trained based on experiences of interacting with the victim agent.
We design a new effective adversarial policy learning algorithm that overcomes this shortcoming.
arXiv Detail & Related papers (2022-10-30T18:32:02Z) - A Tale of HodgeRank and Spectral Method: Target Attack Against Rank Aggregation Is the Fixed Point of Adversarial Game [153.74942025516853]
The intrinsic vulnerability of the rank aggregation methods is not well studied in the literature.
In this paper, we focus on the purposeful adversary who desires to designate the aggregated results by modifying the pairwise data.
The effectiveness of the suggested target attack strategies is demonstrated by a series of toy simulations and several real-world data experiments.
arXiv Detail & Related papers (2022-09-13T05:59:02Z) - Execute Order 66: Targeted Data Poisoning for Reinforcement Learning [52.593097204559314]
We introduce an insidious poisoning attack for reinforcement learning which causes agent misbehavior only at specific target states.
We accomplish this by adapting a recent technique, gradient alignment, to reinforcement learning.
We test our method and demonstrate success in two Atari games of varying difficulty.
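The gradient alignment idea mentioned above can be sketched in a few lines. Everything here is a hypothetical stand-in (a linear model with squared error instead of an RL policy): the attacker's objective is to craft a poison point whose per-example training gradient points the same way as the gradient of the adversarial target objective, so that ordinary training on the poison mimics a step toward the attacker's goal.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16
theta = rng.normal(size=d)  # hypothetical victim parameters

def grad(theta, x, y):
    """Per-example training gradient of a linear model under
    squared error (a stand-in for the victim's RL update)."""
    return 2.0 * (theta @ x - y) * x

def alignment_loss(g_poison, g_adv):
    """Gradient alignment objective: 1 - cosine similarity.
    Minimizing it makes a training step on the poison mimic a
    step on the adversary's target objective."""
    cos = g_poison @ g_adv / (np.linalg.norm(g_poison) * np.linalg.norm(g_adv))
    return 1.0 - float(cos)

# Adversarial objective: force the output +1 at one target input.
x_target = rng.normal(size=d)
g_adv = grad(theta, x_target, 1.0)

# A random clean point is poorly aligned; a poison whose gradient
# reproduces the adversarial gradient drives the loss toward 0.
loss_clean = alignment_loss(grad(theta, rng.normal(size=d), 0.0), g_adv)
loss_poison = alignment_loss(grad(theta, x_target, 1.0), g_adv)
print(loss_poison < loss_clean)
```

In the paper's setting the alignment loss would be minimized with autodiff over the poison inputs; the point of the sketch is only the shape of the objective.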
arXiv Detail & Related papers (2022-01-03T17:09:32Z) - Adversarial Robustness of Deep Reinforcement Learning based Dynamic Recommender Systems [50.758281304737444]
We propose to explore adversarial examples and attack detection on reinforcement learning-based interactive recommendation systems.
We first craft different types of adversarial examples by adding perturbations to the input and intervening on the causal factors.
Then, we augment recommendation systems by detecting potential attacks with a deep learning-based classifier based on the crafted data.
arXiv Detail & Related papers (2021-12-02T04:12:24Z) - Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning [32.12283927682007]
Deep reinforcement learning models are vulnerable to adversarial attacks which can decrease the victim's total reward by manipulating the observations.
We reformulate the problem of adversarial attacks in function space and separate the previous gradient-based attacks into several subspaces.
In the first stage, we train a deceptive policy by hacking the environment, and discover a set of trajectories routing to the lowest reward.
Our method provides a tighter theoretical upper bound for the attacked agent's performance than the existing approaches.
arXiv Detail & Related papers (2021-06-30T07:41:51Z) - Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike other standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
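The core trick of a sampling attack can be illustrated with a toy label-only classifier (the model, noise scale, and threshold below are all hypothetical): by repeatedly querying noisy copies of an input, the attacker uses the label-agreement rate as a stand-in for the confidence score the model refuses to publish.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical label-only victim: a nearest-centroid classifier
# that exposes predicted labels, never scores.
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
def predict_label(x):
    return int(np.argmin(((centroids - x) ** 2).sum(axis=1)))

def sampling_attack_score(x, n_queries=200, sigma=0.5):
    """Query noisy copies of x repeatedly; the fraction of queries
    that agree with the original label acts as a confidence proxy."""
    base = predict_label(x)
    hits = sum(predict_label(x + sigma * rng.normal(size=x.shape)) == base
               for _ in range(n_queries))
    return hits / n_queries

# A point deep inside a class region behaves like a high-confidence
# (training-like) example; a decision-boundary point does not.
s_inside = sampling_attack_score(np.array([0.0, 0.0]))
s_boundary = sampling_attack_score(np.array([1.5, 1.5]))
print(s_inside > s_boundary)
```

Output perturbation at prediction time, one of the defenses mentioned above, works precisely by randomizing the labels this agreement rate is estimated from.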
arXiv Detail & Related papers (2020-09-01T12:54:54Z) - Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning [33.41280432984183]
We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy.
As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings.
arXiv Detail & Related papers (2020-03-28T23:22:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.