Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
- URL: http://arxiv.org/abs/2406.03862v2
- Date: Sat, 17 May 2025 08:54:06 GMT
- Title: Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
- Authors: Shojiro Yamabe, Kazuto Fukuchi, Jun Sakuma
- Abstract summary: This study investigates behavior-targeted attacks on reinforcement learning and their countermeasures. To the best of our knowledge, this is the first defense strategy specifically designed for behavior-targeted attacks.
- Score: 10.411820336052784
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This study investigates behavior-targeted attacks on reinforcement learning and their countermeasures. Behavior-targeted attacks aim to manipulate the victim's behavior as desired by the adversary through adversarial interventions in state observations. Existing behavior-targeted attacks have some limitations, such as requiring white-box access to the victim's policy. To address this, we propose a novel attack method using imitation learning from adversarial demonstrations, which works under limited access to the victim's policy and is environment-agnostic. In addition, our theoretical analysis proves that the policy's sensitivity to state changes impacts defense performance, particularly in the early stages of the trajectory. Based on this insight, we propose time-discounted regularization, which enhances robustness against attacks while maintaining task performance. To the best of our knowledge, this is the first defense strategy specifically designed for behavior-targeted attacks.
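The abstract does not spell out the exact form of the proposed time-discounted regularization, so the following is only a minimal sketch of the idea, assuming a PyTorch policy network with discrete actions; the function name, the random perturbation model, and the hyperparameters are all hypothetical, not the paper's actual formulation. It penalizes the divergence between the policy's action distributions on clean and perturbed states, with a discount factor that weights early timesteps of the trajectory more heavily, reflecting the insight that sensitivity early in the trajectory matters most for defense.

```python
# Hypothetical sketch of a time-discounted smoothness regularizer for a policy
# network. The paper's actual regularizer may differ; this only illustrates
# weighting early-trajectory sensitivity more heavily than late-trajectory sensitivity.
import torch
import torch.nn.functional as F

def time_discounted_regularizer(policy, states, timesteps, eps=0.05, discount=0.99):
    """Penalize divergence between the policy's action distributions on clean
    and perturbed states, discounted by each state's position in its trajectory.

    policy    -- module mapping a batch of states to action logits (discrete actions assumed)
    states    -- tensor of shape (batch, state_dim)
    timesteps -- tensor of shape (batch,) with each state's timestep index
    """
    # Random perturbation as a cheap stand-in for a worst-case state change.
    perturbed = states + eps * torch.randn_like(states)
    clean_logits = policy(states)
    pert_logits = policy(perturbed)
    # Per-sample KL divergence between clean and perturbed action distributions.
    kl = F.kl_div(F.log_softmax(pert_logits, dim=-1),
                  F.softmax(clean_logits, dim=-1),
                  reduction="none").sum(dim=-1)
    # Time discount: early timesteps (small t) receive larger weight.
    weights = discount ** timesteps.float()
    return (weights * kl).mean()

# Usage sketch: total_loss = task_loss + reg_coef * time_discounted_regularizer(policy, states, timesteps)
```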
Related papers
- RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors [15.593859086891745]
RAT trains an intention policy that is explicitly aligned with human preferences. RAT dynamically adjusts the state occupancy measure within the replay buffer, allowing for more controlled and effective behavior manipulation.
arXiv Detail & Related papers (2024-12-14T06:56:11Z) - On the Difficulty of Defending Contrastive Learning against Backdoor Attacks [58.824074124014224]
We show how contrastive backdoor attacks operate through distinctive mechanisms.
Our findings highlight the need for defenses tailored to the specificities of contrastive backdoor attacks.
arXiv Detail & Related papers (2023-12-14T15:54:52Z) - Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL [46.32591437241358]
In this paper, we consider a multi-agent setting where a well-trained victim agent is exploited by an attacker controlling another agent.
Previous models do not account for the possibility that the attacker may only have partial control over $\alpha$ or that the attack may produce easily detectable "abnormal" behaviors.
We introduce a generalized attack framework that has the flexibility to model to what extent the adversary is able to control the agent.
We offer a provably efficient defense with convergence to the most robust victim policy through adversarial training with timescale separation.
arXiv Detail & Related papers (2023-05-27T02:54:07Z) - Toward Evaluating Robustness of Reinforcement Learning with Adversarial Policy [32.1138935956272]
Reinforcement learning agents are susceptible to evasion attacks during deployment.
In this paper, we propose Intrinsically Motivated Adversarial Policy (IMAP) for efficient black-box adversarial policy learning.
arXiv Detail & Related papers (2023-05-04T07:24:12Z) - Generalizable Black-Box Adversarial Attack with Meta Learning [54.196613395045595]
In black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful perturbation based on query feedback under a query budget.
We propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability.
The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance.
arXiv Detail & Related papers (2023-01-01T07:24:12Z) - Targeted Adversarial Attacks on Deep Reinforcement Learning Policies via Model Checking [3.5884936187733394]
This paper presents a metric that measures the exact impact of adversarial attacks against temporal logic properties.
We also introduce a model checking method that allows us to verify the robustness of RL policies against adversarial attacks.
arXiv Detail & Related papers (2022-12-10T17:13:10Z) - Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack [53.032801921915436]
Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars.
Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks.
We show such threats exist, even when the attacker only has access to the input/output of the model.
We propose the first black-box adversarial attack approach for skeleton-based HAR, called BASAR.
arXiv Detail & Related papers (2022-11-21T09:51:28Z) - Imitating Opponent to Win: Adversarial Policy Imitation Learning in Two-player Competitive Games [0.0]
Adversarial policies adopted by an adversary agent can cause a target RL agent to perform poorly in a multi-agent environment.
In existing studies, adversarial policies are directly trained based on experiences of interacting with the victim agent.
We design a new effective adversarial policy learning algorithm that overcomes this shortcoming.
arXiv Detail & Related papers (2022-10-30T18:32:02Z) - A Tale of HodgeRank and Spectral Method: Target Attack Against Rank Aggregation Is the Fixed Point of Adversarial Game [153.74942025516853]
The intrinsic vulnerability of rank aggregation methods is not well studied in the literature.
In this paper, we focus on a purposeful adversary who aims to dictate the aggregated results by modifying the pairwise data.
The effectiveness of the suggested target attack strategies is demonstrated by a series of toy simulations and several real-world data experiments.
arXiv Detail & Related papers (2022-09-13T05:59:02Z) - Defending Observation Attacks in Deep Reinforcement Learning via Detection and Denoising [3.2023814100005907]
Attacks manifesting as perturbations in the observation space managed by the external environment have been shown to degrade policy performance.
To defend against these attacks, we propose a novel defense strategy using a detect-and-denoise schema.
Our solution does not require sampling data in an environment under attack, thereby greatly reducing risk during training.
arXiv Detail & Related papers (2022-06-14T22:28:30Z) - Execute Order 66: Targeted Data Poisoning for Reinforcement Learning [52.593097204559314]
We introduce an insidious poisoning attack for reinforcement learning which causes agent misbehavior only at specific target states.
We accomplish this by adapting a recent technique, gradient alignment, to reinforcement learning.
We test our method and demonstrate success in two Atari games of varying difficulty.
arXiv Detail & Related papers (2022-01-03T17:09:32Z) - Adversarial Robustness of Deep Reinforcement Learning based Dynamic Recommender Systems [50.758281304737444]
We propose to explore adversarial examples and attack detection on reinforcement learning-based interactive recommendation systems.
We first craft different types of adversarial examples by adding perturbations to the input and intervening on the causal factors.
Then, we augment the recommendation systems with a deep learning-based classifier, trained on the crafted data, to detect potential attacks.
arXiv Detail & Related papers (2021-12-02T04:12:24Z) - Targeted Attack on Deep RL-based Autonomous Driving with Learned Visual Patterns [18.694795507945603]
Recent studies demonstrated the vulnerability of control policies learned through deep reinforcement learning to adversarial attacks.
This paper investigates the feasibility of targeted attacks through visually learned patterns placed on physical objects in the environment.
arXiv Detail & Related papers (2021-09-16T04:59:06Z) - Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning [32.12283927682007]
Deep reinforcement learning models are vulnerable to adversarial attacks which can decrease the victim's total reward by manipulating the observations.
We reformulate the problem of adversarial attacks in function space and separate the previous gradient-based attacks into several subspaces.
In the first stage, we train a deceptive policy by hacking the environment, and discover a set of trajectories routing to the lowest reward.
Our method provides a tighter theoretical upper bound for the attacked agent's performance than the existing approaches.
arXiv Detail & Related papers (2021-06-30T07:41:51Z) - Guided Adversarial Attack for Evaluating and Enhancing Adversarial Defenses [59.58128343334556]
We introduce a relaxation term into the standard loss that finds more suitable gradient directions, increases attack efficacy, and leads to more efficient adversarial training.
We propose Guided Adversarial Margin Attack (GAMA), which utilizes function mapping of the clean image to guide the generation of adversaries.
We also propose Guided Adversarial Training (GAT), which achieves state-of-the-art performance amongst single-step defenses.
arXiv Detail & Related papers (2020-11-30T16:39:39Z) - Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike other standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that only publishes labels is still susceptible to sampling attacks, and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z) - Robust Tracking against Adversarial Attacks [69.59717023941126]
We first attempt to generate adversarial examples on top of video sequences to improve the tracking robustness against adversarial attacks.
We apply the proposed adversarial attack and defense approaches to state-of-the-art deep tracking algorithms.
arXiv Detail & Related papers (2020-07-20T08:05:55Z) - Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning [33.41280432984183]
We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy.
As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings.
arXiv Detail & Related papers (2020-03-28T23:22:28Z) - On Adaptive Attacks to Adversarial Example Defenses [123.32678153377915]
This paper lays out the methodology and the approach necessary to perform an adaptive attack against defenses to adversarial examples.
We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples.
arXiv Detail & Related papers (2020-02-19T18:50:29Z) - Deflecting Adversarial Attacks [94.85315681223702]
We present a new approach towards ending this cycle where we "deflect" adversarial attacks by causing the attacker to produce an input that resembles the attack's target class.
We first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance.
arXiv Detail & Related papers (2020-02-18T06:59:13Z)