Execute Order 66: Targeted Data Poisoning for Reinforcement Learning
- URL: http://arxiv.org/abs/2201.00762v1
- Date: Mon, 3 Jan 2022 17:09:32 GMT
- Title: Execute Order 66: Targeted Data Poisoning for Reinforcement Learning
- Authors: Harrison Foley and Liam Fowl and Tom Goldstein and Gavin Taylor
- Abstract summary: We introduce an insidious poisoning attack for reinforcement learning which causes agent misbehavior only at specific target states.
We accomplish this by adapting a recent technique, gradient alignment, to reinforcement learning.
We test our method and demonstrate success in two Atari games of varying difficulty.
- Score: 52.593097204559314
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Data poisoning for reinforcement learning has historically focused on general
performance degradation, and targeted attacks have been successful via
perturbations that involve control of the victim's policy and rewards. We
introduce an insidious poisoning attack for reinforcement learning which causes
agent misbehavior only at specific target states - all while minimally
modifying a small fraction of training observations without assuming any
control over policy or reward. We accomplish this by adapting a recent
technique, gradient alignment, to reinforcement learning. We test our method
and demonstrate success in two Atari games of varying difficulty.
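To make the gradient-alignment idea concrete, here is a minimal, hypothetical sketch of how poisoned observations could be crafted for a DQN-style victim in PyTorch. It illustrates the general gradient-matching recipe (align the victim's training gradient on the poisons with the gradient of an attacker-chosen target loss); it is not the authors' implementation, and all names (craft_poison, q_net, eps, the assumed [0, 1] pixel range, etc.) are assumptions.

```python
# Minimal, hypothetical sketch of gradient-alignment poisoning for a DQN-style
# victim. Names (craft_poison, q_net, eps, ...) are illustrative assumptions,
# not the paper's code.
import torch
import torch.nn.functional as F

def craft_poison(q_net, clean_obs, clean_actions, clean_targets,
                 target_obs, target_action, eps=8 / 255, steps=100, lr=0.01):
    """Perturb a small batch of training observations so that training on them
    nudges the agent toward `target_action` at the target state(s)."""
    params = list(q_net.parameters())

    # Attacker objective: the gradient that would push the agent to prefer
    # `target_action` in `target_obs`, treating Q-values as logits.
    adv_loss = F.cross_entropy(q_net(target_obs), target_action)
    adv_grad = [g.detach() for g in torch.autograd.grad(adv_loss, params)]

    delta = torch.zeros_like(clean_obs, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        poisoned = (clean_obs + delta).clamp(0.0, 1.0)
        # The ordinary TD-regression loss the victim would compute on the poisons.
        q_taken = q_net(poisoned).gather(1, clean_actions.unsqueeze(1)).squeeze(1)
        train_loss = F.mse_loss(q_taken, clean_targets)
        train_grad = torch.autograd.grad(train_loss, params, create_graph=True)

        # Gradient alignment: maximise cosine similarity between the victim's
        # training gradient on the poisons and the attacker's target gradient.
        dot = sum((tg * ag).sum() for tg, ag in zip(train_grad, adv_grad))
        norms = (torch.sqrt(sum((tg ** 2).sum() for tg in train_grad)) *
                 torch.sqrt(sum((ag ** 2).sum() for ag in adv_grad)) + 1e-12)
        (1.0 - dot / norms).backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep each observation only minimally modified

    return (clean_obs + delta).clamp(0.0, 1.0).detach()
```

In this reading, the L-infinity bound `eps` plays the role of the minimal modification to each poisoned observation, the small batch passed in corresponds to the small fraction of poisoned training data, and the alignment loss steers the victim's own training update toward the attacker's target behaviour without touching its policy or rewards.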
Related papers
- Behavior-Targeted Attack on Reinforcement Learning with Limited Access to Victim's Policy [9.530897053573186]
We propose a novel method for manipulating the victim agent in the black-box setting.
Our attack method is formulated as a bi-level optimization problem that is reduced to a matching problem.
Empirical evaluations on several reinforcement learning benchmarks show that our proposed method has superior attack performance to baselines.
arXiv Detail & Related papers (2024-06-06T08:49:51Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - PACOL: Poisoning Attacks Against Continual Learners [1.569413950416037]
In this work, we demonstrate that continual learning systems can be manipulated by malicious misinformation.
We present a new category of data poisoning attacks specific to continual learners, which we refer to as Poisoning Attacks Against Continual Learners (PACOL).
A comprehensive set of experiments shows that commonly used generative-replay and regularization-based continual learning approaches are vulnerable to these attacks.
arXiv Detail & Related papers (2023-11-18T00:20:57Z) - Not All Poisons are Created Equal: Robust Training against Data Poisoning [15.761683760167777]
Data poisoning causes misclassification of test-time target examples by injecting maliciously crafted samples into the training data.
We propose an efficient defense mechanism that significantly reduces the success rate of various data poisoning attacks.
arXiv Detail & Related papers (2022-10-18T08:19:41Z) - Projective Ranking-based GNN Evasion Attacks [52.85890533994233]
Graph neural networks (GNNs) offer promising learning methods for graph-related tasks.
GNNs are at risk of adversarial attacks.
arXiv Detail & Related papers (2022-02-25T21:52:09Z) - Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning [114.9857000195174]
A major challenge to widespread industrial adoption of deep reinforcement learning is the potential vulnerability to privacy breaches.
We propose an adversarial attack framework tailored for testing the vulnerability of deep reinforcement learning algorithms to membership inference attacks.
arXiv Detail & Related papers (2021-09-08T23:44:57Z) - Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning [32.12283927682007]
Deep reinforcement learning models are vulnerable to adversarial attacks which can decrease the victim's total reward by manipulating the observations.
We reformulate the problem of adversarial attacks in function space and separate the previous gradient based attacks into several subspaces.
In the first stage, we train a deceptive policy by hacking the environment and discover a set of trajectories leading to the lowest reward.
Our method provides a tighter theoretical upper bound for the attacked agent's performance than the existing approaches.
arXiv Detail & Related papers (2021-06-30T07:41:51Z) - Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards can mislead the learner, and that with low exploration probability values the learned policy is more robust to corrupted rewards (a toy reward-perturbation wrapper is sketched after this list).
arXiv Detail & Related papers (2021-02-12T15:53:48Z) - Provable Defense Against Delusive Poisoning [64.69220849669948]
We show that adversarial training can be a principled defense method against delusive poisoning.
arXiv Detail & Related papers (2021-02-09T09:19:47Z)
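As a side note on the corrupted-rewards paper listed above, the following is a minimal, hypothetical sketch of a reward-perturbation wrapper. It uses Gymnasium; the class and parameter names are assumptions, and a simple random sign flip stands in for that paper's smoothly crafted adversarial rewards.

```python
# Illustrative sketch of a reward-perturbation attack on an RL training loop;
# all names are hypothetical, and the sign-flip corruption is a stand-in for
# the smoother reward crafting studied in the corrupted-rewards paper.
import random
import gymnasium as gym

class CorruptedRewardWrapper(gym.RewardWrapper):
    """Flips the sign of the reward with probability p, simulating an attacker
    who can perturb the reward signal seen by the learner."""

    def __init__(self, env, p=0.1, scale=1.0):
        super().__init__(env)
        self.p = p          # probability of corrupting a given reward
        self.scale = scale  # magnitude multiplier for the corrupted reward

    def reward(self, reward):
        if random.random() < self.p:
            return -self.scale * reward  # adversarially flipped reward
        return reward

# Usage: train any agent on the wrapped environment to study its robustness.
env = CorruptedRewardWrapper(gym.make("CartPole-v1"), p=0.1)
```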
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.