Execute Order 66: Targeted Data Poisoning for Reinforcement Learning
- URL: http://arxiv.org/abs/2201.00762v1
- Date: Mon, 3 Jan 2022 17:09:32 GMT
- Title: Execute Order 66: Targeted Data Poisoning for Reinforcement Learning
- Authors: Harrison Foley and Liam Fowl and Tom Goldstein and Gavin Taylor
- Abstract summary: We introduce an insidious poisoning attack for reinforcement learning which causes agent misbehavior only at specific target states.
We accomplish this by adapting a recent technique, gradient alignment, to reinforcement learning.
We test our method and demonstrate success in two Atari games of varying difficulty.
- Score: 52.593097204559314
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Data poisoning for reinforcement learning has historically focused on general
performance degradation, and targeted attacks have been successful via
perturbations that involve control of the victim's policy and rewards. We
introduce an insidious poisoning attack for reinforcement learning which causes
agent misbehavior only at specific target states - all while minimally
modifying a small fraction of training observations without assuming any
control over policy or reward. We accomplish this by adapting a recent
technique, gradient alignment, to reinforcement learning. We test our method
and demonstrate success in two Atari games of varying difficulty.
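To make the gradient-alignment idea concrete, here is a minimal, hypothetical sketch of how poisoned observations could be crafted for a DQN-style victim in PyTorch. It illustrates the general gradient-matching recipe (align the victim's training gradient on the poisons with the gradient of an attacker-chosen target loss); it is not the authors' implementation, and all names (craft_poison, q_net, eps, the assumed [0, 1] pixel range, etc.) are assumptions.

```python
# Minimal, hypothetical sketch of gradient-alignment poisoning for a DQN-style
# victim. Names (craft_poison, q_net, eps, ...) are illustrative assumptions,
# not the paper's code.
import torch
import torch.nn.functional as F

def craft_poison(q_net, clean_obs, clean_actions, clean_targets,
                 target_obs, target_action, eps=8 / 255, steps=100, lr=0.01):
    """Perturb a small batch of training observations so that training on them
    nudges the agent toward `target_action` at the target state(s)."""
    params = list(q_net.parameters())

    # Attacker objective: the gradient that would push the agent to prefer
    # `target_action` in `target_obs`, treating Q-values as logits.
    adv_loss = F.cross_entropy(q_net(target_obs), target_action)
    adv_grad = [g.detach() for g in torch.autograd.grad(adv_loss, params)]

    delta = torch.zeros_like(clean_obs, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        poisoned = (clean_obs + delta).clamp(0.0, 1.0)
        # The ordinary TD-regression loss the victim would compute on the poisons.
        q_taken = q_net(poisoned).gather(1, clean_actions.unsqueeze(1)).squeeze(1)
        train_loss = F.mse_loss(q_taken, clean_targets)
        train_grad = torch.autograd.grad(train_loss, params, create_graph=True)

        # Gradient alignment: maximise cosine similarity between the victim's
        # training gradient on the poisons and the attacker's target gradient.
        dot = sum((tg * ag).sum() for tg, ag in zip(train_grad, adv_grad))
        norms = (torch.sqrt(sum((tg ** 2).sum() for tg in train_grad)) *
                 torch.sqrt(sum((ag ** 2).sum() for ag in adv_grad)) + 1e-12)
        (1.0 - dot / norms).backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep each observation only minimally modified

    return (clean_obs + delta).clamp(0.0, 1.0).detach()
```

In this reading, the L-infinity bound `eps` plays the role of the minimal modification to each poisoned observation, the small batch passed in corresponds to the small fraction of poisoned training data, and the alignment loss steers the victim's own training update toward the attacker's target behaviour without touching its policy or rewards.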
Related papers
- Behavior-Targeted Attack on Reinforcement Learning with Limited Access to Victim's Policy [9.530897053573186]
We propose a novel method for manipulating the victim agent in the black-box setting.
Our attack method is formulated as a bi-level optimization problem that is reduced to a matching problem.
Empirical evaluations on several reinforcement learning benchmarks show that our proposed method has superior attack performance to baselines.
arXiv Detail & Related papers (2024-06-06T08:49:51Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - PACOL: Poisoning Attacks Against Continual Learners [1.569413950416037]
In this work, we demonstrate that continual learning systems can be manipulated by malicious misinformation.
We present a new category of data poisoning attacks specific to continual learners, which we refer to as Poisoning Attacks Against Continual Learners (PACOL).
A comprehensive set of experiments shows that commonly used generative-replay and regularization-based continual learning approaches are vulnerable to these attacks.
arXiv Detail & Related papers (2023-11-18T00:20:57Z) - Not All Poisons are Created Equal: Robust Training against Data Poisoning [15.761683760167777]
Data poisoning causes misclassification of test-time target examples by injecting maliciously crafted samples into the training data.
We propose an efficient defense mechanism that significantly reduces the success rate of various data poisoning attacks.
arXiv Detail & Related papers (2022-10-18T08:19:41Z) - Projective Ranking-based GNN Evasion Attacks [52.85890533994233]
Graph neural networks (GNNs) offer promising learning methods for graph-related tasks.
GNNs are at risk of adversarial attacks.
arXiv Detail & Related papers (2022-02-25T21:52:09Z) - Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning [114.9857000195174]
A major challenge to widespread industrial adoption of deep reinforcement learning is the potential vulnerability to privacy breaches.
We propose an adversarial attack framework tailored for testing the vulnerability of deep reinforcement learning algorithms to membership inference attacks.
arXiv Detail & Related papers (2021-09-08T23:44:57Z) - Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning [32.12283927682007]
Deep reinforcement learning models are vulnerable to adversarial attacks which can decrease the victim's total reward by manipulating the observations.
We reformulate the problem of adversarial attacks in function space and separate the previous gradient based attacks into several subspaces.
In the first stage, we train a deceptive policy by hacking the environment and discover a set of trajectories leading to the lowest reward.
Our method provides a tighter theoretical upper bound for the attacked agent's performance than the existing approaches.
arXiv Detail & Related papers (2021-06-30T07:41:51Z) - Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards can mislead the learner, and that with low exploration probability values the learned policy is more robust to corrupted rewards (a toy reward-perturbation wrapper is sketched after this list).
arXiv Detail & Related papers (2021-02-12T15:53:48Z) - Provable Defense Against Delusive Poisoning [64.69220849669948]
We show that adversarial training can be a principled defense method against delusive poisoning.
arXiv Detail & Related papers (2021-02-09T09:19:47Z)
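As a side note on the corrupted-rewards paper listed above, the following is a minimal, hypothetical sketch of a reward-perturbation wrapper. It uses Gymnasium; the class and parameter names are assumptions, and a simple random sign flip stands in for that paper's smoothly crafted adversarial rewards.

```python
# Illustrative sketch of a reward-perturbation attack on an RL training loop;
# all names are hypothetical, and the sign-flip corruption is a stand-in for
# the smoother reward crafting studied in the corrupted-rewards paper.
import random
import gymnasium as gym

class CorruptedRewardWrapper(gym.RewardWrapper):
    """Flips the sign of the reward with probability p, simulating an attacker
    who can perturb the reward signal seen by the learner."""

    def __init__(self, env, p=0.1, scale=1.0):
        super().__init__(env)
        self.p = p          # probability of corrupting a given reward
        self.scale = scale  # magnitude multiplier for the corrupted reward

    def reward(self, reward):
        if random.random() < self.p:
            return -self.scale * reward  # adversarially flipped reward
        return reward

# Usage: train any agent on the wrapped environment to study its robustness.
env = CorruptedRewardWrapper(gym.make("CartPole-v1"), p=0.1)
```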
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.