Policy Teaching in Reinforcement Learning via Environment Poisoning
Attacks
- URL: http://arxiv.org/abs/2011.10824v1
- Date: Sat, 21 Nov 2020 16:54:45 GMT
- Title: Policy Teaching in Reinforcement Learning via Environment Poisoning
Attacks
- Authors: Amin Rakhsha, Goran Radanovic, Rati Devidze, Xiaojin Zhu, Adish Singla
- Abstract summary: We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker.
As a victim, we consider RL agents whose objective is to find a policy that maximizes reward in infinite-horizon problem settings.
- Score: 33.41280432984183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a security threat to reinforcement learning where an attacker
poisons the learning environment to force the agent into executing a target
policy chosen by the attacker. As a victim, we consider RL agents whose
objective is to find a policy that maximizes reward in infinite-horizon problem
settings. The attacker can manipulate the rewards and the transition dynamics
in the learning environment at training-time, and is interested in doing so in
a stealthy manner. We propose an optimization framework for finding an optimal
stealthy attack for different measures of attack cost. We provide lower/upper
bounds on the attack cost, and instantiate our attacks in two settings: (i) an
offline setting where the agent is doing planning in the poisoned environment,
and (ii) an online setting where the agent is learning a policy with poisoned
feedback. Our results show that the attacker can easily succeed in teaching any
target policy to the victim under mild conditions and highlight a significant
security threat to reinforcement learning agents in practice.
Related papers
- Purple-teaming LLMs with Adversarial Defender Training [57.535241000787416]
We present Purple-teaming LLMs with Adversarial Defender training (PAD)
PAD is a pipeline designed to safeguard LLMs by novelly incorporating the red-teaming (attack) and blue-teaming (safety training) techniques.
PAD significantly outperforms existing baselines in both finding effective attacks and establishing a robust safe guardrail.
arXiv Detail & Related papers (2024-07-01T23:25:30Z) - Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z) - Efficient Adversarial Attacks on Online Multi-agent Reinforcement
Learning [45.408568528354216]
We investigate the impact of adversarial attacks on multi-agent reinforcement learning (MARL)
In the considered setup, there is an attacker who is able to modify the rewards before the agents receive them or manipulate the actions before the environment receives them.
We show that the mixed attack strategy can efficiently attack MARL agents even if the attacker has no prior information about the underlying environment and the agents' algorithms.
arXiv Detail & Related papers (2023-07-15T00:38:55Z) - Rethinking Adversarial Policies: A Generalized Attack Formulation and
Provable Defense in RL [46.32591437241358]
In this paper, we consider a multi-agent setting where a well-trained victim agent is exploited by an attacker controlling another agent.
Previous models do not account for the possibility that the attacker may only have partial control over $alpha$ or that the attack may produce easily detectable "abnormal" behaviors.
We introduce a generalized attack framework that has the flexibility to model what extent the adversary is able to control the agent.
We offer a provably efficient defense with convergence to the most robust victim policy through adversarial training with timescale separation.
arXiv Detail & Related papers (2023-05-27T02:54:07Z) - Black-Box Targeted Reward Poisoning Attack Against Online Deep
Reinforcement Learning [2.3526458707956643]
We propose the first black-box targeted attack against online deep reinforcement learning through reward poisoning during training time.
Our attack is applicable to general environments with unknown dynamics learned by unknown algorithms.
arXiv Detail & Related papers (2023-05-18T03:37:29Z) - Toward Evaluating Robustness of Reinforcement Learning with Adversarial Policy [32.1138935956272]
Reinforcement learning agents are susceptible to evasion attacks during deployment.
In this paper, we propose Intrinsically Motivated Adrial Policy (IMAP) for efficient black-box adversarial policy learning.
arXiv Detail & Related papers (2023-05-04T07:24:12Z) - Implicit Poisoning Attacks in Two-Agent Reinforcement Learning:
Adversarial Policies for Training-Time Attacks [21.97069271045167]
In targeted poisoning attacks, an attacker manipulates an agent-environment interaction to force the agent into adopting a policy of interest, called target policy.
We study targeted poisoning attacks in a two-agent setting where an attacker implicitly poisons the effective environment of one of the agents by modifying the policy of its peer.
We develop an optimization framework for designing optimal attacks, where the cost of the attack measures how much the solution deviates from the assumed default policy of the peer agent.
arXiv Detail & Related papers (2023-02-27T14:52:15Z) - Projective Ranking-based GNN Evasion Attacks [52.85890533994233]
Graph neural networks (GNNs) offer promising learning methods for graph-related tasks.
GNNs are at risk of adversarial attacks.
arXiv Detail & Related papers (2022-02-25T21:52:09Z) - Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the
Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z) - Policy Teaching via Environment Poisoning: Training-time Adversarial
Attacks against Reinforcement Learning [33.41280432984183]
We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy.
As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings.
arXiv Detail & Related papers (2020-03-28T23:22:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.