Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
- URL: http://arxiv.org/abs/2003.12909v2
- Date: Wed, 19 Aug 2020 00:07:20 GMT
- Title: Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
- Authors: Amin Rakhsha, Goran Radanovic, Rati Devidze, Xiaojin Zhu, Adish Singla
- Abstract summary: We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy.
As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings.
- Score: 33.41280432984183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a security threat to reinforcement learning where an attacker
poisons the learning environment to force the agent into executing a target
policy chosen by the attacker. As a victim, we consider RL agents whose
objective is to find a policy that maximizes average reward in undiscounted
infinite-horizon problem settings. The attacker can manipulate the rewards or
the transition dynamics in the learning environment at training-time and is
interested in doing so in a stealthy manner. We propose an optimization
framework for finding an \emph{optimal stealthy attack} for different measures
of attack cost. We provide sufficient technical conditions under which the
attack is feasible and provide lower/upper bounds on the attack cost. We
instantiate our attacks in two settings: (i) an \emph{offline} setting where
the agent is doing planning in the poisoned environment, and (ii) an
\emph{online} setting where the agent is learning a policy using a
regret-minimization framework with poisoned feedback. Our results show that the
attacker can easily succeed in teaching any target policy to the victim under
mild conditions and highlight a significant security threat to reinforcement
learning agents in practice.
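To make the offline attack concrete: for reward poisoning, the optimization reduces to a convex program that minimally perturbs the rewards so that the target policy's average reward exceeds that of every single-state ("neighbor") deviation by a margin $\epsilon$. The snippet below is a minimal illustrative sketch of this idea (not the authors' code), assuming an ergodic MDP with known transitions; it uses numpy and cvxpy, and the names poison_rewards and stationary_dist are our own.

```python
import numpy as np
import cvxpy as cp

def stationary_dist(P_pi):
    """Stationary distribution of an ergodic chain P_pi (S x S):
    solve mu^T (I - P_pi) = 0 with mu summing to 1 (least squares)."""
    S = P_pi.shape[0]
    A = np.vstack([P_pi.T - np.eye(S), np.ones((1, S))])
    b = np.zeros(S + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

def poison_rewards(R, P, pi_target, eps=0.1, p=1):
    """Sketch of an offline reward-poisoning program:
    minimize ||R_hat - R||_p subject to the target policy's average reward
    beating every single-state ("neighbor") deviation by margin eps.
    R: (S, A) rewards, P: (S, A, S) transitions, pi_target: length-S action array."""
    S, A_n = R.shape
    pi_target = np.asarray(pi_target)
    R_hat = cp.Variable((S, A_n))

    def avg_reward(pi, R_var):
        # Average reward of deterministic policy pi under rewards R_var,
        # weighted by the stationary distribution of the chain it induces.
        P_pi = np.stack([P[s, pi[s]] for s in range(S)])
        mu = stationary_dist(P_pi)
        return sum(mu[s] * R_var[s, pi[s]] for s in range(S))

    rho_target = avg_reward(pi_target, R_hat)
    constraints = []
    for s in range(S):
        for a in range(A_n):
            if a == pi_target[s]:
                continue
            pi_neighbor = pi_target.copy()
            pi_neighbor[s] = a  # deviate from the target policy only in state s
            constraints.append(rho_target >= avg_reward(pi_neighbor, R_hat) + eps)

    problem = cp.Problem(cp.Minimize(cp.norm(cp.vec(R_hat - R), p)), constraints)
    problem.solve()
    return R_hat.value
```

For p = 1 this is a linear program; the transition-poisoning variant is harder to write this way because the stationary distributions themselves depend on the attack variables.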
Related papers
- Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z)
- Optimal Attack and Defense for Reinforcement Learning [11.36770403327493]
In adversarial RL, an external attacker has the power to manipulate the victim agent's interaction with the environment.
We formulate the attacker's problem of designing a stealthy attack that maximizes its own expected reward.
We argue that the optimal defense policy for the victim can be computed as the solution to a Stackelberg game.
arXiv Detail & Related papers (2023-11-30T21:21:47Z)
- Efficient Adversarial Attacks on Online Multi-agent Reinforcement Learning [45.408568528354216]
We investigate the impact of adversarial attacks on multi-agent reinforcement learning (MARL).
In the considered setup, an attacker can modify the rewards before the agents receive them or manipulate the actions before the environment receives them.
We show that the mixed attack strategy can efficiently attack MARL agents even if the attacker has no prior information about the underlying environment and the agents' algorithms.
arXiv Detail & Related papers (2023-07-15T00:38:55Z)
- Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL [46.32591437241358]
In this paper, we consider a multi-agent setting where a well-trained victim agent is exploited by an attacker controlling another agent.
Previous models do not account for the possibility that the attacker may only have partial control over $\alpha$ or that the attack may produce easily detectable "abnormal" behaviors.
We introduce a generalized attack framework with the flexibility to model to what extent the adversary is able to control the agent.
We offer a provably efficient defense with convergence to the most robust victim policy through adversarial training with timescale separation.
arXiv Detail & Related papers (2023-05-27T02:54:07Z)
- Black-Box Targeted Reward Poisoning Attack Against Online Deep Reinforcement Learning [2.3526458707956643]
We propose the first black-box targeted attack against online deep reinforcement learning through reward poisoning during training time.
Our attack is applicable to general environments with unknown dynamics learned by unknown algorithms.
arXiv Detail & Related papers (2023-05-18T03:37:29Z)
- Implicit Poisoning Attacks in Two-Agent Reinforcement Learning: Adversarial Policies for Training-Time Attacks [21.97069271045167]
In targeted poisoning attacks, an attacker manipulates an agent-environment interaction to force the agent into adopting a policy of interest, called the target policy.
We study targeted poisoning attacks in a two-agent setting where an attacker implicitly poisons the effective environment of one of the agents by modifying the policy of its peer.
We develop an optimization framework for designing optimal attacks, where the cost of the attack measures how much the solution deviates from the assumed default policy of the peer agent.
arXiv Detail & Related papers (2023-02-27T14:52:15Z)
- Projective Ranking-based GNN Evasion Attacks [52.85890533994233]
Graph neural networks (GNNs) offer promising learning methods for graph-related tasks.
However, GNNs are at risk of adversarial attacks.
arXiv Detail & Related papers (2022-02-25T21:52:09Z)
- Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
- Automating Privilege Escalation with Deep Reinforcement Learning [71.87228372303453]
In this work, we exemplify the potential threat of malicious actors using deep reinforcement learning to train automated agents.
We present an agent that uses a state-of-the-art reinforcement learning algorithm to perform local privilege escalation.
Our agent is usable for generating realistic attack sensor data for training and evaluating intrusion detection systems.
arXiv Detail & Related papers (2021-10-04T12:20:46Z)
- Policy Teaching in Reinforcement Learning via Environment Poisoning Attacks [33.41280432984183]
We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker.
As a victim, we consider RL agents whose objective is to find a policy that maximizes reward in infinite-horizon problem settings.
arXiv Detail & Related papers (2020-11-21T16:54:45Z)