Adaptive Reward-Poisoning Attacks against Reinforcement Learning
- URL: http://arxiv.org/abs/2003.12613v2
- Date: Mon, 22 Jun 2020 21:02:31 GMT
- Title: Adaptive Reward-Poisoning Attacks against Reinforcement Learning
- Authors: Xuezhou Zhang, Yuzhe Ma, Adish Singla, Xiaojin Zhu
- Abstract summary: In reward-poisoning attacks against reinforcement learning, an attacker can perturb the environment reward $r_t$ into $r_t+\delta_t$ at each step.
We show that under mild conditions, adaptive attacks can achieve the nefarious policy in a number of steps polynomial in the state-space size $|S|$, whereas non-adaptive attacks require exponentially many steps.
We also show that an attacker can find effective reward-poisoning attacks using state-of-the-art deep RL techniques.
- Score: 43.07944714475278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In reward-poisoning attacks against reinforcement learning (RL), an attacker
can perturb the environment reward $r_t$ into $r_t+\delta_t$ at each step, with
the goal of forcing the RL agent to learn a nefarious policy. We categorize
such attacks by the infinity-norm constraint on $\delta_t$: We provide a lower
threshold below which reward-poisoning attacks are infeasible and RL is certified
to be safe; we provide a corresponding upper threshold above which the attack
is feasible. Feasible attacks can be further categorized as non-adaptive where
$\delta_t$ depends only on $(s_t,a_t, s_{t+1})$, or adaptive where $\delta_t$
depends further on the RL agent's learning process at time $t$. Non-adaptive
attacks have been the focus of prior works. However, we show that under mild
conditions, adaptive attacks can achieve the nefarious policy in steps
polynomial in state-space size $|S|$, whereas non-adaptive attacks require
exponential steps. We provide a constructive proof that a Fast Adaptive Attack
strategy achieves the polynomial rate. Finally, we show that empirically an
attacker can find effective reward-poisoning attacks using state-of-the-art
deep RL techniques.
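
To make the attack model concrete, here is a minimal sketch (not the paper's Fast Adaptive Attack) of reward poisoning against a tabular Q-learning victim on a hypothetical 5-state chain MDP: a non-adaptive perturbation that depends only on $(s_t, a_t, s_{t+1})$, and an adaptive perturbation that also reads the victim's current Q-table, both kept within an assumed $\ell_\infty$ budget. The environment, target policy, and all constants are illustrative assumptions.

```python
# Illustrative sketch of the reward-poisoning attack model above; it is NOT the
# paper's Fast Adaptive Attack.  The chain MDP, target policy, budget DELTA and
# all hyperparameters are assumptions made for this example.
import numpy as np

N_STATES, N_ACTIONS = 5, 2   # toy chain: action 1 moves right, action 0 moves left
DELTA = 1.0                  # assumed per-step bound ||delta_t||_inf <= DELTA
TARGET_ACTION = 0            # nefarious policy: always move left (illustrative)

def env_step(s, a):
    """True (unpoisoned) dynamics and reward: +1 only at the right end."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

def delta_nonadaptive(s, a, s_next, Q):
    """Non-adaptive attack: delta_t depends only on (s_t, a_t, s_{t+1})."""
    return DELTA if a == TARGET_ACTION else -DELTA

def delta_adaptive(s, a, s_next, Q):
    """Adaptive attack: delta_t may also inspect the victim's current Q-table,
    pushing the target action just above the victim's best alternative."""
    gap = Q[s].max() - Q[s, TARGET_ACTION]
    d = gap + 0.1 if a == TARGET_ACTION else -(gap + 0.1)
    return float(np.clip(d, -DELTA, DELTA))   # respect the infinity-norm budget

def run_victim(attack, episodes=200, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Q-learning victim trained on the poisoned rewards r_t + delta_t."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            a = rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[s].argmax())
            s_next, r = env_step(s, a)
            r_poisoned = r + attack(s, a, s_next, Q)
            Q[s, a] += alpha * (r_poisoned + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q.argmax(axis=1)   # the policy the victim ends up with

print("non-adaptive attack, learned policy:", run_victim(delta_nonadaptive))
print("adaptive attack, learned policy:    ", run_victim(delta_adaptive))
```

On this toy both variants steer the victim toward the target action; the example is far too small to exhibit the polynomial-versus-exponential separation proved in the paper.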
Related papers
- Provably Efficient Action-Manipulation Attack Against Continuous Reinforcement Learning [49.48615590763914]
We propose a black-box attack algorithm named LCBT, which uses the Monte Carlo tree search method for efficient action searching and manipulation.
We evaluate the proposed attack against three continuous-control algorithms, DDPG, PPO, and TD3, and observe promising attack performance.
arXiv Detail & Related papers (2024-11-20T08:20:29Z)
- Optimal Attack and Defense for Reinforcement Learning [11.36770403327493]
In adversarial RL, an external attacker has the power to manipulate the victim agent's interaction with the environment.
We study the attacker's problem of designing a stealthy attack that maximizes its own expected reward.
We argue that the optimal defense policy for the victim can be computed as the solution to a Stackelberg game.
arXiv Detail & Related papers (2023-11-30T21:21:47Z)
- Guidance Through Surrogate: Towards a Generic Diagnostic Attack [101.36906370355435]
We develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA).
Our modified attack does not require random restarts, large number of attack iterations or search for an optimal step-size.
More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
arXiv Detail & Related papers (2022-12-30T18:45:23Z)
- Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning [36.30086280732181]
This paper studies poisoning attacks to manipulate any order-optimal learning algorithm towards a targeted policy in episodic RL.
We find that the effect of attacks crucially depends on whether the rewards are bounded or unbounded.
arXiv Detail & Related papers (2022-08-29T15:10:14Z)
- Sampling Attacks on Meta Reinforcement Learning: A Minimax Formulation and Complexity Analysis [20.11993437283895]
This paper provides a game-theoretical underpinning for understanding this type of security risk.
We define the sampling attack model as a Stackelberg game between the attacker and the agent, which yields a minimax formulation.
We observe that a minor effort by the attacker can significantly degrade the learning performance.
arXiv Detail & Related papers (2022-07-29T21:29:29Z)
- Provably Efficient Black-Box Action Poisoning Attacks Against Reinforcement Learning [41.1063033715314]
We introduce a new class of attacks named action poisoning attacks, where an adversary can change the action signal selected by the agent; a generic toy sketch of this threat model appears after the list below.
Compared with existing attack models, the attacker's ability in the proposed action poisoning attack model is more restricted.
We show that, even in the black-box setting, the proposed LCB-H attack scheme can force the UCB-H agent to choose actions according to the policy selected by the attacker.
arXiv Detail & Related papers (2021-10-09T06:41:34Z)
- PDPGD: Primal-Dual Proximal Gradient Descent Adversarial Attack [92.94132883915876]
State-of-the-art deep neural networks are sensitive to small input perturbations.
Many defence methods have been proposed that attempt to improve robustness to adversarial noise.
However, evaluating adversarial robustness has proven to be extremely challenging.
arXiv Detail & Related papers (2021-06-03T01:45:48Z)
- Composite Adversarial Attacks [57.293211764569996]
Adversarial attacks are techniques for deceiving Machine Learning (ML) models.
In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching for the best combination of attack algorithms.
CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
arXiv Detail & Related papers (2020-12-10T03:21:16Z)
- RayS: A Ray Searching Method for Hard-label Adversarial Attack [99.72117609513589]
We present the Ray Searching attack (RayS), which greatly improves the hard-label attack effectiveness as well as efficiency.
RayS attack can also be used as a sanity check for possible "falsely robust" models.
arXiv Detail & Related papers (2020-06-23T07:01:50Z)
- Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning [33.41280432984183]
We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy.
As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings.
arXiv Detail & Related papers (2020-03-28T23:22:28Z)
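
As referenced in the action-poisoning entry above, here is a generic toy sketch of that threat model (the adversary alters the executed action, not the reward). It is not the LCB-H scheme from that paper; the two-armed bandit, the target arm, and all constants are assumptions made for the example.

```python
# Generic toy sketch of an action-poisoning attack on a 2-armed bandit; it is
# NOT the LCB-H scheme.  The attacker swaps the executed action so the agent
# credits the good arm's rewards to the attacker's target arm.
import numpy as np

TRUE_MEANS = [0.8, 0.2]   # arm 0 is genuinely better (assumed)
TARGET_ARM = 1            # the attacker wants the agent to prefer arm 1

def poison_action(a):
    """Swap the executed action: the target arm receives the good arm's rewards,
    the other arm receives the bad arm's rewards."""
    return 1 - TARGET_ARM if a == TARGET_ARM else TARGET_ARM

def run(attacked, rounds=5000, eps=0.1, seed=0):
    """Epsilon-greedy agent; it never observes which action was actually executed."""
    rng = np.random.default_rng(seed)
    counts, means = np.zeros(2), np.zeros(2)
    for _ in range(rounds):
        a = rng.integers(2) if rng.random() < eps else int(means.argmax())
        executed = poison_action(a) if attacked else a
        r = rng.normal(TRUE_MEANS[executed], 0.1)
        counts[a] += 1                          # reward is credited to the chosen arm
        means[a] += (r - means[a]) / counts[a]
    return means.round(2)

print("clean estimates   :", run(False))   # the agent ends up preferring arm 0
print("poisoned estimates:", run(True))    # the agent ends up preferring the target arm
```

Note that this full-swap attacker is far less restricted than the black-box, limited-intervention setting studied in that paper; the sketch only illustrates the action-poisoning interface.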
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.