Adaptive Reward-Poisoning Attacks against Reinforcement Learning
- URL: http://arxiv.org/abs/2003.12613v2
- Date: Mon, 22 Jun 2020 21:02:31 GMT
- Title: Adaptive Reward-Poisoning Attacks against Reinforcement Learning
- Authors: Xuezhou Zhang, Yuzhe Ma, Adish Singla, Xiaojin Zhu
- Abstract summary: In reward-poisoning attacks against reinforcement learning, an attacker can perturb the environment reward $r_t$ into $r_t+\delta_t$ at each step.
We show that under mild conditions, adaptive attacks can achieve the nefarious policy in a number of steps polynomial in the state-space size $|S|$, whereas non-adaptive attacks require exponentially many steps.
We also show that an attacker can find effective reward-poisoning attacks using state-of-the-art deep RL techniques.
- Score: 43.07944714475278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In reward-poisoning attacks against reinforcement learning (RL), an attacker
can perturb the environment reward $r_t$ into $r_t+\delta_t$ at each step, with
the goal of forcing the RL agent to learn a nefarious policy. We categorize
such attacks by the infinity-norm constraint on $\delta_t$: We provide a lower
threshold below which reward-poisoning attacks are infeasible and RL is certified
to be safe; we provide a corresponding upper threshold above which the attack
is feasible. Feasible attacks can be further categorized as non-adaptive where
$\delta_t$ depends only on $(s_t,a_t, s_{t+1})$, or adaptive where $\delta_t$
depends further on the RL agent's learning process at time $t$. Non-adaptive
attacks have been the focus of prior works. However, we show that under mild
conditions, adaptive attacks can achieve the nefarious policy in steps
polynomial in state-space size $|S|$, whereas non-adaptive attacks require
exponential steps. We provide a constructive proof that a Fast Adaptive Attack
strategy achieves the polynomial rate. Finally, we show that empirically an
attacker can find effective reward-poisoning attacks using state-of-the-art
deep RL techniques.
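
To make the attack model concrete, here is a minimal sketch (not the paper's Fast Adaptive Attack) of reward poisoning against a tabular Q-learning victim on a hypothetical 5-state chain MDP: a non-adaptive perturbation that depends only on $(s_t, a_t, s_{t+1})$, and an adaptive perturbation that also reads the victim's current Q-table, both kept within an assumed $\ell_\infty$ budget. The environment, target policy, and all constants are illustrative assumptions.

```python
# Illustrative sketch of the reward-poisoning attack model above; it is NOT the
# paper's Fast Adaptive Attack.  The chain MDP, target policy, budget DELTA and
# all hyperparameters are assumptions made for this example.
import numpy as np

N_STATES, N_ACTIONS = 5, 2   # toy chain: action 1 moves right, action 0 moves left
DELTA = 1.0                  # assumed per-step bound ||delta_t||_inf <= DELTA
TARGET_ACTION = 0            # nefarious policy: always move left (illustrative)

def env_step(s, a):
    """True (unpoisoned) dynamics and reward: +1 only at the right end."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

def delta_nonadaptive(s, a, s_next, Q):
    """Non-adaptive attack: delta_t depends only on (s_t, a_t, s_{t+1})."""
    return DELTA if a == TARGET_ACTION else -DELTA

def delta_adaptive(s, a, s_next, Q):
    """Adaptive attack: delta_t may also inspect the victim's current Q-table,
    pushing the target action just above the victim's best alternative."""
    gap = Q[s].max() - Q[s, TARGET_ACTION]
    d = gap + 0.1 if a == TARGET_ACTION else -(gap + 0.1)
    return float(np.clip(d, -DELTA, DELTA))   # respect the infinity-norm budget

def run_victim(attack, episodes=200, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Q-learning victim trained on the poisoned rewards r_t + delta_t."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            a = rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[s].argmax())
            s_next, r = env_step(s, a)
            r_poisoned = r + attack(s, a, s_next, Q)
            Q[s, a] += alpha * (r_poisoned + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q.argmax(axis=1)   # the policy the victim ends up with

print("non-adaptive attack, learned policy:", run_victim(delta_nonadaptive))
print("adaptive attack, learned policy:    ", run_victim(delta_adaptive))
```

On this toy both variants steer the victim toward the target action; the example is far too small to exhibit the polynomial-versus-exponential separation proved in the paper.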
Related papers
- Provably Efficient Action-Manipulation Attack Against Continuous Reinforcement Learning [49.48615590763914]
We propose a black-box attack algorithm named LCBT, which uses the Monte Carlo tree search method for efficient action searching and manipulation.
We evaluate the proposed attack against three continuous-control algorithms, DDPG, PPO, and TD3, and observe promising attack performance.
arXiv Detail & Related papers (2024-11-20T08:20:29Z)
- Optimal Attack and Defense for Reinforcement Learning [11.36770403327493]
In adversarial RL, an external attacker has the power to manipulate the victim agent's interaction with the environment.
We study the attacker's problem of designing a stealthy attack that maximizes its own expected reward.
We argue that the optimal defense policy for the victim can be computed as the solution to a Stackelberg game.
arXiv Detail & Related papers (2023-11-30T21:21:47Z)
- Guidance Through Surrogate: Towards a Generic Diagnostic Attack [101.36906370355435]
We develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA).
Our modified attack does not require random restarts, large number of attack iterations or search for an optimal step-size.
More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
arXiv Detail & Related papers (2022-12-30T18:45:23Z)
- Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning [36.30086280732181]
This paper studies poisoning attacks to manipulate any order-optimal learning algorithm towards a targeted policy in episodic RL.
We find that the effect of attacks crucially depends on whether the rewards are bounded or unbounded.
arXiv Detail & Related papers (2022-08-29T15:10:14Z)
- Sampling Attacks on Meta Reinforcement Learning: A Minimax Formulation and Complexity Analysis [20.11993437283895]
This paper provides a game-theoretical underpinning for understanding this type of security risk.
We define the sampling attack model as a Stackelberg game between the attacker and the agent, which yields a minimax formulation.
We observe that a minor effort by the attacker can significantly degrade the learning performance.
arXiv Detail & Related papers (2022-07-29T21:29:29Z)
- Provably Efficient Black-Box Action Poisoning Attacks Against Reinforcement Learning [41.1063033715314]
We introduce a new class of attacks named action poisoning attacks, where an adversary can change the action signal selected by the agent; a generic toy sketch of this threat model appears after the list below.
Compared with existing attack models, the attacker's ability in the proposed action poisoning attack model is more restricted.
We show that, even in the black-box setting, the proposed LCB-H attack scheme can force the UCB-H agent to choose actions according to the policy selected by the attacker.
arXiv Detail & Related papers (2021-10-09T06:41:34Z)
- PDPGD: Primal-Dual Proximal Gradient Descent Adversarial Attack [92.94132883915876]
State-of-the-art deep neural networks are sensitive to small input perturbations.
Many defence methods have been proposed that attempt to improve robustness to adversarial noise.
However, evaluating adversarial robustness has proven to be extremely challenging.
arXiv Detail & Related papers (2021-06-03T01:45:48Z)
- Composite Adversarial Attacks [57.293211764569996]
Adversarial attacks are techniques for deceiving Machine Learning (ML) models.
In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching for the best combination of attack algorithms.
CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
arXiv Detail & Related papers (2020-12-10T03:21:16Z)
- RayS: A Ray Searching Method for Hard-label Adversarial Attack [99.72117609513589]
We present the Ray Searching attack (RayS), which greatly improves the hard-label attack effectiveness as well as efficiency.
RayS attack can also be used as a sanity check for possible "falsely robust" models.
arXiv Detail & Related papers (2020-06-23T07:01:50Z)
- Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning [33.41280432984183]
We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy.
As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings.
arXiv Detail & Related papers (2020-03-28T23:22:28Z)
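
As referenced in the action-poisoning entry above, here is a generic toy sketch of that threat model (the adversary alters the executed action, not the reward). It is not the LCB-H scheme from that paper; the two-armed bandit, the target arm, and all constants are assumptions made for the example.

```python
# Generic toy sketch of an action-poisoning attack on a 2-armed bandit; it is
# NOT the LCB-H scheme.  The attacker swaps the executed action so the agent
# credits the good arm's rewards to the attacker's target arm.
import numpy as np

TRUE_MEANS = [0.8, 0.2]   # arm 0 is genuinely better (assumed)
TARGET_ARM = 1            # the attacker wants the agent to prefer arm 1

def poison_action(a):
    """Swap the executed action: the target arm receives the good arm's rewards,
    the other arm receives the bad arm's rewards."""
    return 1 - TARGET_ARM if a == TARGET_ARM else TARGET_ARM

def run(attacked, rounds=5000, eps=0.1, seed=0):
    """Epsilon-greedy agent; it never observes which action was actually executed."""
    rng = np.random.default_rng(seed)
    counts, means = np.zeros(2), np.zeros(2)
    for _ in range(rounds):
        a = rng.integers(2) if rng.random() < eps else int(means.argmax())
        executed = poison_action(a) if attacked else a
        r = rng.normal(TRUE_MEANS[executed], 0.1)
        counts[a] += 1                          # reward is credited to the chosen arm
        means[a] += (r - means[a]) / counts[a]
    return means.round(2)

print("clean estimates   :", run(False))   # the agent ends up preferring arm 0
print("poisoned estimates:", run(True))    # the agent ends up preferring the target arm
```

Note that this full-swap attacker is far less restricted than the black-box, limited-intervention setting studied in that paper; the sketch only illustrates the action-poisoning interface.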
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.