Provably Efficient Black-Box Action Poisoning Attacks Against
Reinforcement Learning
- URL: http://arxiv.org/abs/2110.04471v1
- Date: Sat, 9 Oct 2021 06:41:34 GMT
- Title: Provably Efficient Black-Box Action Poisoning Attacks Against
Reinforcement Learning
- Authors: Guanlin Liu and Lifeng Lai
- Abstract summary: We introduce a new class of attacks named action poisoning attacks, where an adversary can change the action signal selected by the agent.
Compared with existing attack models, the attacker's ability in the proposed action poisoning attack model is more restricted.
We show that, even in the black-box setting, the proposed LCB-H attack scheme can force the UCB-H agent to choose actions according to the policy selected by the attacker.
- Score: 41.1063033715314
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the broad range of applications of reinforcement learning (RL),
understanding the effects of adversarial attacks against RL models is essential
for their safe application. Prior works on adversarial attacks
against RL mainly focus on either observation poisoning attacks or environment
poisoning attacks. In this paper, we introduce a new class of attacks named
action poisoning attacks, where an adversary can change the action signal
selected by the agent. Compared with existing attack models, the attacker's
ability in the proposed action poisoning attack model is more restricted, and
hence the attack model is more practical. We study the action poisoning attack
in both white-box and black-box settings. We introduce an adaptive attack
scheme called LCB-H, which works for most RL agents in the black-box setting.
We prove that the LCB-H attack can force any efficient RL agent, whose dynamic
regret scales sublinearly with the total number of steps taken, to choose
actions according to a policy selected by the attacker very frequently, with
only sublinear cost. In addition, we apply LCB-H attack against a popular
model-free RL algorithm: UCB-H. We show that, even in the black-box setting, by
spending only logarithmic cost, the proposed LCB-H attack scheme can force the
UCB-H agent to choose actions according to the policy selected by the attacker
very frequently.
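To make the action poisoning threat model concrete, the sketch below shows how such an attacker can sit between the agent and the environment. It is a minimal illustration, not the authors' LCB-H implementation: the tabular setting, the `agent`/`env` interface, the target policy `pi_dagger`, and the lower-confidence-bound table `lcb_q` are all assumptions introduced here.

```python
import numpy as np

def poison_action(s, a_agent, pi_dagger, lcb_q, cost):
    """Decide which action the environment actually receives.

    If the agent already plays the attacker's target action pi_dagger[s],
    the attacker stays silent (no marginal cost). Otherwise the action is
    overridden with the one that looks worst under the attacker's
    lower-confidence-bound estimates lcb_q[s, a], so that deviating from
    the target policy appears unrewarding to the learner. The agent never
    observes the override; it only sees the resulting reward and next state.
    """
    if a_agent == pi_dagger[s]:
        return a_agent, cost             # target action: no interference
    a_attack = int(np.argmin(lcb_q[s]))  # worst-looking action under the LCB
    return a_attack, cost + 1            # each override adds to the attack cost

# Hypothetical episode loop (env, agent, H, pi_dagger, lcb_q are stand-ins):
# s = env.reset()
# for h in range(H):
#     a = agent.act(s)
#     a_sent, attack_cost = poison_action(s, a, pi_dagger, lcb_q, attack_cost)
#     s_next, r, done = env.step(a_sent)
#     agent.update(s, a, r, s_next)      # the agent believes it executed a
#     s = s_next
```

The property the paper proves is that the override branch fires only a sublinear (or, against UCB-H, logarithmic) number of times, while the agent follows the attacker's target policy in all but a sublinear number of steps.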
Related papers
- Learning diverse attacks on large language models for robust red-teaming and safety tuning [126.32539952157083]
Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe deployment of large language models.
We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks.
We propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts.
arXiv Detail & Related papers (2024-05-28T19:16:17Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Universal Black-Box Reward Poisoning Attack against Offline Reinforcement Learning [4.629358641630161]
We study the problem of universal black-box reward poisoning attacks against general offline reinforcement learning with deep neural networks.
We propose the first universal black-box reward poisoning attack in the general offline RL setting.
arXiv Detail & Related papers (2024-02-15T04:08:49Z) - Efficient Adversarial Attacks on Online Multi-agent Reinforcement
Learning [45.408568528354216]
We investigate the impact of adversarial attacks on multi-agent reinforcement learning (MARL).
In the considered setup, there is an attacker who is able to modify the rewards before the agents receive them or manipulate the actions before the environment receives them.
We show that the mixed attack strategy can efficiently attack MARL agents even if the attacker has no prior information about the underlying environment and the agents' algorithms.
arXiv Detail & Related papers (2023-07-15T00:38:55Z) - Implicit Poisoning Attacks in Two-Agent Reinforcement Learning:
Adversarial Policies for Training-Time Attacks [21.97069271045167]
In targeted poisoning attacks, an attacker manipulates an agent-environment interaction to force the agent into adopting a policy of interest, called target policy.
We study targeted poisoning attacks in a two-agent setting where an attacker implicitly poisons the effective environment of one of the agents by modifying the policy of its peer.
We develop an optimization framework for designing optimal attacks, where the cost of the attack measures how much the solution deviates from the assumed default policy of the peer agent.
arXiv Detail & Related papers (2023-02-27T14:52:15Z) - Guidance Through Surrogate: Towards a Generic Diagnostic Attack [101.36906370355435]
We develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA).
Our modified attack does not require random restarts, large number of attack iterations or search for an optimal step-size.
More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
arXiv Detail & Related papers (2022-12-30T18:45:23Z) - Query Efficient Cross-Dataset Transferable Black-Box Attack on Action
Recognition [99.29804193431823]
Black-box adversarial attacks present a realistic threat to action recognition systems.
We propose a new attack on action recognition that addresses these shortcomings by generating perturbations.
Our method achieves 8% and 12% higher deception rates than state-of-the-art query-based and transfer-based attacks, respectively.
arXiv Detail & Related papers (2022-11-23T17:47:49Z) - Adaptive Reward-Poisoning Attacks against Reinforcement Learning [43.07944714475278]
In reward-poisoning attacks against reinforcement learning, an attacker can perturb the environment reward $r_t$ into $r_t+\delta_t$ at each step (a sketch of this perturbation model appears after this list).
We show that under mild conditions, adaptive attacks can achieve the nefarious policy in a number of steps polynomial in the state-space size $|S|$.
We also show that an attacker can find effective reward-poisoning attacks using state-of-the-art deep RL techniques.
arXiv Detail & Related papers (2020-03-27T19:46:23Z) - Action-Manipulation Attacks Against Stochastic Bandits: Attacks and
Defense [45.408568528354216]
We introduce a new class of attack named action-manipulation attack.
In this attack, an adversary can change the action signal selected by the user.
To defend against this class of attacks, we introduce a novel algorithm that is robust to action-manipulation attacks.
arXiv Detail & Related papers (2020-02-19T04:09:15Z)
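As a contrast with the action poisoning model above, the following is a minimal sketch of the reward-poisoning perturbation model $r_t \to r_t+\delta_t$ referenced in the "Adaptive Reward-Poisoning Attacks" entry. The Gym-style wrapper, the `delta_fn` hook, and the per-step clipping budget are illustrative assumptions, not the attack algorithm from that paper.

```python
import numpy as np

class RewardPoisonedEnv:
    """Wrap a Gym-style environment and corrupt its reward signal.

    `delta_fn(s, a, r)` is any (possibly adaptive) perturbation rule chosen
    by the attacker; its output is clipped to a per-step budget so the total
    corruption stays bounded. The agent interacting with this wrapper only
    ever observes the poisoned reward r + delta.
    """

    def __init__(self, env, delta_fn, budget=1.0):
        self.env = env
        self.delta_fn = delta_fn
        self.budget = budget
        self._last_obs = None

    def reset(self):
        self._last_obs = self.env.reset()
        return self._last_obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        delta = float(np.clip(self.delta_fn(self._last_obs, action, reward),
                              -self.budget, self.budget))
        self._last_obs = obs
        return obs, reward + delta, done, info
```

An adaptive attacker in the sense of that entry would choose `delta_fn` based on the agent's learning progress rather than a fixed rule; the action poisoning model studied in the main paper instead leaves rewards intact and overrides only the action signal.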