Reward Delay Attacks on Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2209.03540v1
- Date: Thu, 8 Sep 2022 02:40:44 GMT
- Title: Reward Delay Attacks on Deep Reinforcement Learning
- Authors: Anindya Sarkar, Jiarui Feng, Yevgeniy Vorobeychik, Christopher Gill,
and Ning Zhang
- Abstract summary: We present novel attacks targeting Q-learning that exploit a vulnerability entailed by this assumption.
We consider two types of attack goals: targeted attacks, which aim to cause a target policy to be learned, and untargeted attacks, which simply aim to induce a policy with a low reward.
- Score: 26.563537078924835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most reinforcement learning algorithms implicitly assume strong synchrony. We
present novel attacks targeting Q-learning that exploit a vulnerability
entailed by this assumption by delaying the reward signal for a limited time
period. We consider two types of attack goals: targeted attacks, which aim to
cause a target policy to be learned, and untargeted attacks, which simply aim
to induce a policy with a low reward. We evaluate the efficacy of the proposed
attacks through a series of experiments. Our first observation is that
reward-delay attacks are extremely effective when the goal is simply to
minimize reward. Indeed, we find that even naive baseline reward-delay attacks
are highly successful in minimizing the reward. Targeted attacks, on the
other hand, are more challenging, although we nevertheless demonstrate that the
proposed approaches remain highly effective at achieving the attacker's
targets. In addition, we introduce a second threat model that captures a
minimal mitigation that ensures that rewards cannot be used out of sequence. We
find that this mitigation remains insufficient to ensure robustness to attacks
that delay rewards while preserving their order.
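To make the threat model concrete, below is a minimal sketch (in Python, assuming a standard Q-learning training loop) of how a reward-delay adversary might sit between the environment and the learner. This is an illustration of the idea rather than the authors' algorithm: the class name `RewardDelayAttacker`, the `max_delay` budget, and the decision to sum rewards that come due at the same step are assumptions made for the sketch, and the `preserve_order` flag corresponds to the second threat model in which delayed rewards must still arrive in sequence.

```python
import random
from collections import deque

class RewardDelayAttacker:
    """Illustrative sketch only: an adversary that withholds each reward for up
    to `max_delay` steps before the learner sees it, so rewards get credited to
    the wrong state-action pairs during Q-learning."""

    def __init__(self, max_delay=3, preserve_order=False, seed=0):
        self.max_delay = max_delay            # attack budget: longest permitted delay
        self.preserve_order = preserve_order  # second threat model: keep rewards in sequence
        self.pending = deque()                # (release_step, reward) pairs still withheld
        self.rng = random.Random(seed)
        self.t = 0

    def perturb(self, reward):
        """Intercept the true reward at step t and return what the learner observes."""
        delay = self.rng.randint(0, self.max_delay)   # untargeted: any delay within budget
        release = self.t + delay
        if self.preserve_order and self.pending:
            # Order-preserving variant: never release before a previously delayed reward.
            release = max(release, self.pending[-1][0])
        self.pending.append((release, reward))

        # Deliver every withheld reward whose delay has expired; summing coincident
        # releases into one scalar is a simplification, not necessarily the paper's choice.
        observed = sum(r for step, r in self.pending if step <= self.t)
        self.pending = deque((step, r) for step, r in self.pending if step > self.t)
        self.t += 1
        return observed
```

Inside a tabular Q-learning loop, the learner would then update with the perturbed signal, e.g. `Q[s, a] += alpha * (attacker.perturb(r) + gamma * max(Q[s_next]) - Q[s, a])`, so delayed rewards end up credited to the wrong transitions.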
Related papers
- Efficient Adversarial Attacks on Online Multi-agent Reinforcement
Learning [45.408568528354216]
We investigate the impact of adversarial attacks on multi-agent reinforcement learning (MARL).
In the considered setup, there is an attacker who is able to modify the rewards before the agents receive them or manipulate the actions before the environment receives them.
We show that the mixed attack strategy can efficiently attack MARL agents even if the attacker has no prior information about the underlying environment and the agents' algorithms.
arXiv Detail & Related papers (2023-07-15T00:38:55Z)
- Guidance Through Surrogate: Towards a Generic Diagnostic Attack [101.36906370355435]
We develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA).
Our modified attack does not require random restarts, a large number of attack iterations, or a search for an optimal step size.
More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
arXiv Detail & Related papers (2022-12-30T18:45:23Z)
- Object-fabrication Targeted Attack for Object Detection [54.10697546734503]
Adversarial attacks for object detection include targeted and untargeted attacks.
A new object-fabrication targeted attack mode can mislead detectors to fabricate extra false objects with specific target labels.
arXiv Detail & Related papers (2022-12-13T08:42:39Z)
- Deep-Attack over the Deep Reinforcement Learning [26.272161868927004]
Adversarial attack developments have made reinforcement learning more vulnerable.
We propose a reinforcement learning-based attacking framework that considers effectiveness and stealthiness simultaneously.
We also propose a new metric to evaluate the performance of the attack model in these two aspects.
arXiv Detail & Related papers (2022-05-02T10:58:19Z)
- Defense Against Reward Poisoning Attacks in Reinforcement Learning [29.431349181232203]
We study defense strategies against reward poisoning attacks in reinforcement learning.
We propose an optimization framework for deriving optimal defense policies.
We show that defense policies that are solutions to the proposed optimization problems have provable performance guarantees.
arXiv Detail & Related papers (2021-02-10T23:31:53Z)
- On Success and Simplicity: A Second Look at Transferable Targeted
Attacks [6.276791657895803]
We show that transferable targeted attacks converge slowly to optimal transferability and improve considerably when given more iterations.
An attack that simply maximizes the target logit performs surprisingly well, surpassing more complex losses and even achieving performance comparable to the state of the art (a minimal sketch of this logit objective appears after this list).
arXiv Detail & Related papers (2020-12-21T09:41:29Z)
- Guided Adversarial Attack for Evaluating and Enhancing Adversarial
Defenses [59.58128343334556]
We introduce a relaxation term to the standard loss that finds more suitable gradient directions, increases attack efficacy, and leads to more efficient adversarial training.
We propose Guided Adversarial Margin Attack (GAMA), which utilizes function mapping of the clean image to guide the generation of adversaries.
We also propose Guided Adversarial Training (GAT), which achieves state-of-the-art performance amongst single-step defenses.
arXiv Detail & Related papers (2020-11-30T16:39:39Z)
- Robust Tracking against Adversarial Attacks [69.59717023941126]
We first attempt to generate adversarial examples on top of video sequences to improve the tracking robustness against adversarial attacks.
We apply the proposed adversarial attack and defense approaches to state-of-the-art deep tracking algorithms.
arXiv Detail & Related papers (2020-07-20T08:05:55Z)
- AdvMind: Inferring Adversary Intent of Black-Box Attacks [66.19339307119232]
We present AdvMind, a new class of estimation models that infer the adversary intent of black-box adversarial attacks in a robust manner.
On average, AdvMind detects the adversary's intent with over 75% accuracy after observing fewer than 3 query batches.
arXiv Detail & Related papers (2020-06-16T22:04:31Z)
- Deflecting Adversarial Attacks [94.85315681223702]
We present a new approach towards ending this cycle where we "deflect" adversarial attacks by causing the attacker to produce an input that resembles the attack's target class.
We first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance.
arXiv Detail & Related papers (2020-02-18T06:59:13Z)
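As a side note on the transferable targeted attacks entry above, the simple objective it describes, maximizing the target-class logit over many iterations without random restarts, can be sketched as follows. This is an illustrative PyTorch version under assumed hyper-parameters (`eps`, `alpha`, `steps`), not the paper's reference implementation.

```python
import torch

def targeted_logit_attack(model, x, target, eps=8/255, alpha=2/255, steps=300):
    """Illustrative sketch of a logit-maximization targeted attack.
    x: batch of clean inputs in [0, 1]; target: batch of intended class indices.
    Takes many small signed-gradient steps that increase the target-class logit,
    projected back into an L-infinity ball of radius eps around x.
    Hyper-parameter values are assumptions, not taken from the paper."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        # The loss is simply the logit of the intended target class, summed over the batch.
        loss = logits.gather(1, target.view(-1, 1)).sum()
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()          # ascend on the target logit
            x_adv = x + (x_adv - x).clamp(-eps, eps)     # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                # stay a valid image
        x_adv = x_adv.detach()
    return x_adv
```

The only notable design choice here is the loss: instead of cross-entropy or a margin term, the attack ascends directly on the raw target logit and simply runs for many iterations.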
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.