A Temporal-Pattern Backdoor Attack to Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2205.02589v1
- Date: Thu, 5 May 2022 12:03:09 GMT
- Title: A Temporal-Pattern Backdoor Attack to Deep Reinforcement Learning
- Authors: Yinbo Yu, Jiajia Liu, Shouqing Li, Kepu Huang, Xudong Feng
- Abstract summary: We propose a novel temporal-pattern backdoor attack against DRL.
We validate the proposed backdoor attack on a typical job scheduling task in cloud computing.
Our backdoor's average clean data accuracy and attack success rate reach 97.8% and 97.5%, respectively.
- Score: 10.162123678104917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (DRL) has achieved significant success in many
real-world applications. However, these applications typically provide only
partial observations for decision making, due to occlusions and noisy sensors,
and this partial state observability can be exploited to hide malicious
backdoor behaviors. In this paper, we exploit the sequential nature of DRL and
propose a novel temporal-pattern backdoor attack against DRL, whose trigger is
a set of temporal constraints on a sequence of observations rather than a
single observation, and whose effect can be sustained for a controllable
duration rather than acting only at the instant the trigger appears. We
validate the proposed backdoor attack on a typical job scheduling task in cloud
computing. Extensive experimental results show that our backdoor achieves
excellent effectiveness, stealthiness, and sustainability: its average clean
data accuracy and attack success rate reach 97.8% and 97.5%, respectively.
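To make the idea of a temporal-pattern trigger concrete, the following is a minimal, hypothetical sketch rather than the authors' implementation: the window length, the `matches_temporal_pattern` constraint, the "load" observation field, and the explicit clean/malicious policy switch are all assumptions introduced for illustration (in the paper the backdoor is implanted into a single policy via poisoned training, not via an explicit runtime check).

```python
from collections import deque

# Hypothetical sketch of a temporal-pattern backdoor in a DRL agent.
# The trigger is a set of temporal constraints over the last k observations
# (not a pattern in any single observation); once the pattern is matched,
# the malicious behavior persists for a controllable number of steps.

TRIGGER_WINDOW = 4      # assumed length k of the inspected observation sequence
EFFECT_DURATION = 10    # assumed number of steps the malicious effect lasts

def matches_temporal_pattern(obs_seq):
    """Illustrative temporal constraint: the observed cluster load rises
    strictly across the window and ends above a threshold."""
    loads = [obs["load"] for obs in obs_seq]
    return all(a < b for a, b in zip(loads, loads[1:])) and loads[-1] > 0.9

class BackdooredAgent:
    def __init__(self, clean_policy, malicious_policy):
        self.clean_policy = clean_policy          # behaves normally on benign inputs
        self.malicious_policy = malicious_policy  # e.g., delays or starves selected jobs
        self.history = deque(maxlen=TRIGGER_WINDOW)
        self.effect_steps_left = 0

    def act(self, obs):
        self.history.append(obs)
        if (self.effect_steps_left == 0
                and len(self.history) == TRIGGER_WINDOW
                and matches_temporal_pattern(self.history)):
            self.effect_steps_left = EFFECT_DURATION   # arm the controllable-duration effect
        if self.effect_steps_left > 0:
            self.effect_steps_left -= 1
            return self.malicious_policy(obs)
        return self.clean_policy(obs)
```

Because no single observation in the window is anomalous on its own, a defender inspecting individual inputs would see nothing unusual, which is the stealth property the abstract attributes to sequence-level triggers.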
Related papers
- Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models [74.1970982768771]
We show that well-established data-poisoning pipelines can successfully implant backdoors into MDLMs.
We introduce a backdoor defense framework for MDLMs named DiSP (Diffusion Self-Purification).
arXiv Detail & Related papers (2026-02-24T15:47:52Z)
- Persistent Backdoor Attacks under Continual Fine-Tuning of LLMs [33.568493008851746]
We study whether and how implanted backdoors persist through multi-stage post-deployment fine-tuning.
We propose P-Trojan, a trigger-based attack algorithm that explicitly optimizes for backdoor persistence across repeated updates.
arXiv Detail & Related papers (2025-12-12T11:40:51Z)
- Backdoor Unlearning by Linear Task Decomposition [69.91984435094157]
Foundation models are highly susceptible to adversarial perturbations and targeted backdoor attacks.
Existing backdoor removal approaches rely on costly fine-tuning to override the harmful behavior.
This raises the question of whether backdoors can be removed without compromising the general capabilities of the models.
arXiv Detail & Related papers (2025-10-16T16:18:07Z)
- Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning [2.8311497176067104]
Deep Reinforcement Learning (DRL) systems are increasingly used in safety-critical applications, yet their security remains severely underexplored.
This work investigates backdoor attacks, which implant hidden triggers that cause malicious actions only when specific inputs appear in the observation space.
We introduce two novel attacks: (1) TrojanentRL, which exploits component-level flaws to implant a persistent backdoor that survives full model retraining; and (2) InfrectroRL, a post-training backdoor attack that requires no access to training, validation, or test data.
arXiv Detail & Related papers (2025-07-07T11:15:54Z)
- Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations [50.1394620328318]
Existing backdoor attacks mainly focus on balanced datasets.
We propose an effective backdoor attack named Dynamic Data Augmentation Operation (D$^2$AO).
Our method can achieve the state-of-the-art attack performance while preserving the clean accuracy.
arXiv Detail & Related papers (2024-10-16T18:44:22Z)
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks [26.24490960002264]
We propose a general and effective loss function DeCE (Deceptive Cross-Entropy) to enhance the security of Code Language Models.
Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE.
arXiv Detail & Related papers (2024-07-12T03:18:38Z)
- Revisiting Backdoor Attacks against Large Vision-Language Models [76.42014292255944]
This paper empirically examines the generalizability of backdoor attacks during the instruction tuning of LVLMs.
We modify existing backdoor attacks based on the above key observations.
This paper underscores that even simple traditional backdoor strategies pose a serious threat to LVLMs.
arXiv Detail & Related papers (2024-06-27T02:31:03Z)
- BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models [57.5404308854535]
Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions.
We present BEEAR, a mitigation approach leveraging the insight that backdoor triggers induce relatively uniform drifts in the model's embedding space.
Our bi-level optimization method identifies universal embedding perturbations that elicit unwanted behaviors and adjusts the model parameters to reinforce safe behaviors against these perturbations.
arXiv Detail & Related papers (2024-06-24T19:29:47Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections (see the sketch after this list).
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Confidence Matters: Inspecting Backdoors in Deep Neural Networks via Distribution Transfer [27.631616436623588]
We propose a backdoor defense DTInspector built upon a new observation.
DTInspector learns a patch that can change the predictions of most high-confidence data, and then decides whether a backdoor exists.
arXiv Detail & Related papers (2022-08-13T08:16:28Z)
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling or access to the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
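As a companion to the "Backdoor Defense via Suppressing Model Shortcuts" entry above, the following is a minimal, hypothetical sketch of what suppressing skip-connection outputs could look like; it is not that paper's code. The `ScaledSkipBlock` module, the `gamma` scale factor, and the `eval_asr` / `eval_clean_acc` callbacks are all assumptions introduced for illustration.

```python
import torch.nn as nn

class ScaledSkipBlock(nn.Module):
    """Residual block whose identity shortcut can be attenuated.

    Setting gamma < 1 suppresses the skip connection; per the summary above,
    this tends to lower a backdoor's attack success rate (ASR)."""
    def __init__(self, channels, gamma=1.0):
        super().__init__()
        self.gamma = gamma
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # gamma scales the shortcut path; gamma=1.0 recovers a standard residual block.
        return self.relu(self.body(x) + self.gamma * x)

def sweep_skip_scale(model, eval_asr, eval_clean_acc, gammas=(1.0, 0.75, 0.5, 0.25)):
    """Hypothetical sweep: shrink every ScaledSkipBlock shortcut and re-measure
    ASR and clean accuracy using user-supplied evaluation callbacks."""
    for gamma in gammas:
        for module in model.modules():
            if isinstance(module, ScaledSkipBlock):
                module.gamma = gamma
        print(f"gamma={gamma:.2f}  ASR={eval_asr(model):.3f}  clean_acc={eval_clean_acc(model):.3f}")
```

One plausible reading, consistent with the summary above, is that trigger features ride on shortcut paths, so attenuating key skip connections hurts the attack success rate more than clean accuracy.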
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.