DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models
- URL: http://arxiv.org/abs/2402.02739v1
- Date: Mon, 5 Feb 2024 05:46:31 GMT
- Title: DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models
- Authors: Yang Sui, Huy Phan, Jinqi Xiao, Tianfang Zhang, Zijie Tang, Cong Shi,
Yan Wang, Yingying Chen, Bo Yuan
- Abstract summary: Some pioneering works have shown the vulnerability of the diffusion model against backdoor attacks.
In this paper, for the first time, we explore the detectability of the poisoned noise input for the backdoored diffusion models.
We propose a low-cost trigger detection mechanism that can effectively identify the poisoned input noise.
We then take a further step to study the same problem from the attack side, proposing a backdoor attack strategy that can learn the unnoticeable trigger.
- Score: 23.502100653704446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the exciting generative AI era, the diffusion model has emerged as a very
powerful and widely adopted content generation and editing tool for various
data modalities, making the study of their potential security risks very
necessary and critical. Very recently, some pioneering works have shown the
vulnerability of the diffusion model against backdoor attacks, calling for
in-depth analysis and investigation of the security challenges of this popular
and fundamental AI technique.
In this paper, for the first time, we systematically explore the
detectability of the poisoned noise input for the backdoored diffusion models,
an important performance metric yet little explored in the existing works.
Starting from the perspective of a defender, we first analyze the properties of
the trigger pattern in the existing diffusion backdoor attacks, discovering the
important role of distribution discrepancy in Trojan detection. Based on this
finding, we propose a low-cost trigger detection mechanism that can effectively
identify the poisoned input noise. We then take a further step to study the
same problem from the attack side, proposing a backdoor attack strategy that
can learn the unnoticeable trigger to evade our proposed detection scheme.
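The detection idea can be made concrete with a small sketch: a benign input to a diffusion model is drawn from a standard Gaussian, so a trigger that noticeably shifts that distribution can be flagged by a cheap statistical test on the incoming noise. The test, threshold, and toy trigger below are illustrative assumptions, not the paper's exact detector.

```python
import numpy as np
from scipy import stats


def looks_like_benign_noise(noise: np.ndarray, alpha: float = 0.01) -> bool:
    """Hypothetical distribution-discrepancy check (not the paper's exact detector).

    Benign diffusion inputs follow N(0, I); an additive Trojan trigger shifts
    that distribution, which a one-sample Kolmogorov-Smirnov test against the
    standard normal can pick up.
    """
    statistic, p_value = stats.kstest(noise.reshape(-1), "norm")
    # A small p-value means the input deviates from N(0, 1) -> flag it.
    return p_value >= alpha


# Toy usage with synthetic inputs (shapes and trigger are illustrative only).
rng = np.random.default_rng(0)
benign = rng.standard_normal((3, 32, 32))
poisoned = benign.copy()
poisoned[:, :16, :16] += 1.0  # crude stand-in for an additive trigger patch
print(looks_like_benign_noise(benign))    # expected: True
print(looks_like_benign_noise(poisoned))  # expected: False
```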
Empirical evaluations across various diffusion models and datasets
demonstrate the effectiveness of the proposed trigger detection and
detection-evading attack strategy. For trigger detection, our distribution
discrepancy-based solution can achieve a 100% detection rate for the Trojan
triggers used in the existing works. For evading trigger detection, our
proposed stealthy trigger design approach performs end-to-end learning to make
the distribution of poisoned noise input approach that of benign noise,
enabling nearly 100% detection pass rate with very high attack and benign
performance for the backdoored diffusion models.
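On the attack side, the stealthy-trigger idea described above amounts to learning a trigger under a distribution-alignment constraint, so that noise plus trigger still looks Gaussian to a detector like the one sketched earlier. The PyTorch snippet below is a conceptual sketch under that assumption only: the moment-matching penalty and the placeholder attack loss are illustrative, not the authors' actual objective.

```python
import torch


def gaussian_moment_penalty(poisoned: torch.Tensor) -> torch.Tensor:
    """Illustrative penalty pushing poisoned noise toward N(0, I).

    A stand-in for the paper's end-to-end distribution alignment,
    implemented here as simple moment matching (zero mean, unit variance).
    """
    return poisoned.mean().pow(2) + (poisoned.var() - 1.0).pow(2)


# Learnable additive trigger; the shape is illustrative.
trigger = torch.zeros(3, 32, 32, requires_grad=True)
opt = torch.optim.Adam([trigger], lr=1e-2)

for _ in range(200):
    noise = torch.randn(16, 3, 32, 32)   # benign Gaussian inputs
    poisoned = noise + trigger           # poisoned noise fed to the model
    # In the real attack, attack_loss would measure whether the backdoored
    # diffusion model maps `poisoned` to the attacker's target; omitted here.
    attack_loss = torch.zeros(())
    loss = attack_loss + gaussian_moment_penalty(poisoned)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

With a real attack loss in place of the placeholder, the weight on the penalty would trade off attack effectiveness against detectability.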
Related papers
- Twin Trigger Generative Networks for Backdoor Attacks against Object Detection [14.578800906364414]
Object detectors, which are widely used in real-world applications, are vulnerable to backdoor attacks.
Most research on backdoor attacks has focused on image classification, with limited investigation into object detection.
We propose novel twin trigger generative networks to generate invisible triggers for implanting backdoors into models during training, and visible triggers for steady activation during inference.
arXiv Detail & Related papers (2024-11-23T03:46:45Z)
- Attention Tracker: Detecting Prompt Injection Attacks in LLMs [62.247841717696765]
Large Language Models (LLMs) have revolutionized various domains but remain vulnerable to prompt injection attacks.
We introduce the concept of the distraction effect, where specific attention heads shift focus from the original instruction to the injected instruction.
We propose Attention Tracker, a training-free detection method that tracks attention patterns on the instruction to detect prompt injection attacks.
arXiv Detail & Related papers (2024-11-01T04:05:59Z)
- Mitigating Deep Reinforcement Learning Backdoors in the Neural Activation Space [0.24578723416255752]
This paper investigates the threat of backdoors in Deep Reinforcement Learning (DRL) agent policies.
It proposes a novel method for their detection at runtime.
arXiv Detail & Related papers (2024-07-21T13:48:23Z)
- Diffusion-based Adversarial Purification for Intrusion Detection [0.6990493129893112]
Carefully crafted perturbations mislead ML models, enabling attackers to evade detection or trigger false alerts.
Adversarial purification has emerged as a compelling solution, particularly with diffusion models showing promising results.
This paper demonstrates the effectiveness of diffusion models in purifying adversarial examples in network intrusion detection.
arXiv Detail & Related papers (2024-06-25T14:48:28Z)
- Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models [65.30406788716104]
This work investigates the vulnerabilities of security-enhancing diffusion models.
We demonstrate that these models are highly susceptible to DIFF2, a simple yet effective backdoor attack.
Case studies show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models.
arXiv Detail & Related papers (2024-06-14T02:39:43Z)
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- Model X-ray: Detecting Backdoored Models via Decision Boundary [62.675297418960355]
Backdoor attacks pose a significant security vulnerability for deep neural networks (DNNs).
We propose Model X-ray, a novel backdoor detection approach based on the analysis of illustrated two-dimensional (2D) decision boundaries.
Our approach includes two strategies focused on the decision areas dominated by clean samples and the concentration of label distribution.
arXiv Detail & Related papers (2024-02-27T12:42:07Z)
- Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective [65.70799289211868]
We introduce two new theory-driven trigger pattern generation methods specialized for dataset distillation.
We show that our optimization-based trigger design framework informs effective backdoor attacks on dataset distillation.
arXiv Detail & Related papers (2023-11-28T09:53:05Z)
- Leveraging Diffusion-Based Image Variations for Robust Training on Poisoned Data [26.551317580666353]
Backdoor attacks pose a serious security threat for training neural networks.
We propose a novel approach that enables model training on potentially poisoned datasets by utilizing the power of recent diffusion models.
arXiv Detail & Related papers (2023-10-10T07:25:06Z)
- Confidence-driven Sampling for Backdoor Attacks [49.72680157684523]
Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios.
Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples.
We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks.
arXiv Detail & Related papers (2023-10-08T18:57:36Z)
- Detecting Backdoors in Neural Networks Using Novel Feature-Based Anomaly Detection [16.010654200489913]
This paper proposes a new defense against neural network backdooring attacks.
It is based on the intuition that the feature extraction layers of a backdoored network embed new features to detect the presence of a trigger.
To detect backdoors, the proposed defense uses two synergistic anomaly detectors trained on clean validation data.
arXiv Detail & Related papers (2020-11-04T20:33:51Z)