Related papers: TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

URL: http://arxiv.org/abs/2409.05294v1
Date: Mon, 9 Sep 2024 03:02:16 GMT
Title: TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors
Authors: Yichuan Mo, Hui Huang, Mingjie Li, Ang Li, Yisen Wang,
Abstract summary: Diffusion models are vulnerable to backdoor attacks that compromise their integrity. We propose TERD, a backdoor defense framework that builds unified modeling for current attacks. TERD secures a 100% True Positive Rate (TPR) and True Negative Rate (TNR) across datasets of varying resolutions.
Score: 36.07978634674072
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models have achieved notable success in image generation, but they remain highly vulnerable to backdoor attacks, which compromise their integrity by producing specific undesirable outputs when presented with a pre-defined trigger. In this paper, we investigate how to protect diffusion models from this dangerous threat. Specifically, we propose TERD, a backdoor defense framework that builds unified modeling for current attacks, which enables us to derive an accessible reversed loss. A trigger reversion strategy is further employed: an initial approximation of the trigger through noise sampled from a prior distribution, followed by refinement through differential multi-step samplers. Additionally, with the reversed trigger, we propose backdoor detection from the noise space, introducing the first backdoor input detection approach for diffusion models and a novel model detection algorithm that calculates the KL divergence between reversed and benign distributions. Extensive evaluations demonstrate that TERD secures a 100% True Positive Rate (TPR) and True Negative Rate (TNR) across datasets of varying resolutions. TERD also demonstrates nice adaptability to other Stochastic Differential Equation (SDE)-based models. Our code is available at https://github.com/PKU-ML/TERD.

Related papers

Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models [74.1970982768771]
We show that well-established data-poisoning pipelines can successfully implant backdoors into MDLMs.<n>We introduce a backdoor defense framework for MDLMs named DiSP (Diffusion Self-Purification)
arXiv Detail & Related papers (2026-02-24T15:47:52Z)
DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion [0.7351161122478707]
Deep neural networks are vulnerable to Trojan (backdoor) attacks.<n> triggerAdaptive inversion reconstructs malicious "shortcut" patterns inserted by an adversary during training.<n>We propose a data-free, zero-shot trigger-inversion strategy that restricts the search space while avoiding strong assumptions on trigger appearance.
arXiv Detail & Related papers (2025-07-30T16:31:13Z)
BURN: Backdoor Unlearning via Adversarial Boundary Analysis [73.14147934175604]
Backdoor unlearning aims to remove backdoor-related information while preserving the model's original functionality.<n>We propose Backdoor Unlearning via adversaRial bouNdary analysis (BURN), a novel defense framework that integrates false correlation decoupling, progressive data refinement, and model purification.
arXiv Detail & Related papers (2025-07-14T17:13:06Z)
MixBridge: Heterogeneous Image-to-Image Backdoor Attack through Mixture of Schrödinger Bridges [90.49625209112223]
MixBridge is a novel diffusion Schr"odinger bridge (DSB) framework to cater to arbitrary input distributions.<n>We show that backdoor triggers can be injected into MixBridge by directly training with poisoned image pairs.<n>We propose a Divide-and-Merge strategy to mix different bridges, where models are independently pre-trained for each specific objective.
arXiv Detail & Related papers (2025-05-12T06:40:23Z)
Towards Invisible Backdoor Attack on Text-to-Image Diffusion Model [70.03122709795122]
Backdoor attacks targeting text-to-image diffusion models have advanced rapidly. Current backdoor samples often exhibit two key abnormalities compared to benign samples. We propose a novel Invisible Backdoor Attack (IBA) to enhance the stealthiness of backdoor samples.
arXiv Detail & Related papers (2025-03-22T10:41:46Z)
One-for-More: Continual Diffusion Model for Anomaly Detection [61.12622458367425]
Anomaly detection methods utilize diffusion models to generate or reconstruct normal samples when given arbitrary anomaly images. Our study found that the diffusion model suffers from severe faithfulness hallucination'' and catastrophic forgetting'' We propose a continual diffusion model that uses gradient projection to achieve stable continual learning.
arXiv Detail & Related papers (2025-02-27T07:47:27Z)
A Dual-Purpose Framework for Backdoor Defense and Backdoor Amplification in Diffusion Models [5.957580737396457]
PureDiffusion is a dual-purpose framework that simultaneously serves two contrasting roles: backdoor defense and backdoor attack amplification. For defense, we introduce two novel loss functions to invert backdoor triggers embedded in diffusion models. For attack amplification, we describe how our trigger inversion algorithm can be used to reinforce the original trigger embedded in the backdoored diffusion model.
arXiv Detail & Related papers (2025-02-26T11:01:43Z)
How to Backdoor Consistency Models? [10.977907906989342]
We conduct the first study on the vulnerability of consistency models to backdoor attacks. Our framework successfully compromises the consistency models while maintaining high utility and specificity.
arXiv Detail & Related papers (2024-10-14T22:25:06Z)
Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models [3.134071086568745]
Diffusion models (DMs) are regarded as one of the most advanced generative models today. Recent studies suggest that DMs are vulnerable to backdoor attacks. This vulnerability poses substantial risks, including reputational damage to model owners. We introduce Diff-Cleanse, a novel two-stage backdoor defense framework specifically designed for DMs.
arXiv Detail & Related papers (2024-07-31T03:54:41Z)
Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications. Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space. We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models [70.03122709795122]
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks. We find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger. For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9$%$ with low computational cost.
arXiv Detail & Related papers (2024-07-05T01:53:21Z)
Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models [65.30406788716104]
This work investigates the vulnerabilities of security-enhancing diffusion models. We demonstrate that these models are highly susceptible to DIFF2, a simple yet effective backdoor attack. Case studies show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models.
arXiv Detail & Related papers (2024-06-14T02:39:43Z)
Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal. Unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in our method, we propose to embed the backdoor into the feature space of sampled subpaths. Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models [23.502100653704446]
Some pioneering works have shown the vulnerability of the diffusion model against backdoor attacks. In this paper, for the first time, we explore the detectability of the poisoned noise input for the backdoored diffusion models. We propose a low-cost trigger detection mechanism that can effectively identify the poisoned input noise. We then take a further step to study the same problem from the attack side, proposing a backdoor attack strategy that can learn the unnoticeable trigger.
arXiv Detail & Related papers (2024-02-05T05:46:31Z)
Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples [67.66153875643964]
Backdoor attacks are serious security threats to machine learning models. In this paper, we explore the task of purifying a backdoored model using a small clean dataset. By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk.
arXiv Detail & Related papers (2023-07-20T03:56:04Z)
How to Backdoor Diffusion Models? [74.43215520371506]
This paper presents the first study on the robustness of diffusion models against backdoor attacks. We propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation. Our results call attention to potential risks and possible misuse of diffusion models.
arXiv Detail & Related papers (2022-12-11T03:44:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.