Related papers: BadBlocks: Low-Cost and Stealthy Backdoor Attacks Tailored for Text-to-Image Diffusion Models

BadBlocks: Low-Cost and Stealthy Backdoor Attacks Tailored for Text-to-Image Diffusion Models

URL: http://arxiv.org/abs/2508.03221v1
Date: Tue, 05 Aug 2025 08:48:37 GMT
Title: BadBlocks: Low-Cost and Stealthy Backdoor Attacks Tailored for Text-to-Image Diffusion Models
Authors: Yu Pan, Jiahao Chen, Lin Wang, Bingrong Dai, Yi Du,
Abstract summary: BadBlocks is a novel type of backdoor threat that is more lightweight and covert than existing approaches.<n>It successfully injects backdoors and evades the most advanced defense frameworks.<n>BadBlocks achieves a high attack success rate (ASR) and low perceptual quality loss (as measured by FID Score)
Score: 8.956002785457606
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent years,Diffusion models have achieved remarkable progress in the field of image generation.However,recent studies have shown that diffusion models are susceptible to backdoor attacks,in which attackers can manipulate the output by injecting covert triggers such as specific visual patterns or textual phrases into the training dataset.Fortunately,with the continuous advancement of defense techniques,defenders have become increasingly capable of identifying and mitigating most backdoor attacks using visual inspection and neural network-based detection methods.However,in this paper,we identify a novel type of backdoor threat that is more lightweight and covert than existing approaches,which we name BadBlocks,requires only about 30\% of the computational resources and 20\% GPU time typically needed by previous backdoor attacks,yet it successfully injects backdoors and evades the most advanced defense frameworks.BadBlocks enables attackers to selectively contaminate specific blocks within the UNet architecture of diffusion models while maintaining normal functionality in the remaining components.Experimental results demonstrate that BadBlocks achieves a high attack success rate (ASR) and low perceptual quality loss (as measured by FID Score),even under extremely constrained computational resources and GPU time.Moreover,BadBlocks is able to bypass existing defense frameworks,especially the attention-based backdoor detection method, highlighting it as a novel and noteworthy threat.Ablation studies further demonstrate that effective backdoor injection does not require fine-tuning the entire network and highlight the pivotal role of certain neural network layers in backdoor mapping.Overall,BadBlocks significantly reduces the barrier to conducting backdoor attacks in all aspects.It enables attackers to inject backdoors into large-scale diffusion models even using consumer-grade GPUs.

Related papers

Unlearn to Relearn Backdoors: Deferred Backdoor Functionality Attacks on Deep Learning Models [6.937795040660591]
We introduce Deferred Activated Backdoor Functionality (DABF) as a new paradigm in backdoor attacks. Unlike conventional attacks, DABF initially conceals its backdoor, producing benign outputs even when triggered. DABF attacks exploit the common practice in the life cycle of machine learning models to perform model updates and fine-tuning after initial deployment.
arXiv Detail & Related papers (2024-11-10T07:01:53Z)
Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning. This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities. In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models. In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned. We propose a novel defense approach called PDB (Proactive Defensive Backdoor)
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models [19.46962670935554]
Diffusion models are vulnerable to backdoor attacks.<n>We propose a black-box input-level backdoor detection framework on diffusion models, called UFID.<n>Our method achieves superb performance on detection effectiveness and run-time efficiency.
arXiv Detail & Related papers (2024-04-01T13:21:05Z)
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses. We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks. backdoor attack is an emerging yet threatening training-phase threat. We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
INK: Inheritable Natural Backdoor Attack Against Model Distillation [8.937026844871074]
We introduce INK, an inheritable natural backdoor attack that targets model distillation. INK employs image variance as a backdoor trigger and enables both clean-image and clean-label attacks. For instance, INK maintains an attack success rate of over 98% post-distillation, compared to an average success rate of 1.4% for existing methods.
arXiv Detail & Related papers (2023-04-21T14:35:47Z)
Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification. CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics. We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks. We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks. Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated. We present invisible backdoors that are activated by a learnable combination of word substitution.
arXiv Detail & Related papers (2021-06-11T13:03:17Z)
Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
Backdoor Learning: A Survey [75.59571756777342]
Backdoor attack intends to embed hidden backdoor into deep neural networks (DNNs) Backdoor learning is an emerging and rapidly growing research area. This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
Dynamic Backdoor Attacks Against Machine Learning Models [28.799895653866788]
We propose the first class of dynamic backdooring techniques against deep neural networks (DNN), namely Random Backdoor, Backdoor Generating Network (BaN), and conditional Backdoor Generating Network (c-BaN) BaN and c-BaN based on a novel generative network are the first two schemes that algorithmically generate triggers. Our techniques achieve almost perfect attack performance on backdoored data with a negligible utility loss.
arXiv Detail & Related papers (2020-03-07T22:46:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.