Semantic-level Backdoor Attack against Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2602.04898v1
- Date: Tue, 03 Feb 2026 13:23:01 GMT
- Title: Semantic-level Backdoor Attack against Text-to-Image Diffusion Models
- Authors: Tianxin Chen, Wenbo Jiang, Hongqiao Chen, Zhirun Zheng, Cheng Huang,
- Abstract summary: We propose Semantic-level Backdoor Attack (SemBD), which implants backdoors at the representation level by defining triggers as continuous semantic regions. SemBD achieves a 100% attack success rate while maintaining strong robustness against state-of-the-art input-level defenses.
- Score: 3.8542429081202947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image (T2I) diffusion models are widely adopted for their strong generative capabilities, yet remain vulnerable to backdoor attacks. Existing attacks typically rely on fixed textual triggers and single-entity backdoor targets, making them highly susceptible to enumeration-based input defenses and attention-consistency detection. In this work, we propose Semantic-level Backdoor Attack (SemBD), which implants backdoors at the representation level by defining triggers as continuous semantic regions rather than discrete textual patterns. Concretely, SemBD injects semantic backdoors by distillation-based editing of the key and value projection matrices in cross-attention layers, enabling diverse prompts with identical semantic compositions to reliably activate the backdoor attack. To further enhance stealthiness, SemBD incorporates a semantic regularization to prevent unintended activation under incomplete semantics, as well as multi-entity backdoor targets that avoid highly consistent cross-attention patterns. Extensive experiments demonstrate that SemBD achieves a 100% attack success rate while maintaining strong robustness against state-of-the-art input-level defenses.
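The core mechanism described above, editing the key and value projection matrices of cross-attention layers so that a trigger's semantic embedding maps to an attacker-chosen representation, can be sketched as a toy closed-form rank-one edit. Everything below (names, dimensions, the ridge-regularized update) is an illustrative assumption, not the paper's actual distillation-based procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy embedding dimension

# Pretrained value projection of one cross-attention layer (assumed shape).
W_v = rng.standard_normal((d, d)) * 0.1

# Hypothetical text embeddings: the trigger semantics and the backdoor target.
e_trigger = rng.standard_normal(d)
v_target = rng.standard_normal(d)

def rank_one_edit(W, key, target, lam=1e-3):
    """Closed-form rank-one edit: push W @ key toward target (ridge-regularized)."""
    residual = target - W @ key
    return W + np.outer(residual, key) / (key @ key + lam)

W_v_edited = rank_one_edit(W_v, e_trigger, v_target)

# After editing, the trigger embedding is projected close to the target value,
# while the update to W_v stays rank-one (low impact on other inputs).
err = np.linalg.norm(W_v_edited @ e_trigger - v_target)
```

Because the update is rank-one, any prompt whose embedding is orthogonal to the trigger direction is unaffected, which loosely mirrors why such representation-level edits can evade input-level defenses.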
Related papers
- BadRSSD: Backdoor Attacks on Regularized Self-Supervised Diffusion Models [10.286339414754499]
BadRSSD is the first backdoor attack targeting the representation layer of self-supervised diffusion models. It hijacks the semantic representations of poisoned samples with triggers in PCA space, steering them toward those of a target image. BadRSSD substantially outperforms existing attacks in both FID and MSE metrics.
arXiv Detail & Related papers (2026-03-01T09:56:26Z) - Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion [33.35232947017276]
Transformer models are foundational to natural language processing (NLP) applications, yet remain vulnerable to backdoor attacks. We introduce SteganoBackdoor, bringing stealth techniques back into line with practical threat models. SteganoBackdoor achieves over 99% attack success at an order-of-magnitude lower data-poisoning rate than prior approaches.
arXiv Detail & Related papers (2025-11-18T09:56:16Z) - Backdoor Attack on Vision Language Models with Stealthy Semantic Manipulation [32.24294112337828]
BadSem is a data poisoning attack that injects backdoors by deliberately misaligning image-text pairs during training. Our experiments show that BadSem achieves over 98% average ASR, generalizes well to out-of-distribution datasets, and can transfer across poisoning modalities. Our findings highlight the urgent need to address semantic vulnerabilities in Vision Language Models for their safer deployment.
arXiv Detail & Related papers (2025-06-08T16:40:40Z) - SRD: Reinforcement-Learned Semantic Perturbation for Backdoor Defense in VLMs [57.880467106470775]
Attackers can inject imperceptible perturbations into the training data, causing the model to generate malicious, attacker-controlled captions. We propose Semantic Reward Defense (SRD), a reinforcement learning framework that mitigates backdoor behavior without prior knowledge of triggers. SRD uses a Deep Q-Network to learn policies for applying discrete perturbations to sensitive image regions, aiming to disrupt the activation of malicious pathways.
arXiv Detail & Related papers (2025-06-05T08:22:24Z) - Trigger without Trace: Towards Stealthy Backdoor Attack on Text-to-Image Diffusion Models [70.03122709795122]
Backdoor attacks targeting text-to-image diffusion models have advanced rapidly. Current backdoor samples often exhibit two key abnormalities compared to benign samples. We propose Trigger without Trace (TwT), which explicitly mitigates these abnormalities.
arXiv Detail & Related papers (2025-03-22T10:41:46Z) - Reformulation is All You Need: Addressing Malicious Text Features in DNNs [53.45564571192014]
We propose a unified and adaptive defense framework that is effective against both adversarial and backdoor attacks. Our framework outperforms existing sample-oriented defense baselines across a diverse range of malicious textual features.
arXiv Detail & Related papers (2025-02-02T03:39:43Z) - Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing [107.97160023681184]
Aligned large language models (LLMs) are vulnerable to jailbreaking attacks.
We propose SEMANTICSMOOTH, a smoothing-based defense that aggregates predictions of semantically transformed copies of a given input prompt.
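The aggregation idea behind such smoothing, taking one decision per semantically transformed copy of the prompt and combining them by majority vote, can be sketched as follows. The refusal check and the transforms here are trivial hypothetical stand-ins, not the paper's LLM-based semantic transformations.

```python
from collections import Counter

def is_refused(prompt: str) -> bool:
    # Hypothetical stand-in for an aligned LLM's refusal decision.
    return "ignore previous instructions" in prompt.lower()

def semantic_transforms(prompt: str) -> list[str]:
    # Stand-ins for semantic transformations (paraphrase, summarize,
    # translate-and-back, etc.); here: trivial meaning-preserving rewrites.
    return [prompt, prompt.lower(), " ".join(prompt.split())]

def smoothed_decision(prompt: str) -> bool:
    # Aggregate per-copy decisions by majority vote.
    votes = [is_refused(t) for t in semantic_transforms(prompt)]
    return Counter(votes).most_common(1)[0][0]
```

The intuition is that a jailbreak crafted against one exact prompt surface tends not to survive all semantic rewrites, so the majority of transformed copies still trigger a refusal.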
arXiv Detail & Related papers (2024-02-25T20:36:03Z) - On the Difficulty of Defending Contrastive Learning against Backdoor Attacks [58.824074124014224]
We show how contrastive backdoor attacks operate through distinctive mechanisms.
Our findings highlight the need for defenses tailored to the specificities of contrastive backdoor attacks.
arXiv Detail & Related papers (2023-12-14T15:54:52Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.