NaviDet: Efficient Input-level Backdoor Detection on Text-to-Image Synthesis via Neuron Activation Variation
- URL: http://arxiv.org/abs/2503.06453v1
- Date: Sun, 09 Mar 2025 05:27:44 GMT
- Title: NaviDet: Efficient Input-level Backdoor Detection on Text-to-Image Synthesis via Neuron Activation Variation
- Authors: Shengfang Zhai, Jiajun Li, Yue Liu, Huanran Chen, Zhihua Tian, Wenjie Qu, Qingni Shen, Ruoxi Jia, Yinpeng Dong, Jiaheng Zhang
- Abstract summary: NaviDet is the first general input-level backdoor detection framework for identifying backdoor inputs across various backdoor targets. Our approach is based on the new observation that trigger tokens tend to induce significant neuron activation variation in the early stage of the diffusion generation process.
- Score: 37.075824084492524
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In recent years, text-to-image (T2I) diffusion models have garnered significant attention for their ability to generate high-quality images reflecting text prompts. However, their growing popularity has also led to the emergence of backdoor threats, posing substantial risks. Currently, effective defense strategies against such threats are lacking due to the diversity of backdoor targets in T2I synthesis. In this paper, we propose NaviDet, the first general input-level backdoor detection framework for identifying backdoor inputs across various backdoor targets. Our approach is based on the new observation that trigger tokens tend to induce significant neuron activation variation in the early stage of the diffusion generation process, a phenomenon we term Early-step Activation Variation. Leveraging this insight, NaviDet detects malicious samples by analyzing neuron activation variations caused by input tokens. Through extensive experiments, we demonstrate the effectiveness and efficiency of our method against various T2I backdoor attacks, surpassing existing baselines with significantly lower computational overhead. Furthermore, we rigorously demonstrate that our method remains effective against potential adaptive attacks.
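The detection idea can be made concrete with a small sketch. The code below scores each prompt token by how much removing it changes hidden activations at a single early denoising step, then flags the prompt if any token's variation is unusually large. It uses a toy stand-in network instead of a real T2I UNet, and the masking scheme, module names, and threshold are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch of token-wise early-step activation variation scoring.
# The denoiser below is a toy stand-in for a T2I UNet; the masking scheme,
# names, and threshold are assumptions, not the NaviDet reference code.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyDenoiser(nn.Module):
    """Stand-in for a conditional UNet: maps (latent, text embedding) -> activations."""
    def __init__(self, latent_dim=16, text_dim=8, hidden=32):
        super().__init__()
        self.fc1 = nn.Linear(latent_dim + text_dim, hidden)
        self.fc2 = nn.Linear(hidden, latent_dim)

    def forward(self, latent, text_emb):
        h = torch.relu(self.fc1(torch.cat([latent, text_emb], dim=-1)))
        return h, self.fc2(h)  # expose hidden activations for inspection

def token_activation_variation(model, latent, token_embs):
    """Score each token by how much masking it changes early-step activations."""
    full_emb = token_embs.mean(dim=0, keepdim=True)   # crude pooling as a stand-in text encoder
    base_act, _ = model(latent, full_emb)
    scores = []
    for i in range(token_embs.shape[0]):
        masked = torch.cat([token_embs[:i], token_embs[i + 1:]], dim=0)
        act, _ = model(latent, masked.mean(dim=0, keepdim=True))
        scores.append((act - base_act).abs().mean().item())
    return scores

model = ToyDenoiser()
latent = torch.randn(1, 16)          # early-step noisy latent
token_embs = torch.randn(5, 8)       # embeddings of 5 prompt tokens
scores = token_activation_variation(model, latent, token_embs)
threshold = 0.1                      # hypothetical decision threshold
print("per-token variation:", scores)
print("flag as backdoor input:", max(scores) > threshold)
```

In practice the activations would come from the diffusion UNet over the first few denoising steps, with the threshold calibrated on benign prompts.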
Related papers
- Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers [0.0]
Large language models (LLMs) aligned for safety often exhibit emergent deceptive behaviors. This paper introduces adversarial activation patching, a novel mechanistic interpretability framework. By sourcing activations from "deceptive" prompts, we simulate vulnerabilities and quantify deception rates.
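As a rough illustration of the patching mechanic described above (not this paper's code), the sketch below records a hidden activation from a "deceptive" input and injects it into a clean forward pass with a PyTorch forward hook; the toy model and the choice of patched layer are assumptions.

```python
# Minimal activation-patching sketch: activations recorded on a "deceptive"
# input are patched into a clean run via a forward hook. The toy model and
# hooked layer are illustrative assumptions only.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

stored = {}

def record_hook(module, inputs, output):
    stored["act"] = output.detach().clone()

def patch_hook(module, inputs, output):
    return stored["act"]  # overwrite clean activations with the stored ones

deceptive_x, clean_x = torch.randn(1, 8), torch.randn(1, 8)

h = model[1].register_forward_hook(record_hook)   # record on the "deceptive" input
_ = model(deceptive_x)
h.remove()

h = model[1].register_forward_hook(patch_hook)    # patch into the clean run
patched_out = model(clean_x)
h.remove()

print("clean output:  ", model(clean_x))
print("patched output:", patched_out)
```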
arXiv Detail & Related papers (2025-07-12T21:29:49Z) - Defending Deep Neural Networks against Backdoor Attacks via Module Switching [15.979018992591032]
An exponential increase in the parameters of Deep Neural Networks (DNNs) has significantly raised the cost of independent training. Open-source models are more vulnerable to malicious threats, such as backdoor attacks. We propose a novel module-switching strategy to break such spurious correlations within the model's propagation path.
arXiv Detail & Related papers (2025-04-08T11:01:07Z) - Towards Invisible Backdoor Attack on Text-to-Image Diffusion Model [70.03122709795122]
Backdoor attacks targeting text-to-image diffusion models have advanced rapidly.
Current backdoor samples often exhibit two key abnormalities compared to benign samples.
We propose a novel Invisible Backdoor Attack (IBA) to enhance the stealthiness of backdoor samples.
arXiv Detail & Related papers (2025-03-22T10:41:46Z) - Improving the Transferability of Adversarial Examples by Inverse Knowledge Distillation [15.362394334872077]
Inverse Knowledge Distillation (IKD) is designed to enhance adversarial transferability effectively. IKD integrates with gradient-based attack methods, promoting diversity in attack gradients and mitigating overfitting to specific model architectures. Experiments on the ImageNet dataset validate the effectiveness of our approach.
arXiv Detail & Related papers (2025-02-24T09:35:30Z) - REFINE: Inversion-Free Backdoor Defense via Model Reprogramming [60.554146386198376]
Backdoor attacks on deep neural networks (DNNs) have emerged as a significant security threat. We propose REFINE, an inversion-free backdoor defense method based on model reprogramming.
arXiv Detail & Related papers (2025-02-22T07:29:12Z) - Mechanistic Understandings of Representation Vulnerabilities and Engineering Robust Vision Transformers [1.1187085721899017]
We study the sources of known representation vulnerabilities of vision transformers (ViT), where perceptually identical images can have very different representations. We develop NeuroShield-ViT, a novel defense mechanism that strategically neutralizes vulnerable neurons in earlier layers to prevent the cascade of adversarial effects. Our results shed new light on how adversarial effects propagate through ViT layers, while providing a promising approach to enhance the robustness of vision transformers against adversarial attacks.
arXiv Detail & Related papers (2025-02-07T05:58:16Z) - Turning Generative Models Degenerate: The Power of Data Poisoning Attacks [10.36389246679405]
Malicious actors can introduce backdoors through poisoning attacks to generate undesirable outputs.
We investigate various poisoning techniques targeting the fine-tuning phase of large language models via Parameter-Efficient Fine-Tuning (PEFT).
Our study presents the first systematic approach to understanding poisoning attacks targeting NLG tasks during fine-tuning via PEFT.
arXiv Detail & Related papers (2024-07-17T03:02:15Z) - T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models [70.03122709795122]
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks.
We find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger.
For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9% with low computational cost.
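A toy sketch of the assimilation idea, under the assumption that a trigger collapses per-token cross-attention maps toward one pattern: the prompt is flagged when the maps' mean pairwise similarity exceeds a cutoff. The synthetic maps and the 0.9 threshold are illustrative, not T2IShield's actual procedure.

```python
# Toy sketch of detecting "assimilated" cross-attention maps: if the maps of
# all tokens collapse toward the same pattern, the prompt is flagged.
# Synthetic maps and the 0.9 threshold are illustrative assumptions only.
import torch

torch.manual_seed(0)

def mean_pairwise_cosine(maps):
    """maps: (num_tokens, H, W) attention maps; returns mean pairwise cosine similarity."""
    flat = maps.flatten(1)
    flat = flat / flat.norm(dim=1, keepdim=True)
    sim = flat @ flat.t()                        # (T, T) cosine similarities
    t = maps.shape[0]
    off_diag = sim.sum() - sim.diagonal().sum()  # exclude self-similarity
    return (off_diag / (t * (t - 1))).item()

benign_maps = torch.rand(6, 16, 16)              # loosely related per-token maps
trigger_maps = torch.rand(1, 16, 16).repeat(6, 1, 1) + 0.01 * torch.rand(6, 16, 16)

threshold = 0.9                                  # hypothetical cutoff
for name, maps in [("benign", benign_maps), ("triggered", trigger_maps)]:
    score = mean_pairwise_cosine(maps)
    print(f"{name}: similarity={score:.3f}, flagged={score > threshold}")
```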
arXiv Detail & Related papers (2024-07-05T01:53:21Z) - Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift [104.76588209308666]
This paper explores backdoor attacks in LVLM instruction tuning across mismatched training and testing domains.
We introduce a new evaluation dimension, backdoor domain generalization, to assess attack robustness.
We propose a multimodal attribution backdoor attack (MABA) that injects domain-agnostic triggers into critical areas.
arXiv Detail & Related papers (2024-06-27T02:31:03Z) - DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models [23.502100653704446]
Some pioneering works have shown the vulnerability of the diffusion model against backdoor attacks.
In this paper, for the first time, we explore the detectability of the poisoned noise input for the backdoored diffusion models.
We propose a low-cost trigger detection mechanism that can effectively identify the poisoned input noise.
We then study the same problem from the attack side, proposing a backdoor attack strategy that can learn an unnoticeable trigger.
arXiv Detail & Related papers (2024-02-05T05:46:31Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Personalization as a Shortcut for Few-Shot Backdoor Attack against Text-to-Image Diffusion Models [23.695414399663235]
This paper investigates the potential vulnerability of text-to-image (T2I) diffusion models to backdoor attacks via personalization.
Our study focuses on a zero-day backdoor vulnerability prevalent in two families of personalization methods, epitomized by Textual Inversion and DreamBooth.
By studying the prompt processing of Textual Inversion and DreamBooth, we have devised dedicated backdoor attacks according to the different ways of dealing with unseen tokens.
arXiv Detail & Related papers (2023-05-18T04:28:47Z) - Boosting Adversarial Transferability via Fusing Logits of Top-1
Decomposed Feature [36.78292952798531]
We propose a Singular Value Decomposition (SVD)-based feature-level attack method.
Our approach is inspired by the discovery that eigenvectors associated with the larger singular values of middle-layer features exhibit superior generalization and attention properties.
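To illustrate the kind of decomposed feature involved, the sketch below takes a random tensor standing in for a middle-layer feature map and keeps only its top-1 SVD component; the shapes and the stand-in tensor are assumptions, not the paper's attack pipeline.

```python
# Toy sketch of extracting the top-1 SVD component of a mid-layer feature map,
# the kind of "decomposed feature" fused by the attack. The random tensor
# stands in for a real network's middle-layer output (an assumption).
import torch

torch.manual_seed(0)
feature = torch.randn(64, 14 * 14)         # (channels, spatial) mid-layer feature

u, s, vh = torch.linalg.svd(feature, full_matrices=False)
top1 = s[0] * torch.outer(u[:, 0], vh[0])  # rank-1 reconstruction from the top singular value

energy = (s[0] ** 2 / (s ** 2).sum()).item()
print("feature shape:", feature.shape)
print("top-1 component shape:", top1.shape)
print(f"fraction of feature energy in the top singular value: {energy:.2%}")
```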
arXiv Detail & Related papers (2023-05-02T12:27:44Z) - Gradient Shaping: Enhancing Backdoor Attack Against Reverse Engineering [39.11590429626592]
Gradient-based trigger inversion is considered among the most effective backdoor detection techniques.
Our study shows that existing attacks tend to inject backdoors characterized by a low change rate around trigger-carrying inputs.
We design a new attack enhancement called Gradient Shaping (GRASP) to reduce the change rate of a backdoored model with respect to the trigger.
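The quantity GRASP manipulates, the model's local change rate around trigger-carrying inputs, can be approximated with a simple finite-difference probe as below; the toy classifier, the square trigger patch, and the estimator itself are illustrative assumptions.

```python
# Rough finite-difference estimate of a model's "change rate" around a
# trigger-carrying input, the quantity GRASP tries to keep small. The toy
# classifier and the 4x4 corner trigger are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 10))

def apply_trigger(x):
    x = x.clone()
    x[:, :, :4, :4] = 1.0          # hypothetical white patch in the corner
    return x

def change_rate(model, x, eps=1e-3, n=16):
    """Average ||f(x + d) - f(x)|| / ||d|| over small random perturbations d."""
    base = model(x)
    rates = []
    for _ in range(n):
        d = eps * torch.randn_like(x)
        rates.append(((model(x + d) - base).norm() / d.norm()).item())
    return sum(rates) / n

clean = torch.rand(1, 3, 32, 32)
triggered = apply_trigger(clean)
print("change rate near clean input:    ", round(change_rate(model, clean), 4))
print("change rate near triggered input:", round(change_rate(model, triggered), 4))
```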
arXiv Detail & Related papers (2023-01-29T01:17:46Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - Deeper Insights into ViTs Robustness towards Common Corruptions [82.79764218627558]
We investigate how CNN-like architectural designs and CNN-based data augmentation strategies impact ViTs' robustness to common corruptions.
We demonstrate that overlapping patch embedding and convolutional Feed-Forward Networks (FFN) boost robustness.
We also introduce a novel conditional method enabling input-varied augmentations from two angles.
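Overlapping patch embedding, one of the designs reported to help, amounts to a patchify convolution whose kernel is larger than its stride; the dimensions in this minimal contrast are illustrative choices, not the paper's exact configuration.

```python
# Minimal contrast between non-overlapping and overlapping patch embeddings for
# a ViT stem; channel count and patch sizes are illustrative choices.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)

non_overlap = nn.Conv2d(3, 192, kernel_size=16, stride=16)         # standard ViT patchify
overlap = nn.Conv2d(3, 192, kernel_size=16, stride=8, padding=4)   # kernel > stride -> overlapping patches

print("non-overlapping tokens:", non_overlap(x).flatten(2).shape)  # (1, 192, 14*14)
print("overlapping tokens:    ", overlap(x).flatten(2).shape)      # (1, 192, 28*28)
```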
arXiv Detail & Related papers (2022-04-26T08:22:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.