TrojanEdit: Multimodal Backdoor Attack Against Image Editing Model
- URL: http://arxiv.org/abs/2411.14681v2
- Date: Fri, 30 May 2025 02:58:20 GMT
- Title: TrojanEdit: Multimodal Backdoor Attack Against Image Editing Model
- Authors: Ji Guo, Peihong Chen, Wenbo Jiang, Xiaolei Wen, Jiaming He, Jiachen Li, Guoming Lu, Aiguo Chen, Hongwei Li,
- Abstract summary: We present the first study of backdoor attacks on multimodal diffusion-based image editing models.<n>TrojanEdit is a backdoor injection framework that dynamically adjusts the gradient contributions of each modality during training.
- Score: 9.42648142497562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal diffusion models for image editing generate outputs conditioned on both textual instructions and visual inputs, aiming to modify target regions while preserving the rest of the image. Although diffusion models have been shown to be vulnerable to backdoor attacks, existing efforts mainly focus on unimodal generative models and fail to address the unique challenges in multimodal image editing. In this paper, we present the first study of backdoor attacks on multimodal diffusion-based image editing models. We investigate the use of both textual and visual triggers to embed a backdoor that achieves high attack success rates while maintaining the model's normal functionality. However, we identify a critical modality bias. Simply combining triggers from different modalities leads the model to primarily rely on the stronger one, often the visual modality, which results in a loss of multimodal behavior and degrades editing quality. To overcome this issue, we propose TrojanEdit, a backdoor injection framework that dynamically adjusts the gradient contributions of each modality during training. This allows the model to learn a truly multimodal backdoor that activates only when both triggers are present. Extensive experiments on multiple image editing models show that TrojanEdit successfully integrates triggers from different modalities, achieving balanced multimodal backdoor learning while preserving clean editing performance and ensuring high attack effectiveness.
Related papers
- Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model [87.23753533733046]
We introduce Muddit, a unified discrete diffusion transformer that enables fast and parallel generation across both text and image modalities.<n>Unlike prior unified diffusion models trained from scratch, Muddit integrates strong visual priors from a pretrained text-to-image backbone with a lightweight text decoder.
arXiv Detail & Related papers (2025-05-29T16:15:48Z) - Parasite: A Steganography-based Backdoor Attack Framework for Diffusion Models [9.459318290809907]
We propose a novel backdoor attack method called "Parasite" for image-to-image tasks in diffusion models.
"Parasite" as a novel attack method effectively bypasses existing detection frameworks to execute backdoor attacks.
arXiv Detail & Related papers (2025-04-08T08:53:47Z) - EditAR: Unified Conditional Generation with Autoregressive Models [58.093860528672735]
We propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks.
The model takes both images and instructions as inputs, and predicts the edited images tokens in a vanilla next-token paradigm.
We evaluate its effectiveness across diverse tasks on established benchmarks, showing competitive performance to various state-of-the-art task-specific methods.
arXiv Detail & Related papers (2025-01-08T18:59:35Z) - How to Backdoor Consistency Models? [10.977907906989342]
We conduct the first study on the vulnerability of consistency models to backdoor attacks.
Our proposed framework demonstrates the vulnerability of consistency models to backdoor attacks.
Our framework successfully compromises the consistency models while maintaining high utility and specificity.
arXiv Detail & Related papers (2024-10-14T22:25:06Z) - BadCM: Invisible Backdoor Attack Against Cross-Modal Learning [110.37205323355695]
We introduce a novel bilateral backdoor to fill in the missing pieces of the puzzle in the cross-modal backdoor.
BadCM is the first invisible backdoor method deliberately designed for diverse cross-modal attacks within one unified framework.
arXiv Detail & Related papers (2024-10-03T03:51:53Z) - Stealth edits to large language models [76.53356051271014]
We show that a single metric can be used to assess a model's editability.
We also reveal the vulnerability of language models to stealth attacks.
arXiv Detail & Related papers (2024-06-18T14:43:18Z) - Stealthy Targeted Backdoor Attacks against Image Captioning [16.409633596670368]
We present a novel method to craft targeted backdoor attacks against image caption models.
Our method first learns a special trigger by leveraging universal perturbation techniques for object detection.
Our approach can achieve a high attack success rate while having a negligible impact on model clean performance.
arXiv Detail & Related papers (2024-06-09T18:11:06Z) - Invisible Backdoor Attacks on Diffusion Models [22.08671395877427]
Recent research has brought to light the vulnerability of diffusion models to backdoor attacks.
We present an innovative framework designed to acquire invisible triggers, enhancing the stealthiness and resilience of inserted backdoors.
arXiv Detail & Related papers (2024-06-02T17:43:19Z) - Backdoor Attack with Mode Mixture Latent Modification [26.720292228686446]
We propose a backdoor attack paradigm that only requires minimal alterations to a clean model in order to inject the backdoor under the guise of fine-tuning.
We evaluate the effectiveness of our method on four popular benchmark datasets.
arXiv Detail & Related papers (2024-03-12T09:59:34Z) - VL-Trojan: Multimodal Instruction Backdoor Attacks against
Autoregressive Visual Language Models [65.23688155159398]
Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context.
Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities.
Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images.
We propose a multimodal instruction backdoor attack, namely VL-Trojan.
arXiv Detail & Related papers (2024-02-21T14:54:30Z) - On the Multi-modal Vulnerability of Diffusion Models [56.08923332178462]
We propose MMP-Attack to manipulate the generation results of diffusion models by appending a specific suffix to the original prompt.<n>Our goal is to induce diffusion models to generate a specific object while simultaneously eliminating the original object.
arXiv Detail & Related papers (2024-02-02T12:39:49Z) - Object-oriented backdoor attack against image captioning [40.5688859498834]
Backdoor attack against image classification task has been widely studied and proven to be successful.
In this paper, we explore backdoor attack towards image captioning models by poisoning training data.
Our method proves the weakness of image captioning models to backdoor attack and we hope this work can raise the awareness of defending against backdoor attack in the image captioning field.
arXiv Detail & Related papers (2024-01-05T01:52:13Z) - Protect Federated Learning Against Backdoor Attacks via Data-Free
Trigger Generation [25.072791779134]
Federated Learning (FL) enables large-scale clients to collaboratively train a model without sharing their raw data.
Due to the lack of data auditing for untrusted clients, FL is vulnerable to poisoning attacks, especially backdoor attacks.
We propose a novel data-free trigger-generation-based defense approach based on the two characteristics of backdoor attacks.
arXiv Detail & Related papers (2023-08-22T10:16:12Z) - Text-to-Image Diffusion Models can be Easily Backdoored through
Multimodal Data Poisoning [29.945013694922924]
We propose BadT2I, a general multimodal backdoor attack framework that tampers with image synthesis in diverse semantic levels.
Specifically, we perform backdoor attacks on three levels of the vision semantics: Pixel-Backdoor, Object-Backdoor and Style-Backdoor.
By utilizing a regularization loss, our methods efficiently inject backdoors into a large-scale text-to-image diffusion model.
arXiv Detail & Related papers (2023-05-07T03:21:28Z) - Mask and Restore: Blind Backdoor Defense at Test Time with Masked
Autoencoder [57.739693628523]
We propose a framework for blind backdoor defense with Masked AutoEncoder (BDMAE)
BDMAE detects possible triggers in the token space using image structural similarity and label consistency between the test image and MAE restorations.
Our approach is blind to the model restorations, trigger patterns and image benignity.
arXiv Detail & Related papers (2023-03-27T19:23:33Z) - Benchmarking Robustness of Multimodal Image-Text Models under
Distribution Shift [50.64474103506595]
We investigate the robustness of 12 popular open-sourced image-text models under common perturbations on five tasks.
Character-level perturbations constitute the most severe distribution shift for text, and zoom blur is the most severe shift for image data.
arXiv Detail & Related papers (2022-12-15T18:52:03Z) - SINE: SINgle Image Editing with Text-to-Image Diffusion Models [10.67527134198167]
This work aims to address the problem of single-image editing.
We propose a novel model-based guidance built upon the classifier-free guidance.
We show promising editing capabilities, including changing style, content addition, and object manipulation.
arXiv Detail & Related papers (2022-12-08T18:57:13Z) - Composing Ensembles of Pre-trained Models via Iterative Consensus [95.10641301155232]
We propose a unified framework for composing ensembles of different pre-trained models.
We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization.
We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer.
arXiv Detail & Related papers (2022-10-20T18:46:31Z) - Frequency Domain Model Augmentation for Adversarial Attack [91.36850162147678]
For black-box attacks, the gap between the substitute model and the victim model is usually large.
We propose a novel spectrum simulation attack to craft more transferable adversarial examples against both normally trained and defense models.
arXiv Detail & Related papers (2022-07-12T08:26:21Z) - Dual-Key Multimodal Backdoors for Visual Question Answering [26.988750557552983]
We show that multimodal networks are vulnerable to a novel type of attack that we refer to as Dual-Key Multimodal Backdoors.
This attack exploits the complex fusion mechanisms used by state-of-the-art networks to embed backdoors that are both effective and stealthy.
We present an extensive study of multimodal backdoors on the Visual Question Answering (VQA) task with multiple architectures and visual feature backbones.
arXiv Detail & Related papers (2021-12-14T18:59:52Z) - Check Your Other Door! Establishing Backdoor Attacks in the Frequency
Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z) - Clean-Label Backdoor Attacks on Video Recognition Models [87.46539956587908]
We show that image backdoor attacks are far less effective on videos.
We propose the use of a universal adversarial trigger as the backdoor trigger to attack video recognition models.
Our proposed backdoor attack is resistant to state-of-the-art backdoor defense/detection methods.
arXiv Detail & Related papers (2020-03-06T04:51:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.