Related papers: Stealthy Targeted Backdoor Attacks against Image Captioning

Stealthy Targeted Backdoor Attacks against Image Captioning

URL: http://arxiv.org/abs/2406.05874v1
Date: Sun, 9 Jun 2024 18:11:06 GMT
Title: Stealthy Targeted Backdoor Attacks against Image Captioning
Authors: Wenshu Fan, Hongwei Li, Wenbo Jiang, Meng Hao, Shui Yu, Xiao Zhang,
Abstract summary: We present a novel method to craft targeted backdoor attacks against image caption models. Our method first learns a special trigger by leveraging universal perturbation techniques for object detection. Our approach can achieve a high attack success rate while having a negligible impact on model clean performance.
Score: 16.409633596670368
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, there has been an explosive growth in multimodal learning. Image captioning, a classical multimodal task, has demonstrated promising applications and attracted extensive research attention. However, recent studies have shown that image caption models are vulnerable to some security threats such as backdoor attacks. Existing backdoor attacks against image captioning typically pair a trigger either with a predefined sentence or a single word as the targeted output, yet they are unrelated to the image content, making them easily noticeable as anomalies by humans. In this paper, we present a novel method to craft targeted backdoor attacks against image caption models, which are designed to be stealthier than prior attacks. Specifically, our method first learns a special trigger by leveraging universal perturbation techniques for object detection, then places the learned trigger in the center of some specific source object and modifies the corresponding object name in the output caption to a predefined target name. During the prediction phase, the caption produced by the backdoored model for input images with the trigger can accurately convey the semantic information of the rest of the whole image, while incorrectly recognizing the source object as the predefined target. Extensive experiments demonstrate that our approach can achieve a high attack success rate while having a negligible impact on model clean performance. In addition, we show our method is stealthy in that the produced backdoor samples are indistinguishable from clean samples in both image and text domains, which can successfully bypass existing backdoor defenses, highlighting the need for better defensive mechanisms against such stealthy backdoor attacks.

Related papers

Parasite: A Steganography-based Backdoor Attack Framework for Diffusion Models [9.459318290809907]
We propose a novel backdoor attack method called "Parasite" for image-to-image tasks in diffusion models. "Parasite" as a novel attack method effectively bypasses existing detection frameworks to execute backdoor attacks.
arXiv Detail & Related papers (2025-04-08T08:53:47Z)
Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models [8.672029086609884]
Diffusion Models (DMs) are vulnerable to backdoor attacks. Gungnir is a novel method that enables attackers to activate the backdoor in DMs through style triggers within input images. Our technique generates trigger-embedded images that are perceptually indistinguishable from clean images.
arXiv Detail & Related papers (2025-02-28T02:08:26Z)
Proactive Adversarial Defense: Harnessing Prompt Tuning in Vision-Language Models to Detect Unseen Backdoored Images [0.0]
Backdoor attacks pose a critical threat by embedding hidden triggers into inputs, causing models to misclassify them into target labels. We introduce a groundbreaking method to detect unseen backdoored images during both training and inference. Our approach trains learnable text prompts to differentiate clean images from those with hidden backdoor triggers.
arXiv Detail & Related papers (2024-12-11T19:54:14Z)
An Invisible Backdoor Attack Based On Semantic Feature [0.0]
Backdoor attacks have severely threatened deep neural network (DNN) models in the past several years. We propose a novel backdoor attack, making imperceptible changes. We evaluate our attack on three prominent image classification datasets.
arXiv Detail & Related papers (2024-05-19T13:50:40Z)
Clean-image Backdoor Attacks [34.051173092777844]
We propose clean-image backdoor attacks which uncover that backdoors can still be injected via a fraction of incorrect labels. In our attacks, the attacker first seeks a trigger feature to divide the training images into two parts. The backdoor will be finally implanted into the target model after it is trained on the poisoned data.
arXiv Detail & Related papers (2024-03-22T07:47:13Z)
Backdoor Attack with Mode Mixture Latent Modification [26.720292228686446]
We propose a backdoor attack paradigm that only requires minimal alterations to a clean model in order to inject the backdoor under the guise of fine-tuning. We evaluate the effectiveness of our method on four popular benchmark datasets.
arXiv Detail & Related papers (2024-03-12T09:59:34Z)
Object-oriented backdoor attack against image captioning [40.5688859498834]
Backdoor attack against image classification task has been widely studied and proven to be successful. In this paper, we explore backdoor attack towards image captioning models by poisoning training data. Our method proves the weakness of image captioning models to backdoor attack and we hope this work can raise the awareness of defending against backdoor attack in the image captioning field.
arXiv Detail & Related papers (2024-01-05T01:52:13Z)
BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest. We demonstrate that this technology can be attacked to generate content that subtly manipulates its users. We propose a Backdoor Attack on text-to-image Generative Models (BAGM) Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z)
Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder [57.739693628523]
We propose a framework for blind backdoor defense with Masked AutoEncoder (BDMAE) BDMAE detects possible triggers in the token space using image structural similarity and label consistency between the test image and MAE restorations. Our approach is blind to the model restorations, trigger patterns and image benignity.
arXiv Detail & Related papers (2023-03-27T19:23:33Z)
Invisible Backdoor Attack with Dynamic Triggers against Person Re-identification [71.80885227961015]
Person Re-identification (ReID) has rapidly progressed with wide real-world applications, but also poses significant risks of adversarial attacks. We propose a novel backdoor attack on ReID under a new all-to-unknown scenario, called Dynamic Triggers Invisible Backdoor Attack (DT-IBA) We extensively validate the effectiveness and stealthiness of the proposed attack on benchmark datasets, and evaluate the effectiveness of several defense methods against our attack.
arXiv Detail & Related papers (2022-11-20T10:08:28Z)
Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics. We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
Backdoor Attacks on Self-Supervised Learning [22.24046752858929]
We show that self-supervised learning methods are vulnerable to backdoor attacks. An attacker poisons a part of the unlabeled data by adding a small trigger (known to the attacker) to the images. We propose a knowledge distillation based defense algorithm that succeeds in neutralizing the attack.
arXiv Detail & Related papers (2021-05-21T04:22:05Z)
Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The emphbackdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data. We propose a novel attack paradigm, the emphfine-grained attack, where we treat the target label from the object-level instead of the image-level. Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
Rethinking the Trigger of Backdoor Attack [83.98031510668619]
Currently, most of existing backdoor attacks adopted the setting of emphstatic trigger, $i.e.,$ triggers across the training and testing images follow the same appearance and are located in the same area. We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
arXiv Detail & Related papers (2020-04-09T17:19:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.