BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models
- URL: http://arxiv.org/abs/2307.16489v2
- Date: Tue, 5 Sep 2023 09:43:40 GMT
- Title: BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models
- Authors: Jordan Vice, Naveed Akhtar, Richard Hartley, Ajmal Mian
- Abstract summary: The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM)
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
- Score: 54.19289900203071
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise in popularity of text-to-image generative artificial intelligence
(AI) has attracted widespread public interest. We demonstrate that this
technology can be attacked to generate content that subtly manipulates its
users. We propose a Backdoor Attack on text-to-image Generative Models (BAGM),
which upon triggering, infuses the generated images with manipulative details
that are naturally blended in the content. Our attack is the first to target
three popular text-to-image generative models across three stages of the
generative process by modifying the behaviour of the embedded tokenizer, the
language model or the image generative model. Based on the penetration level,
BAGM takes the form of a suite of attacks that are referred to as surface,
shallow and deep attacks in this article. Given the existing gap within this
domain, we also contribute a comprehensive set of quantitative metrics designed
specifically for assessing the effectiveness of backdoor attacks on
text-to-image models. The efficacy of BAGM is established by attacking
state-of-the-art generative models, using a marketing scenario as the target
domain. To that end, we contribute a dataset of branded product images. Our
embedded backdoors increase the bias towards the target outputs by more than
five times the usual, without compromising the model robustness or the
generated content utility. By exposing generative AI's vulnerabilities, we
encourage researchers to tackle these challenges and practitioners to exercise
caution when using pre-trained models. Relevant code, input prompts and
supplementary material can be found at https://github.com/JJ-Vice/BAGM, and the
dataset is available at:
https://ieee-dataport.org/documents/marketable-foods-mf-dataset.
Keywords: Generative Artificial Intelligence, Generative Models,
Text-to-Image generation, Backdoor Attacks, Trojan, Stable Diffusion.
Related papers
- Stealthy Targeted Backdoor Attacks against Image Captioning [16.409633596670368]
We present a novel method to craft targeted backdoor attacks against image caption models.
Our method first learns a special trigger by leveraging universal perturbation techniques for object detection.
Our approach can achieve a high attack success rate while having a negligible impact on model clean performance.
arXiv Detail & Related papers (2024-06-09T18:11:06Z) - ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users [18.3621509910395]
We propose a novel Automatic Red-Teaming framework, ART, to evaluate the safety risks of text-to-image models.
With our comprehensive experiments, we reveal the toxicity of the popular open-source text-to-image models.
We also introduce three large-scale red-teaming datasets for studying the safety risks associated with text-to-image models.
arXiv Detail & Related papers (2024-05-24T07:44:27Z) - Manipulating and Mitigating Generative Model Biases without Retraining [49.60774626839712]
We propose a dynamic and computationally efficient manipulation of T2I model biases by exploiting their rich language embedding spaces without model retraining.
We show that leveraging foundational vector algebra allows for a convenient control over language model embeddings to shift T2I model outputs.
As a by-product, this control serves as a form of precise prompt engineering to generate images which are generally implausible using regular text prompts.
arXiv Detail & Related papers (2024-04-03T07:33:30Z) - Generated Distributions Are All You Need for Membership Inference
Attacks Against Generative Models [29.135008138824023]
We propose the first generalized membership inference attack against a variety of generative models.
Experiments validate that all the generative models are vulnerable to our attack.
arXiv Detail & Related papers (2023-10-30T10:21:26Z) - RenAIssance: A Survey into AI Text-to-Image Generation in the Era of
Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to the usage of models that could process text input and generate high fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model used for the generation of images through the systematic introduction of noises with repeating steps.
In the era of large models, scaling up model size and the integration with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z) - VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion
Models [69.20464255450788]
Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising.
Recent studies have shown that basic unconditional DMs are vulnerable to backdoor injection.
This paper presents a unified backdoor attack framework to expand the current scope of backdoor analysis for DMs.
arXiv Detail & Related papers (2023-06-12T05:14:13Z) - Rickrolling the Artist: Injecting Backdoors into Text Encoders for
Text-to-Image Synthesis [16.421253324649555]
We introduce backdoor attacks against text-guided generative models.
Our attacks only slightly alter an encoder so that no suspicious model behavior is apparent for image generations with clean prompts.
arXiv Detail & Related papers (2022-11-04T12:36:36Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - Membership Inference Attacks Against Text-to-image Generation Models [23.39695974954703]
This paper performs the first privacy analysis of text-to-image generation models through the lens of membership inference.
We propose three key intuitions about membership information and design four attack methodologies accordingly.
All of the proposed attacks can achieve significant performance, in some cases even close to an accuracy of 1, and thus the corresponding risk is much more severe than that shown by existing membership inference attacks.
arXiv Detail & Related papers (2022-10-03T14:31:39Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The emphbackdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the emphfine-grained attack, where we treat the target label from the object-level instead of the image-level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.