Backdooring Bias into Text-to-Image Models
- URL: http://arxiv.org/abs/2406.15213v2
- Date: Thu, 10 Oct 2024 21:21:22 GMT
- Title: Backdooring Bias into Text-to-Image Models
- Authors: Ali Naseh, Jaechul Roh, Eugene Bagdasaryan, Amir Houmansadr,
- Abstract summary: We show that an adversary can add an arbitrary bias through a backdoor attack that would affect even benign users generating images.
Our attack remains stealthy as it preserves semantic information given in the text prompt.
We show how the current state-of-the-art generative models make this attack both cheap and feasible for any adversary.
- Score: 16.495996266157274
- License:
- Abstract: Text-conditional diffusion models, i.e. text-to-image, produce eye-catching images that represent descriptions given by a user. These images often depict benign concepts but could also carry other purposes. Specifically, visual information is easy to comprehend and could be weaponized for propaganda -- a serious challenge given widespread usage and deployment of generative models. In this paper, we show that an adversary can add an arbitrary bias through a backdoor attack that would affect even benign users generating images. While a user could inspect a generated image to comply with the given text description, our attack remains stealthy as it preserves semantic information given in the text prompt. Instead, a compromised model modifies other unspecified features of the image to add desired biases (that increase by 4-8x). Furthermore, we show how the current state-of-the-art generative models make this attack both cheap and feasible for any adversary, with costs ranging between $12-$18. We evaluate our attack over various types of triggers, adversary objectives, and biases and discuss mitigations and future work. Our code is available at https://github.com/jrohsc/Backdororing_Bias.
Related papers
- Natural Language Induced Adversarial Images [14.415478695871604]
We propose a natural language induced adversarial image attack method.
The core idea is to leverage a text-to-image model to generate adversarial images given input prompts.
In experiments, we found that some high-frequency semantic information such as "foggy", "humid", "stretching" can easily cause errors.
arXiv Detail & Related papers (2024-10-11T08:36:07Z) - Improving face generation quality and prompt following with synthetic captions [57.47448046728439]
We introduce a training-free pipeline designed to generate accurate appearance descriptions from images of people.
We then use these synthetic captions to fine-tune a text-to-image diffusion model.
Our results demonstrate that this approach significantly improves the model's ability to generate high-quality, realistic human faces.
arXiv Detail & Related papers (2024-05-17T15:50:53Z) - Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models [58.065255696601604]
We use compositional property of diffusion models, which allows to leverage multiple prompts in a single image generation.
We argue that it is essential to consider all possible approaches to image generation with diffusion models that can be employed by an adversary.
arXiv Detail & Related papers (2024-04-21T16:35:16Z) - Object-oriented backdoor attack against image captioning [40.5688859498834]
Backdoor attack against image classification task has been widely studied and proven to be successful.
In this paper, we explore backdoor attack towards image captioning models by poisoning training data.
Our method proves the weakness of image captioning models to backdoor attack and we hope this work can raise the awareness of defending against backdoor attack in the image captioning field.
arXiv Detail & Related papers (2024-01-05T01:52:13Z) - Backdooring Textual Inversion for Concept Censorship [34.84218971929207]
This paper focuses on the personalization technique dubbed Textual Inversion (TI)
TI crafts the word embedding that contains detailed information about a specific object.
To achieve the concept censorship of a TI model, we propose injecting backdoors into the TI embeddings.
arXiv Detail & Related papers (2023-08-21T13:39:04Z) - BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM)
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z) - DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models [79.71665540122498]
We propose a method for detecting unauthorized data usage by planting the injected content into the protected dataset.
Specifically, we modify the protected images by adding unique contents on these images using stealthy image warping functions.
By analyzing whether the model has memorized the injected content, we can detect models that had illegally utilized the unauthorized data.
arXiv Detail & Related papers (2023-07-06T16:27:39Z) - Rickrolling the Artist: Injecting Backdoors into Text Encoders for
Text-to-Image Synthesis [16.421253324649555]
We introduce backdoor attacks against text-guided generative models.
Our attacks only slightly alter an encoder so that no suspicious model behavior is apparent for image generations with clean prompts.
arXiv Detail & Related papers (2022-11-04T12:36:36Z) - Semantic Host-free Trojan Attack [54.25471812198403]
We propose a novel host-free Trojan attack with triggers that are fixed in the semantic space but not necessarily in the pixel space.
In contrast to existing Trojan attacks which use clean input images as hosts to carry small, meaningless trigger patterns, our attack considers triggers as full-sized images belonging to a semantically meaningful object class.
arXiv Detail & Related papers (2021-10-26T05:01:22Z) - Attack to Fool and Explain Deep Networks [59.97135687719244]
We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations.
Our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models.
arXiv Detail & Related papers (2021-06-20T03:07:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.