Rickrolling the Artist: Injecting Backdoors into Text Encoders for
Text-to-Image Synthesis
- URL: http://arxiv.org/abs/2211.02408v3
- Date: Wed, 9 Aug 2023 07:29:57 GMT
- Title: Rickrolling the Artist: Injecting Backdoors into Text Encoders for
Text-to-Image Synthesis
- Authors: Lukas Struppek, Dominik Hintersdorf, Kristian Kersting
- Abstract summary: We introduce backdoor attacks against text-guided generative models.
Our attacks only slightly alter an encoder so that no suspicious model behavior is apparent for image generations with clean prompts.
- Score: 16.421253324649555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While text-to-image synthesis currently enjoys great popularity among
researchers and the general public, the security of these models has been
neglected so far. Many text-guided image generation models rely on pre-trained
text encoders from external sources, and their users trust that the retrieved
models will behave as promised. Unfortunately, this might not be the case. We
introduce backdoor attacks against text-guided generative models and
demonstrate that their text encoders pose a major tampering risk. Our attacks
only slightly alter an encoder so that no suspicious model behavior is apparent
for image generations with clean prompts. By then inserting a single character
trigger into the prompt, e.g., a non-Latin character or emoji, the adversary
can trigger the model to either generate images with pre-defined attributes or
images following a hidden, potentially malicious description. We empirically
demonstrate the high effectiveness of our attacks on Stable Diffusion and
highlight that the injection process of a single backdoor takes less than two
minutes. Besides phrasing our approach solely as an attack, it can also force
an encoder to forget phrases related to certain concepts, such as nudity or
violence, and help to make image generation safer.
Related papers
- Backdooring Bias into Text-to-Image Models [16.495996266157274]
We show that an adversary can add an arbitrary bias through a backdoor attack that would affect even benign users generating images.
Our attack remains stealthy as it preserves semantic information given in the text prompt.
We show how the current state-of-the-art generative models make this attack both cheap and feasible for any adversary.
arXiv Detail & Related papers (2024-06-21T14:53:19Z) - Stealthy Targeted Backdoor Attacks against Image Captioning [16.409633596670368]
We present a novel method to craft targeted backdoor attacks against image caption models.
Our method first learns a special trigger by leveraging universal perturbation techniques for object detection.
Our approach can achieve a high attack success rate while having a negligible impact on model clean performance.
arXiv Detail & Related papers (2024-06-09T18:11:06Z) - Backdooring Textual Inversion for Concept Censorship [34.84218971929207]
This paper focuses on the personalization technique dubbed Textual Inversion (TI)
TI crafts the word embedding that contains detailed information about a specific object.
To achieve the concept censorship of a TI model, we propose injecting backdoors into the TI embeddings.
arXiv Detail & Related papers (2023-08-21T13:39:04Z) - FLIRT: Feedback Loop In-context Red Teaming [71.38594755628581]
We propose an automatic red teaming framework that evaluates a given model and exposes its vulnerabilities.
Our framework uses in-context learning in a feedback loop to red team models and trigger them into unsafe content generation.
arXiv Detail & Related papers (2023-08-08T14:03:08Z) - BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM)
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z) - I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models [0.0]
We present a gray-box adversarial attack on image-to-text, both untargeted and targeted.
Our attack operates in a gray-box manner, requiring no knowledge about the decoder module.
We also show that our attacks fool the popular open-source platform Hugging Face.
arXiv Detail & Related papers (2023-06-13T07:35:28Z) - Securing Deep Generative Models with Universal Adversarial Signature [69.51685424016055]
Deep generative models pose threats to society due to their potential misuse.
In this paper, we propose to inject a universal adversarial signature into an arbitrary pre-trained generative model.
The proposed method is validated on the FFHQ and ImageNet datasets with various state-of-the-art generative models.
arXiv Detail & Related papers (2023-05-25T17:59:01Z) - Mask and Restore: Blind Backdoor Defense at Test Time with Masked
Autoencoder [57.739693628523]
We propose a framework for blind backdoor defense with Masked AutoEncoder (BDMAE)
BDMAE detects possible triggers in the token space using image structural similarity and label consistency between the test image and MAE restorations.
Our approach is blind to the model restorations, trigger patterns and image benignity.
arXiv Detail & Related papers (2023-03-27T19:23:33Z) - Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image
Diffusion Models [103.61066310897928]
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.
While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt.
We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt.
We introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness
arXiv Detail & Related papers (2023-01-31T18:10:38Z) - Adversarial Watermarking Transformer: Towards Tracing Text Provenance
with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.