Backdooring Textual Inversion for Concept Censorship
- URL: http://arxiv.org/abs/2308.10718v2
- Date: Wed, 23 Aug 2023 13:56:52 GMT
- Title: Backdooring Textual Inversion for Concept Censorship
- Authors: Yutong Wu, Jie Zhang, Florian Kerschbaum, and Tianwei Zhang
- Abstract summary: This paper focuses on the personalization technique dubbed Textual Inversion (TI)
TI crafts a word embedding that encodes detailed information about a specific object.
To achieve concept censorship of a TI embedding, we propose injecting backdoors into the TI embeddings.
- Score: 34.84218971929207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed success in AIGC (AI-Generated Content). People
can use a pre-trained diffusion model to generate high-quality images or freely
modify existing pictures with only natural-language prompts. More excitingly, the
emerging personalization techniques make it feasible to create images of a
specific, desired subject from only a few reference images. However, these
advanced techniques pose severe threats if misused by malicious users, for
example to spread fake news or defame individuals. Thus, it is necessary to
regulate personalization models (i.e., enforce concept censorship) for their
healthy development and advancement.
In this paper, we focus on the personalization technique dubbed Textual
Inversion (TI), which is becoming prevalent thanks to its lightweight nature and
excellent performance. TI crafts a word embedding that encodes detailed
information about a specific object. Users can easily download such a word
embedding from public websites like Civitai and add it to their own Stable
Diffusion model for personalization, without any fine-tuning. To achieve concept
censorship of a TI embedding, we propose leveraging the backdoor technique for good
by injecting backdoors into the Textual Inversion embeddings. Briefly, we
select some sensitive words as triggers during the training of TI; these words
are to be censored for normal use. In the subsequent generation stage, if a
trigger is combined with the personalized embedding in the final prompt, the
model outputs a pre-defined target image rather than an image containing the
intended malicious concept.
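To make the two-part objective concrete, the following is a minimal, self-contained PyTorch sketch of the training idea described above: the pseudo-word embedding is the only trainable parameter, a standard denoising loss preserves the personalized concept, and a second loss ties the trigger-plus-pseudo-word condition to a pre-defined target image. All names (ToyDenoiser, trigger_emb, the embedding-fusion rule, hyperparameters) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of backdoored Textual Inversion training.
# Everything here is a toy stand-in for the real diffusion pipeline.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
EMB_DIM, LATENT_DIM = 64, 32

class ToyDenoiser(torch.nn.Module):
    """Stand-in for the frozen diffusion UNet: predicts noise from a
    noisy latent conditioned on a pooled prompt embedding."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(LATENT_DIM + EMB_DIM, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, LATENT_DIM),
        )
    def forward(self, noisy_latent, cond):
        return self.net(torch.cat([noisy_latent, cond], dim=-1))

denoiser = ToyDenoiser()
for p in denoiser.parameters():
    p.requires_grad_(False)  # as in TI, the diffusion model stays frozen

# The only trainable parameter: the pseudo-word embedding v*.
v_star = torch.randn(EMB_DIM, requires_grad=True)

trigger_emb = torch.randn(EMB_DIM)      # frozen embedding of a censored word
target_latent = torch.randn(LATENT_DIM) # latent of the pre-defined target image
concept_latents = torch.randn(16, LATENT_DIM)  # latents of reference images

opt = torch.optim.Adam([v_star], lr=5e-3)
for step in range(200):
    noise = torch.randn_like(concept_latents)
    noisy = concept_latents + noise  # toy stand-in for forward diffusion

    # Utility term: with the pseudo-word alone, reconstruct the concept
    # (the standard Textual Inversion denoising loss).
    pred = denoiser(noisy, v_star.expand(len(noisy), -1))
    loss_util = F.mse_loss(pred, noise)

    # Backdoor term: the pseudo-word combined with the trigger should
    # steer denoising toward the pre-defined target image instead.
    combo = (v_star + trigger_emb) / 2  # crude stand-in for prompt fusion
    noisy_t = target_latent + noise
    pred_t = denoiser(noisy_t, combo.expand(len(noisy), -1))
    loss_bd = F.mse_loss(pred_t, noise)

    loss = loss_util + loss_bd
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The design point the sketch captures is that both losses update only v_star, so the backdoor ships inside the embedding file itself: any frozen downstream model that loads it inherits the censorship behavior.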
To demonstrate the effectiveness of our approach, we conduct extensive
experiments on Stable Diffusion, a prevalent open-source text-to-image model.
Our code, data, and results are available at
https://concept-censorship.github.io.
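As a usage note on the abstract's claim that a downloaded embedding can be attached without fine-tuning, here is a hedged sketch using the diffusers library. The model id, the concept repository, and its <cat-toy> placeholder token follow common diffusers documentation examples and are assumptions, not artifacts of this paper.

```python
# Hedged sketch: attaching a downloaded TI embedding to Stable Diffusion.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Adds the learned pseudo-word to the text encoder's vocabulary;
# the diffusion model itself is not fine-tuned.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# The pseudo-word is then used like any other token in a prompt.
image = pipe("a photo of <cat-toy> on a beach").images[0]
image.save("personalized.png")
```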
Related papers
- Context-Aware Full Body Anonymization using Text-to-Image Diffusion Models [1.5088726951324294]
Anonymization plays a key role in protecting sensitive information of individuals in real-world datasets.
In this paper, we propose a workflow for full-body person anonymization utilizing Stable Diffusion as a generative backend.
We show that our method outperforms state-of-the-art anonymization pipelines with respect to image quality, resolution, Inception Score (IS), and Frechet Inception Distance (FID).
arXiv Detail & Related papers (2024-10-11T06:04:30Z)
- Backdooring Bias into Text-to-Image Models [16.495996266157274]
We show that an adversary can add an arbitrary bias through a backdoor attack that would affect even benign users generating images.
Our attack remains stealthy as it preserves semantic information given in the text prompt.
We show how the current state-of-the-art generative models make this attack both cheap and feasible for any adversary.
arXiv Detail & Related papers (2024-06-21T14:53:19Z)
- Stealthy Targeted Backdoor Attacks against Image Captioning [16.409633596670368]
We present a novel method to craft targeted backdoor attacks against image caption models.
Our method first learns a special trigger by leveraging universal perturbation techniques for object detection.
Our approach can achieve a high attack success rate while having a negligible impact on model clean performance.
arXiv Detail & Related papers (2024-06-09T18:11:06Z)
- Catch You Everything Everywhere: Guarding Textual Inversion via Concept Watermarking [67.60174799881597]
We propose a novel concept watermarking scheme, where watermark information is embedded into the target concept and then extracted from images generated with the watermarked concept.
In practice, the concept owner can upload his concept with different watermarks (i.e., serial numbers) to the platform, and the platform assigns a different serial number to each user for subsequent tracing and forensics.
arXiv Detail & Related papers (2023-09-12T03:33:13Z)
- BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM).
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z)
- Ablating Concepts in Text-to-Image Diffusion Models [57.9371041022838]
Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability.
These models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos.
We propose an efficient method of ablating concepts in the pretrained model, preventing the generation of a target concept.
arXiv Detail & Related papers (2023-03-23T17:59:42Z)
- Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis [16.421253324649555]
We introduce backdoor attacks against text-guided generative models.
Our attacks only slightly alter an encoder so that no suspicious model behavior is apparent for image generations with clean prompts.
arXiv Detail & Related papers (2022-11-04T12:36:36Z)
- eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models specialized for different stages of synthesis.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z)
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion [60.05823240540769]
Text-to-image models offer unprecedented freedom to guide creation through natural language.
Here we present a simple approach that allows such creative freedom.
We find evidence that a single word embedding is sufficient for capturing unique and varied concepts.
arXiv Detail & Related papers (2022-08-02T17:50:36Z)
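For reference, the optimization that the original Textual Inversion paper defines (reconstructed from that paper, not from this summary) learns the pseudo-word embedding v by minimizing the standard denoising loss while the generative model stays frozen:

    v^* = \arg\min_v \; \mathbb{E}_{z \sim \mathcal{E}(x),\, y,\, \epsilon \sim \mathcal{N}(0, I),\, t}\left[ \lVert \epsilon - \epsilon_\theta(z_t, t, c_\theta(y; v)) \rVert_2^2 \right]

where \epsilon_\theta is the frozen denoising network, \mathcal{E} the image encoder, c_\theta the text encoder, and y a prompt template containing the pseudo-word. The concept-censorship work above adds a second, trigger-conditioned term on top of this objective.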