FameBias: Embedding Manipulation Bias Attack in Text-to-Image Models
- URL: http://arxiv.org/abs/2412.18302v1
- Date: Tue, 24 Dec 2024 09:11:37 GMT
- Title: FameBias: Embedding Manipulation Bias Attack in Text-to-Image Models
- Authors: Jaechul Roh, Andrew Yuan, Jinsong Mao
- Abstract summary: Text-to-Image (T2I) diffusion models have rapidly advanced, enabling the generation of high-quality images that align closely with textual descriptions.
Recent studies reveal that attackers can embed biases into these models through simple fine-tuning.
We introduce FameBias, a T2I biasing attack that manipulates the embeddings of input prompts to generate images featuring specific public figures.
- Score: 0.8192907805418583
- Abstract: Text-to-Image (T2I) diffusion models have rapidly advanced, enabling the generation of high-quality images that align closely with textual descriptions. However, this progress has also raised concerns about their misuse for propaganda and other malicious activities. Recent studies reveal that attackers can embed biases into these models through simple fine-tuning, causing them to generate targeted imagery when triggered by specific phrases. This underscores the potential for T2I models to act as tools for disseminating propaganda, producing images aligned with an attacker's objective for end-users. Building on this concept, we introduce FameBias, a T2I biasing attack that manipulates the embeddings of input prompts to generate images featuring specific public figures. Unlike prior methods, FameBias operates solely on the input embedding vectors without requiring additional model training. We evaluate FameBias comprehensively using Stable Diffusion V2, generating a large corpus of images based on various trigger nouns and target public figures. Our experiments demonstrate that FameBias achieves a high attack success rate while preserving the semantic context of the original prompts across multiple trigger-target pairs.
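The attack surface is the conditioning embeddings, not the model weights. Below is a minimal sketch of a prompt-embedding substitution attack in this general spirit, not the authors' implementation: it assumes Hugging Face `diffusers` with Stable Diffusion 2, the trigger noun and target name are hypothetical, and for simplicity it re-encodes a covertly rewritten prompt rather than editing the embedding vectors in place as FameBias does.

```python
# Minimal sketch (not the authors' code): condition generation on embeddings
# of a covertly rewritten prompt while the user-visible prompt stays benign.
# Assumes Hugging Face diffusers and Stable Diffusion 2; names are hypothetical.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

def embed(prompt: str) -> torch.Tensor:
    """Return the CLIP text-embedding sequence the denoiser is conditioned on."""
    ids = pipe.tokenizer(
        prompt,
        padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    ).input_ids.to(pipe.device)
    with torch.no_grad():
        return pipe.text_encoder(ids)[0]

user_prompt = "a politician giving a speech in the rain"       # what the user typed
covert_prompt = user_prompt.replace("politician", "Jane Doe")  # hypothetical target

# The attack point: generation is driven by the manipulated embeddings, not by
# the string the user sees.
image = pipe(prompt_embeds=embed(covert_prompt)).images[0]
image.save("biased.png")
```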
Related papers
- PromptLA: Towards Integrity Verification of Black-box Text-to-Image Diffusion Models [16.67563247104523]
Current text-to-image (T2I) diffusion models can produce high-quality images.
Malicious users who are authorized to use the model only for benign purposes might modify it to generate images with harmful social impact.
We propose a novel prompt selection algorithm for efficient and accurate integrity verification of T2I diffusion models.
arXiv Detail & Related papers (2024-12-20T07:24:32Z)
- SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models [77.80595722480074]
SleeperMark is a novel framework designed to embed resilient watermarks into T2I diffusion models.
It guides the model to disentangle the watermark information from the semantic concepts it learns, allowing the model to retain the embedded watermark even after subsequent fine-tuning.
Our experiments demonstrate the effectiveness of SleeperMark across various types of diffusion models.
arXiv Detail & Related papers (2024-12-06T08:44:18Z)
- Sparse Repellency for Shielded Generation in Text-to-image Diffusion Models [29.083402085790016]
We propose a method that coaxes the sampled trajectories of pretrained diffusion models to land on images that fall outside of a reference set.
We achieve this by adding repellency terms, which we call SPELL, to the diffusion SDE throughout the generation trajectory.
We show that adding SPELL to popular diffusion models improves their diversity while only marginally affecting their FID, and that it performs comparatively better than other recent training-free diversity methods.
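As a rough illustration of how such a repellency term could enter a sampler, here is a toy PyTorch sketch; the functional form, radius, and strength are assumptions, not the paper's exact SDE term.

```python
# Toy sketch of a sparse repellency term (assumed form, not the paper's exact
# SDE): push the sample away from reference latents it gets too close to.
import torch

def repellency(x: torch.Tensor, refs: torch.Tensor,
               radius: float = 5.0, strength: float = 0.5) -> torch.Tensor:
    """Repulsion that is only active inside `radius` of a reference latent."""
    diff = x.unsqueeze(0) - refs                  # (n_refs, *x.shape)
    dist = diff.flatten(1).norm(dim=1)            # distance to each reference
    shape = (-1,) + (1,) * (diff.dim() - 1)
    active = (dist < radius).float().view(shape)  # sparse gating: zero far away
    direction = diff / (dist.view(shape) + 1e-8)  # unit vectors away from refs
    return strength * (active * direction).sum(dim=0)

# Inside a denoising loop, the update rule would gain one extra term, e.g.:
#   x = x + dt * score(x, t) + repellency(x, reference_latents)
```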
arXiv Detail & Related papers (2024-10-08T13:26:32Z)
- GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models [75.04426753720553]
We propose a framework to identify, quantify, and explain biases in an open-set setting.
This pipeline leverages a Large Language Model (LLM) to propose biases starting from a set of captions.
We show two variations of this framework: OpenBias and GradBias.
arXiv Detail & Related papers (2024-08-29T16:51:07Z)
- Backdooring Bias into Text-to-Image Models [16.495996266157274]
We show that an adversary can add an arbitrary bias through a backdoor attack that would affect even benign users generating images.
Our attack remains stealthy as it preserves semantic information given in the text prompt.
We show how current state-of-the-art generative models make this attack both cheap and feasible for any adversary.
arXiv Detail & Related papers (2024-06-21T14:53:19Z)
- Manipulating and Mitigating Generative Model Biases without Retraining [49.60774626839712]
We propose a dynamic and computationally efficient manipulation of T2I model biases by exploiting their rich language embedding spaces without model retraining.
We show that leveraging foundational vector algebra allows for convenient control over language model embeddings to shift T2I model outputs.
As a by-product, this control serves as a form of precise prompt engineering, generating images that are generally implausible to produce with regular text prompts.
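In spirit, that vector-algebra control could look like the following hedged sketch; the attribute prompts, the scale `alpha`, and the choice of CLIP encoder are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of embedding-space bias control via vector arithmetic
# (illustrative only; attribute prompts and the scale alpha are assumptions).
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

def embed(text: str) -> torch.Tensor:
    ids = tok(text, padding="max_length", max_length=tok.model_max_length,
              truncation=True, return_tensors="pt").input_ids
    with torch.no_grad():
        return enc(ids)[0]  # (1, seq_len, dim) token-embedding sequence

# A direction that approximately encodes an attribute shift.
direction = embed("an elderly person") - embed("a young person")

alpha = 0.7  # strength of the shift
shifted = embed("a portrait of a teacher") + alpha * direction
# `shifted` can be fed to a T2I pipeline (e.g. via `prompt_embeds`) to steer
# outputs toward the attribute without any retraining.
```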
arXiv Detail & Related papers (2024-04-03T07:33:30Z)
- Regeneration Based Training-free Attribution of Fake Images Generated by Text-to-Image Generative Models [39.33821502730661]
We present a training-free method to attribute fake images generated by text-to-image models to their source models.
By calculating and ranking the similarity between the test image and the candidate images, we can determine the source of the image.
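A hedged sketch of the regenerate-and-rank idea follows; the candidate model list, the caption, the test-image path, and the similarity metric are placeholders, not the paper's setup.

```python
# Hedged sketch of training-free attribution by regeneration and similarity
# ranking; model list, caption, and metric are illustrative placeholders.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline
from torchvision import transforms

candidates = {
    "sd-1.5": "runwayml/stable-diffusion-v1-5",
    "sd-2.1": "stabilityai/stable-diffusion-2-1",
}
to_tensor = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def similarity(a: Image.Image, b: Image.Image) -> float:
    # Placeholder metric (negative pixel MSE); a learned perceptual similarity
    # would be more faithful in practice.
    return -torch.mean((to_tensor(a) - to_tensor(b)) ** 2).item()

test_image = Image.open("suspect.png").convert("RGB")  # hypothetical fake image
caption = "a cat sleeping on a sofa"                   # caption of the test image

scores = {}
for name, repo in candidates.items():
    pipe = StableDiffusionPipeline.from_pretrained(repo)
    regenerated = pipe(caption).images[0]  # regenerate with each candidate model
    scores[name] = similarity(test_image, regenerated)

print("attributed source:", max(scores, key=scores.get))
```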
arXiv Detail & Related papers (2024-03-03T11:55:49Z)
- Text-image guided Diffusion Model for generating Deepfake celebrity interactions [50.37578424163951]
Diffusion models have recently demonstrated highly realistic visual content generation.
This paper devises and explores a novel text-image guided method for generating such deepfake celebrity interactions.
Our results show that with the devised scheme, it is possible to create fake visual content with alarming realism.
arXiv Detail & Related papers (2023-09-26T08:24:37Z)
- DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models [79.71665540122498]
We propose a method for detecting unauthorized data usage by planting injected content into the protected dataset.
Specifically, we modify the protected images by adding unique contents on these images using stealthy image warping functions.
By analyzing whether the model has memorized the injected content, we can detect models that have illegally utilized the unauthorized data.
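The coating step could look roughly like this sketch; the sinusoidal displacement field below is an illustrative stand-in for the paper's stealthy warping functions.

```python
# Hedged sketch of "coating" protected images with a subtle, fixed warp
# (illustrative warp; the paper's warping signal is its own design).
import torch
import torch.nn.functional as F

def coat(images: torch.Tensor, amplitude: float = 0.004,
         frequency: float = 6.0) -> torch.Tensor:
    """Apply a faint sinusoidal displacement field to a batch (B, C, H, W)."""
    b, _, h, w = images.shape
    ys = torch.linspace(-1, 1, h)
    xs = torch.linspace(-1, 1, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    # Fixed, low-amplitude displacement acts as the injected, memorizable signal.
    dx = amplitude * torch.sin(frequency * torch.pi * gy)
    dy = amplitude * torch.sin(frequency * torch.pi * gx)
    grid = torch.stack((gx + dx, gy + dy), dim=-1).expand(b, -1, -1, -1)
    return F.grid_sample(images, grid, align_corners=False)

# A model fine-tuned on coated images tends to reproduce the warp signature,
# which a detector can then test for.
```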
arXiv Detail & Related papers (2023-07-06T16:27:39Z)
- If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection [53.320946030761796]
Diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt.
We show that large T2I diffusion models are more faithful than usually assumed, and can generate images faithful to even complex prompts.
We introduce a pipeline that generates candidate images for a text prompt and picks the best one according to an automatic scoring system.
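The generate-then-select loop admits a short sketch; CLIP text-image similarity stands in here for the paper's automatic scoring system, and the prompt and candidate count are illustrative.

```python
# Hedged sketch of generate-and-select: sample several candidates, keep the
# one a scorer rates most faithful (CLIP similarity as a stand-in scorer).
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a red cube on top of a blue sphere"
candidates = pipe([prompt] * 4).images  # generate several candidate images

inputs = proc(text=[prompt], images=candidates, return_tensors="pt", padding=True)
with torch.no_grad():
    scores = clip(**inputs).logits_per_image.squeeze(1)  # one score per image

best = candidates[scores.argmax().item()]
best.save("most_faithful.png")
```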
arXiv Detail & Related papers (2023-05-22T17:59:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.