FameBias: Embedding Manipulation Bias Attack in Text-to-Image Models
- URL: http://arxiv.org/abs/2412.18302v1
- Date: Tue, 24 Dec 2024 09:11:37 GMT
- Title: FameBias: Embedding Manipulation Bias Attack in Text-to-Image Models
- Authors: Jaechul Roh, Andrew Yuan, Jinsong Mao
- Abstract summary: Text-to-Image (T2I) diffusion models have rapidly advanced, enabling the generation of high-quality images that align closely with textual descriptions. Recent studies reveal that attackers can embed biases into these models through simple fine-tuning. We introduce FameBias, a T2I biasing attack that manipulates the embeddings of input prompts to generate images featuring specific public figures.
- Score: 0.8192907805418583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-Image (T2I) diffusion models have rapidly advanced, enabling the generation of high-quality images that align closely with textual descriptions. However, this progress has also raised concerns about their misuse for propaganda and other malicious activities. Recent studies reveal that attackers can embed biases into these models through simple fine-tuning, causing them to generate targeted imagery when triggered by specific phrases. This underscores the potential for T2I models to act as tools for disseminating propaganda, producing images aligned with an attacker's objective for end-users. Building on this concept, we introduce FameBias, a T2I biasing attack that manipulates the embeddings of input prompts to generate images featuring specific public figures. Unlike prior methods, FameBias operates solely on the input embedding vectors without requiring additional model training. We evaluate FameBias comprehensively using Stable Diffusion V2, generating a large corpus of images based on various trigger nouns and target public figures. Our experiments demonstrate that FameBias achieves a high attack success rate while preserving the semantic context of the original prompts across multiple trigger-target pairs.
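The abstract describes FameBias as operating purely on input prompt embeddings, with no weight updates. As a rough illustration of that idea, here is a minimal sketch using Hugging Face diffusers and Stable Diffusion 2.1; the linear interpolation, the weight alpha, and the placeholder target phrase are assumptions for demonstration, not the paper's exact manipulation.

```python
# Minimal sketch of a prompt-embedding bias attack in the spirit of
# FameBias, assuming Hugging Face diffusers and Stable Diffusion 2.1.
# The interpolation scheme and alpha below are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

def encode(prompt: str) -> torch.Tensor:
    """Return the CLIP token-embedding sequence (1, 77, dim) for a prompt."""
    ids = pipe.tokenizer(
        prompt,
        padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    ).input_ids.to(pipe.device)
    return pipe.text_encoder(ids)[0]

# The victim's benign prompt contains the trigger noun ("doctor");
# the attacker's hidden goal is a specific public figure (placeholder).
benign = encode("a photo of a doctor in a hospital")
target = encode("a photo of <TARGET PUBLIC FIGURE> in a hospital")

# Interpolate toward the target embedding; alpha trades off stealth
# (preserving the prompt's semantic context) against attack success.
alpha = 0.7
biased = (1 - alpha) * benign + alpha * target

image = pipe(prompt_embeds=biased).images[0]
image.save("biased_output.png")
```

Because only the embedding tensor is edited, no model weights change, which matches the training-free setting the abstract describes.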
Related papers
- Implicit Bias Injection Attacks against Text-to-Image Diffusion Models [17.131167390657243]
Biased T2I models can generate content with specific tendencies, potentially influencing people's perceptions.
This paper introduces a novel form of implicit bias that lacks explicit visual features but can manifest in diverse ways.
We propose an implicit bias injection attack framework (IBI-Attacks) against T2I diffusion models.
arXiv Detail & Related papers (2025-04-02T15:24:12Z)
- SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models [77.80595722480074]
SleeperMark is a novel framework designed to embed resilient watermarks into T2I diffusion models. It guides the model to disentangle the watermark information from the semantic concepts it learns, allowing the model to retain the embedded watermark. Our experiments demonstrate the effectiveness of SleeperMark across various types of diffusion models.
arXiv Detail & Related papers (2024-12-06T08:44:18Z)
- Sparse Repellency for Shielded Generation in Text-to-image Diffusion Models [29.083402085790016]
We propose a method that coaxes the sampled trajectories of pretrained diffusion models to land on images that fall outside of a reference set.
We achieve this by adding repellency terms to the diffusion SDE throughout the generation trajectory.
We show that adding SPELL to popular diffusion models improves their diversity while impacting their FID only marginally, and performs comparatively better than other recent training-free diversity methods.
arXiv Detail & Related papers (2024-10-08T13:26:32Z)
- GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models [75.04426753720553]
We propose a framework to identify, quantify, and explain biases in an open-set setting.
This pipeline leverages a Large Language Model (LLM) to propose biases starting from a set of captions.
We show two variations of this framework: OpenBias and GradBias.
arXiv Detail & Related papers (2024-08-29T16:51:07Z)
- HTS-Attack: Heuristic Token Search for Jailbreaking Text-to-Image Models [28.28898114141277]
Text-to-Image (T2I) models have achieved remarkable success in image generation and editing. These models still have many potential issues, particularly in generating inappropriate or Not-Safe-For-Work (NSFW) content. We propose HTS-Attack, a heuristic token search attack method.
arXiv Detail & Related papers (2024-08-25T17:33:40Z)
- Backdooring Bias into Text-to-Image Models [16.495996266157274]
We show that an adversary can add an arbitrary bias through a backdoor attack that would affect even benign users generating images.
Our attack remains stealthy as it preserves semantic information given in the text prompt.
We show how the current state-of-the-art generative models make this attack both cheap and feasible for any adversary.
arXiv Detail & Related papers (2024-06-21T14:53:19Z)
- Manipulating and Mitigating Generative Model Biases without Retraining [49.60774626839712]
We propose a dynamic and computationally efficient manipulation of T2I model biases by exploiting their rich language embedding spaces without model retraining.
We show that leveraging foundational vector algebra allows convenient control over language model embeddings to shift T2I model outputs (see the sketch after this entry).
As a by-product, this control serves as a form of precise prompt engineering to generate images which are generally implausible using regular text prompts.
arXiv Detail & Related papers (2024-04-03T07:33:30Z)
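As a rough illustration of the vector-algebra control described in this entry, here is a minimal sketch using a CLIP text encoder; the contrast prompts, the pooled-embedding choice, and the scale factor are assumptions, not the paper's procedure.

```python
# Illustrative embedding-space arithmetic with a CLIP text encoder;
# the direction prompts and scale below are assumptions for demonstration.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    """Pooled CLIP text embedding of shape (1, dim)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    return encoder(ids).pooler_output

# A semantic direction from a pair of contrasting prompts.
direction = embed("a photo of an elderly person") - embed("a photo of a young person")

# Shifting a prompt embedding along that direction steers downstream
# generations without any retraining; the scale sets the strength.
shifted = embed("a photo of a teacher") + 1.5 * direction
```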
- Regeneration Based Training-free Attribution of Fake Images Generated by Text-to-Image Generative Models [39.33821502730661]
We present a training-free method to attribute fake images generated by text-to-image models to their source models.
By calculating and ranking the similarity between the test image and the candidate images, we can determine the source of the image.
arXiv Detail & Related papers (2024-03-03T11:55:49Z)
- Direct Consistency Optimization for Robust Customization of Text-to-Image Diffusion Models [67.68871360210208]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, can generate visuals with a high degree of consistency. We propose a novel fine-tuning objective, dubbed Direct Consistency Optimization, which controls the deviation between fine-tuning and pretrained models. We show that our approach achieves better prompt fidelity and subject fidelity than those post-optimized for merging regular fine-tuned models.
arXiv Detail & Related papers (2024-02-19T09:52:41Z)
- Text-image guided Diffusion Model for generating Deepfake celebrity interactions [50.37578424163951]
Diffusion models have recently demonstrated highly realistic visual content generation.
This paper devises and explores a novel method in that regard.
Our results show that with the devised scheme, it is possible to create fake visual content with alarming realism.
arXiv Detail & Related papers (2023-09-26T08:24:37Z)
- DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models [79.71665540122498]
We propose a method for detecting unauthorized data usage by planting injected content into the protected dataset.
Specifically, we modify the protected images by adding unique content to them using stealthy image warping functions.
By analyzing whether the model has memorized the injected content, we can detect models that illegally used the unauthorized data (see the sketch after this entry).
arXiv Detail & Related papers (2023-07-06T16:27:39Z)
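As a rough illustration of the stealthy-warp marking described in this entry, here is a toy sketch; the sinusoidal warp and its strength are assumptions standing in for the paper's actual warping functions.

```python
# Toy sketch of marking protected images with a faint, hard-to-notice warp.
# The sinusoidal displacement below is an illustrative stand-in for the
# paper's stealthy image warping functions.
import numpy as np
from PIL import Image

def stealthy_warp(img: Image.Image, strength: float = 1.5) -> Image.Image:
    """Shift each pixel row horizontally by a small sinusoidal offset."""
    arr = np.asarray(img)
    out = np.empty_like(arr)
    for y in range(arr.shape[0]):
        shift = int(round(strength * np.sin(2 * np.pi * y / 64)))
        out[y] = np.roll(arr[y], shift, axis=0)
    return Image.fromarray(out)

# Release only warped copies; a model fine-tuned on them tends to
# memorize the warp, which a later detection test can probe for.
marked = stealthy_warp(Image.open("protected.png"))
marked.save("protected_marked.png")
```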
- If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection [53.320946030761796]
Diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt.
We show that large T2I diffusion models are more faithful than usually assumed, and can generate images faithful to even complex prompts.
We introduce a pipeline that generates candidate images for a text prompt and picks the best one according to an automatic scoring system (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-05-22T17:59:41Z)
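As a rough illustration of the generate-then-select pipeline described in this entry, here is a minimal best-of-N sketch; the candidate count and the CLIP-similarity scorer are assumptions, as the paper's automatic scoring system is not specified here.

```python
# Best-of-N selection sketch: sample several candidates for one prompt
# and keep the one with the highest CLIP image-text similarity.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a red cube on top of a blue sphere"
candidates = pipe([prompt] * 8).images  # N = 8 candidates (assumed)

# Score each candidate against the prompt and pick the best.
inputs = proc(text=[prompt], images=candidates, return_tensors="pt", padding=True)
with torch.no_grad():
    scores = clip(**inputs).logits_per_image.squeeze(1)  # shape (8,)
best = candidates[int(scores.argmax())]
best.save("best_candidate.png")
```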