Related papers: Universal Prompt Optimizer for Safe Text-to-Image Generation

SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models [74.11062256255387]
Text-to-image models are highly vulnerable to adversarial prompts, which can bypass safety measures and produce harmful content. We introduce SafeGuider, a two-step framework designed for robust safety control without compromising generation quality. SafeGuider demonstrates exceptional effectiveness in minimizing attack success rates, achieving a maximum rate of only 5.48% across various attack scenarios.
arXiv Detail & Related papers (2025-10-05T10:24:48Z)
Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs [36.42060582800515]
We introduce Text Preference Optimization (TPO), a framework that enables "free-lunch" alignment of T2I models. TPO works by training the model to prefer matched prompts over mismatched prompts. Our framework is general and compatible with existing preference-based algorithms.
arXiv Detail & Related papers (2025-09-30T04:32:34Z)
Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models [73.43013217318965]
Multimodal Prompt Decoupling Attack (MPDA). MPDA uses the image modality to separate the harmful semantic components of the original unsafe prompt. A visual language model generates image captions to ensure semantic consistency between the generated NSFW images and the original unsafe prompts.
arXiv Detail & Related papers (2025-09-21T11:22:32Z)
Iterative Prompt Refinement for Safer Text-to-Image Generation [4.174845397893041]
Existing safety methods typically refine prompts using large language models (LLMs). We propose an iterative prompt refinement algorithm that uses Vision Language Models (VLMs) to analyze both the input prompts and the generated images. Our approach produces safer outputs without compromising alignment with user intent, offering a practical solution for generating safer T2I content.
arXiv Detail & Related papers (2025-09-17T07:16:06Z)
GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization [19.44247617251449]
We introduce GhostPrompt, the first automated jailbreak framework that combines dynamic prompt optimization with multimodal feedback. It achieves state-of-the-art performance, increasing the ShieldLM-7B bypass rate from 12.5% to 99.0%. It generalizes to unseen filters including GPT-4.1 and successfully jailbreaks DALLE 3 to generate NSFW images.
arXiv Detail & Related papers (2025-05-25T05:13:06Z)
Replace in Translation: Boost Concept Alignment in Counterfactual Text-to-Image [53.09546752700792]
We propose a strategy to instruct this replacing process, called Explicit Logical Narrative Prompt (ELNP). We design a metric to calculate how many of the required concepts in the prompt are covered, on average, in the synthesized images. Extensive experiments and qualitative comparisons demonstrate that our strategy can boost concept alignment in counterfactual T2I.
arXiv Detail & Related papers (2025-05-20T13:27:52Z)
Aligning Text to Image in Diffusion Models is Easier Than You Think [47.623236425067326]
We introduce a lightweight contrastive fine-tuning strategy called SoftREPA that uses soft text tokens. Our method explicitly increases the mutual information between text and image representations, leading to enhanced semantic consistency.
arXiv Detail & Related papers (2025-03-11T10:14:22Z)
PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models [38.45239843869313]
Text-to-image (T2I) models have exhibited remarkable performance in generating high-quality images from text descriptions. T2I models are vulnerable to misuse, particularly generating not-safe-for-work (NSFW) content. We present PromptGuard, a novel content moderation technique that draws inspiration from the system prompt mechanism in large language models.
arXiv Detail & Related papers (2025-01-07T05:39:21Z)
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation [68.07258248467309]
Text-to-image (T2I) models have become widespread, but their limited safety guardrails expose end users to harmful content and potentially allow for model misuse. Current safety measures are typically limited to text-based filtering or concept removal strategies, which can remove just a few concepts from the model's generative capabilities. We introduce SafetyDPO, a method for safety alignment of T2I models through Direct Preference Optimization (DPO). We train safety experts, in the form of low-rank adaptation (LoRA) matrices, able to guide the generation process away from specific safety-related
arXiv Detail & Related papers (2024-12-13T18:59:52Z)
Safeguarding Text-to-Image Generation via Inference-Time Prompt-Noise Optimization [29.378296359782585]
Text-to-Image (T2I) diffusion models are widely recognized for their ability to generate high-quality and diverse images based on text prompts. Current efforts to prevent inappropriate image generation for T2I models are easy to bypass and vulnerable to adversarial attacks. We propose a novel, training-free approach, called Prompt-Noise Optimization (PNO), to mitigate unsafe image generation.
arXiv Detail & Related papers (2024-12-05T05:12:30Z)
RT-Attack: Jailbreaking Text-to-Image Models via Random Token [24.61198605177661]
We introduce a two-stage query-based black-box attack method utilizing random search. In the first stage, we establish a preliminary prompt by maximizing the semantic similarity between the adversarial and target harmful prompts. In the second stage, we use this initial prompt to refine our approach, creating a detailed adversarial prompt aimed at jailbreaking.
arXiv Detail & Related papers (2024-08-25T17:33:40Z)
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models [29.866192834825572]
Unlearning techniques have been developed to remove the model's ability to generate potentially harmful content. These methods are easily bypassed by adversarial attacks, making them unreliable for ensuring the safety of generated images. We propose Direct Unlearning Optimization (DUO), a novel framework for removing Not Safe For Work (NSFW) content from T2I models.
arXiv Detail & Related papers (2024-07-17T08:19:11Z)
Latent Guard: a Safety Framework for Text-to-image Generation [64.49596711025993]
Existing safety measures are either based on text blacklists, which can be easily circumvented, or harmful content classification. We propose Latent Guard, a framework designed to improve safety measures in text-to-image generation. Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts.
arXiv Detail & Related papers (2024-04-11T17:59:52Z)
Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models [10.70975463369742]
We present the Jailbreaking Prompt Attack (JPA). JPA searches for the target malicious concepts in the text embedding space using a group of antonyms. A prefix prompt is optimized in the discrete vocabulary space to align malicious concepts semantically in the text embedding space.
arXiv Detail & Related papers (2024-04-02T09:49:35Z)
Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation [149.96612254604986]
PRISM is an algorithm that automatically produces human-interpretable and transferable prompts. Inspired by large language model (LLM) jailbreaking, PRISM leverages the in-context learning ability of LLMs to iteratively refine the candidate prompt distribution. Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, styles, and images across multiple T2I models.
arXiv Detail & Related papers (2024-03-28T02:35:53Z)
GuardT2I: Defending Text-to-Image Models from Adversarial Prompts [16.317849859000074]
GuardT2I is a novel moderation framework that adopts a generative approach to enhance T2I models' robustness against adversarial prompts. Our experiments reveal that GuardT2I outperforms leading commercial solutions like OpenAI-Moderation and Microsoft Azure Moderator.
arXiv Detail & Related papers (2024-03-03T09:04:34Z)
Direct Consistency Optimization for Compositional Text-to-Image Personalization [73.94505688626651]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, are able to generate visuals with a high degree of consistency. We propose to fine-tune the T2I model by maximizing consistency to reference images, while penalizing the deviation from the pretrained model.
arXiv Detail & Related papers (2024-02-19T09:52:41Z)
Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models [86.92711729969488]
We analyze how to manipulate the text embeddings and remove unwanted content from them. The first method regularizes the text embedding matrix and effectively suppresses the undesired content. The second method aims to further suppress the unwanted content generation of the prompt, and encourages the generation of desired content.
arXiv Detail & Related papers (2024-02-08T03:15:06Z)
If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection [53.320946030761796]
Diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt. We show that large T2I diffusion models are more faithful than usually assumed, and can generate images faithful to even complex prompts. We introduce a pipeline that generates candidate images for a text prompt and picks the best one according to an automatic scoring system.
arXiv Detail & Related papers (2023-05-22T17:59:41Z)
Cycle-Consistent Inverse GAN for Text-to-Image Synthesis [101.97397967958722]
We propose a novel unified framework of Cycle-consistent Inverse GAN for both text-to-image generation and text-guided image manipulation tasks. We learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image. In the text-guided optimization module, we generate images with the desired semantic attributes by optimizing the inverted latent codes.
arXiv Detail & Related papers (2021-08-03T08:38:16Z)
Towards Open-World Text-Guided Face Image Generation and Manipulation [52.83401421019309]
We propose a unified framework for both face image generation and manipulation. Our method supports open-world scenarios, including both image and text, without any re-training, fine-tuning, or post-processing.
arXiv Detail & Related papers (2021-04-18T16:56:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.