Iterative Prompt Refinement for Safer Text-to-Image Generation
- URL: http://arxiv.org/abs/2509.13760v1
- Date: Wed, 17 Sep 2025 07:16:06 GMT
- Title: Iterative Prompt Refinement for Safer Text-to-Image Generation
- Authors: Jinwoo Jeon, JunHyeok Oh, Hayeong Lee, Byung-Jun Lee
- Abstract summary: Existing safety methods typically refine prompts using large language models (LLMs). We propose an iterative prompt refinement algorithm that uses Vision Language Models (VLMs) to analyze both the input prompts and the generated images. Our approach produces safer outputs without compromising alignment with user intent, offering a practical solution for generating safer T2I content.
- Score: 4.174845397893041
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-Image (T2I) models have made remarkable progress in generating images from text prompts, but their output quality and safety still depend heavily on how prompts are phrased. Existing safety methods typically refine prompts using large language models (LLMs), but they overlook the images produced, which can result in unsafe outputs or unnecessary changes to already safe prompts. To address this, we propose an iterative prompt refinement algorithm that uses Vision Language Models (VLMs) to analyze both the input prompts and the generated images. By leveraging visual feedback, our method refines prompts more effectively, improving safety while maintaining user intent and reliability comparable to existing LLM-based approaches. Additionally, we introduce a new dataset labeled with both textual and visual safety signals using an off-the-shelf multi-modal LLM, enabling supervised fine-tuning. Experimental results demonstrate that our approach produces safer outputs without compromising alignment with user intent, offering a practical solution for generating safer T2I content. Our code is available at https://github.com/ku-dmlab/IPR. WARNING: This paper contains examples of harmful or inappropriate images generated by models.
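The refinement loop described in the abstract can be sketched as follows. This is a minimal illustration only; the callables `generate_image`, `vlm_assess`, and `vlm_refine` are hypothetical stand-ins, not the authors' actual API:

```python
# Hypothetical sketch of the iterative prompt refinement loop described in the
# abstract; generate_image, vlm_assess, and vlm_refine are illustrative
# stand-ins, not the paper's actual interfaces.

def refine_prompt(prompt, generate_image, vlm_assess, vlm_refine, max_iters=5):
    """Refine a prompt with visual feedback until the VLM judges the
    (prompt, image) pair safe, or the iteration budget runs out."""
    for _ in range(max_iters):
        image = generate_image(prompt)                 # T2I generation step
        is_safe, feedback = vlm_assess(prompt, image)  # check text AND image
        if is_safe:
            return prompt, image                       # safe prompts exit unchanged
        prompt = vlm_refine(prompt, feedback)          # rewrite using visual feedback
    return prompt, generate_image(prompt)              # best effort after budget
```

Because the VLM inspects the generated image rather than the prompt text alone, an already-safe prompt exits on the first iteration without modification, which addresses the unnecessary-edit failure mode of text-only LLM refiners that the abstract highlights.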
Related papers
- SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models [74.11062256255387]
Text-to-image models are highly vulnerable to adversarial prompts, which can bypass safety measures and produce harmful content. We introduce SafeGuider, a two-step framework designed for robust safety control without compromising generation quality. SafeGuider demonstrates exceptional effectiveness in minimizing attack success rates, achieving a maximum rate of only 5.48% across various attack scenarios.
arXiv Detail & Related papers (2025-10-05T10:24:48Z)
- Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models [73.43013217318965]
The Multimodal Prompt Decoupling Attack (MPDA) uses the image modality to separate the harmful semantic components of the original unsafe prompt. A visual language model generates image captions to ensure semantic consistency between the generated NSFW images and the original unsafe prompts.
arXiv Detail & Related papers (2025-09-21T11:22:32Z)
- PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation [30.2092299298228]
Text-to-image (T2I) models are vulnerable to producing not-safe-for-work (NSFW) content, such as violent or explicit imagery. We propose PromptSafe, a gated prompt tuning framework that combines a lightweight, text-only supervised soft embedding with an inference-time gated control network. We show that PromptSafe achieves a SOTA unsafe generation rate (2.36%) while preserving high benign fidelity.
arXiv Detail & Related papers (2025-08-02T09:09:40Z)
- SafeText: Safe Text-to-image Models via Aligning the Text Encoder [38.14026164194725]
Text-to-image models can generate harmful images when presented with unsafe prompts. We propose SafeText, a novel alignment method that fine-tunes the text encoder rather than the diffusion module. Our results show that SafeText effectively prevents harmful image generation with minor impact on the images for safe prompts.
arXiv Detail & Related papers (2025-02-28T01:02:57Z)
- PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models [38.45239843869313]
Text-to-image (T2I) models have exhibited remarkable performance in generating high-quality images from text descriptions. However, T2I models are vulnerable to misuse, particularly in generating not-safe-for-work (NSFW) content. We present PromptGuard, a novel content moderation technique that draws inspiration from the system prompt mechanism in large language models.
arXiv Detail & Related papers (2025-01-07T05:39:21Z)
- MLLM-as-a-Judge for Image Safety without Human Labeling [81.24707039432292]
In the age of AI-generated content (AIGC), many image generation models are capable of producing harmful content. It is crucial to identify such unsafe images based on established safety rules. Existing approaches typically fine-tune MLLMs with human-labeled datasets.
arXiv Detail & Related papers (2024-12-31T00:06:04Z)
- Latent Guard: a Safety Framework for Text-to-image Generation [64.49596711025993]
Existing safety measures are either based on text blacklists, which can be easily circumvented, or harmful content classification.
We propose Latent Guard, a framework designed to improve safety measures in text-to-image generation.
Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts.
arXiv Detail & Related papers (2024-04-11T17:59:52Z)
- Dynamic Prompt Optimizing for Text-to-Image Generation [63.775458908172176]
We introduce the Prompt Auto-Editing (PAE) method to improve text-to-image generative models.
We employ an online reinforcement learning strategy to explore the weights and injection time steps of each word, leading to the dynamic fine-control prompts.
arXiv Detail & Related papers (2024-04-05T13:44:39Z)
- Universal Prompt Optimizer for Safe Text-to-Image Generation [27.32589928097192]
We propose the first universal prompt optimizer for safe T2I generation (POSI) in the black-box scenario. Our approach effectively reduces the likelihood of various T2I models generating inappropriate images.
arXiv Detail & Related papers (2024-02-16T18:36:36Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection [53.320946030761796]
Diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt.
We show that large T2I diffusion models are more faithful than usually assumed, and can generate images faithful to even complex prompts.
We introduce a pipeline that generates candidate images for a text prompt and picks the best one according to an automatic scoring system.
arXiv Detail & Related papers (2023-05-22T17:59:41Z)
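The selection pipeline this last entry describes reduces to a sample-then-rank loop. A minimal sketch, in which `generate_image` and `score` are hypothetical callables standing in for the T2I model and the automatic scoring system mentioned in the summary:

```python
# Hypothetical sketch of generation-by-selection: sample several candidate
# images for one prompt and keep the one an automatic scorer ranks highest.
# generate_image and score are illustrative stand-ins for the actual model
# and scoring system.

def generate_by_selection(prompt, generate_image, score, num_candidates=4):
    """Return the candidate image with the best automatic faithfulness score."""
    candidates = [generate_image(prompt) for _ in range(num_candidates)]
    return max(candidates, key=lambda image: score(prompt, image))
```

The design trades extra generation cost (`num_candidates` forward passes) for faithfulness, with no change to the underlying diffusion model.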
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.