Related papers: Position: Towards Implicit Prompt For Text-To-Image Models

URL: http://arxiv.org/abs/2403.02118v4
Date: Tue, 28 May 2024 04:24:14 GMT
Title: Position: Towards Implicit Prompt For Text-To-Image Models
Authors: Yue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang, Ping Luo
Abstract summary: This paper highlights the current state of text-to-image (T2I) models toward implicit prompts. We present a benchmark named ImplicitBench and conduct an investigation on the performance and impacts of implicit prompts. Experiment results show that T2I models are able to accurately create various target symbols indicated by implicit prompts.
Score: 57.00716011456852
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent text-to-image (T2I) models have had great success, and many benchmarks have been proposed to evaluate their performance and safety. However, they only consider explicit prompts while neglecting implicit prompts (prompts that hint at a target without explicitly mentioning it). These prompts may get around safety constraints and pose potential threats to the applications of these models. This position paper highlights the current state of T2I models toward implicit prompts. We present a benchmark named ImplicitBench and conduct an investigation on the performance and impacts of implicit prompts with popular T2I models. Specifically, we design and collect more than 2,000 implicit prompts of three aspects: General Symbols, Celebrity Privacy, and Not-Safe-For-Work (NSFW) Issues, and evaluate six well-known T2I models' capabilities under these implicit prompts. Experiment results show that (1) T2I models are able to accurately create various target symbols indicated by implicit prompts; (2) implicit prompts bring potential risks of privacy leakage for T2I models; (3) constraints of NSFW in most of the evaluated T2I models can be bypassed with implicit prompts. We call for increased attention to the potential and risks of implicit prompts in the T2I community and further investigation into the capabilities and impacts of implicit prompts, advocating for a balanced approach that harnesses their benefits while mitigating their risks.

Related papers

T2I-RiskyPrompt: A Benchmark for Safety Evaluation, Attack, and Defense on Text-to-Image Model [41.31194907935869]
We introduce T2I-RiskyPrompt, a benchmark for evaluating safety-related tasks in T2I models. We first develop a hierarchical risk taxonomy, which consists of 6 primary categories and 14 fine-grained subcategories. We construct a pipeline to collect and annotate risky prompts, where each prompt is annotated with both hierarchical category labels and detailed risk reasons. Based on T2I-RiskyPrompt, we conduct a comprehensive evaluation of eight T2I models, nine defense methods, five safety filters, and five attack strategies.
arXiv Detail & Related papers (2025-10-25T14:00:26Z)
Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models [73.43013217318965]
Multimodal Prompt Decoupling Attack (MPDA) uses image modality to separate the harmful semantic components of the original unsafe prompt. A visual language model generates image captions to ensure semantic consistency between the generated NSFW images and the original unsafe prompts.
arXiv Detail & Related papers (2025-09-21T11:22:32Z)
AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models [58.85362281293525]
We introduce AcT2I, a benchmark designed to evaluate the performance of T2I models in generating images from action-centric prompts. We experimentally validate that leading T2I models do not fare well on AcT2I. We build upon this by developing a training-free, knowledge distillation technique utilizing Large Language Models to address this limitation.
arXiv Detail & Related papers (2025-09-19T16:41:39Z)
NSFW-Classifier Guided Prompt Sanitization for Safe Text-to-Image Generation [47.03824997129498]
"Jailbreak" attacks in large language models bypass restrictions through subtle prompt modifications. PromptSan is a novel approach to detoxify harmful prompts without altering model architecture. PromptSan achieves state-of-the-art performance in reducing harmful content generation across multiple metrics.
arXiv Detail & Related papers (2025-06-23T06:17:30Z)
GenBreak: Red Teaming Text-to-Image Generators Using Large Language Models [65.91565607573786]
Text-to-image (T2I) models can be misused to generate harmful content, including nudity or violence. Recent research on red-teaming and adversarial attacks against T2I models has notable limitations. We propose GenBreak, a framework that fine-tunes a red-team large language model (LLM) to systematically explore underlying vulnerabilities.
arXiv Detail & Related papers (2025-06-11T09:09:12Z)
OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models [73.6716695218951]
Over-refusal is a phenomenon that reduces the practical utility of T2I models. We present OVERT (OVEr-Refusal evaluation on Text-to-image models), the first large-scale benchmark for assessing over-refusal behaviors.
arXiv Detail & Related papers (2025-05-27T15:42:46Z)
TokenProber: Jailbreaking Text-to-image Models via Fine-grained Word Impact Analysis [19.73325740171627]
We introduce TokenProber, a method designed for sensitivity-aware differential testing. Our approach is based on the key observation that adversarial prompts often succeed by exploiting discrepancies in how T2I models and safety checkers interpret sensitive content. Our evaluation of TokenProber against 5 safety checkers on 3 popular T2I models, using 324 NSFW prompts, demonstrates its superior effectiveness.
arXiv Detail & Related papers (2025-05-11T06:32:33Z)
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation [39.45602029655288]
T2ISafety is a safety benchmark that evaluates T2I models across three key domains: toxicity, fairness, and privacy. We build a large-scale T2I dataset with 68K manually annotated images and train an evaluator capable of detecting critical risks. We evaluate 12 prominent diffusion models on T2ISafety and reveal several concerns including persistent issues with racial fairness, a tendency to generate toxic content, and significant variation in privacy protection across the models.
arXiv Detail & Related papers (2025-01-22T03:29:43Z)
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation [68.07258248467309]
Text-to-image (T2I) models have become widespread, but their limited safety guardrails expose end users to harmful content and potentially allow for model misuse. Current safety measures are typically limited to text-based filtering or concept removal strategies, able to remove just a few concepts from the model's generative capabilities. We introduce SafetyDPO, a method for safety alignment of T2I models through Direct Preference Optimization (DPO). We train safety experts, in the form of low-rank adaptation (LoRA) matrices, able to guide the generation process away from specific safety-related
arXiv Detail & Related papers (2024-12-13T18:59:52Z)
Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding [16.188657772178747]
We propose Embedding Sanitizer (ES), which enhances the safety of text-to-image models by sanitizing inappropriate concepts in prompt embeddings. ES is the first interpretable safe generation framework that assigns a score to each token in the prompt to indicate its potential harmfulness.
arXiv Detail & Related papers (2024-11-15T16:29:02Z)
RT-Attack: Jailbreaking Text-to-Image Models via Random Token [24.61198605177661]
We introduce a two-stage query-based black-box attack method utilizing random search. In the first stage, we establish a preliminary prompt by maximizing the semantic similarity between the adversarial and target harmful prompts. In the second stage, we use this initial prompt to refine our approach, creating a detailed adversarial prompt aimed at jailbreaking.
arXiv Detail & Related papers (2024-08-25T17:33:40Z)
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [97.0899853256201]
We present a novel task and benchmark for evaluating the ability of text-to-image generation models to produce images that align with commonsense in real life. We evaluate whether T2I models can conduct visual-commonsense reasoning, e.g. produce images that fit "the lightbulb is unlit" vs. "the lightbulb is lit". We benchmark a variety of state-of-the-art (sota) T2I models and surprisingly find that there is still a large gap between image synthesis and real life photos.
arXiv Detail & Related papers (2024-06-11T17:59:48Z)
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) [62.44395685571094]
We introduce T2IScoreScore, a curated set of semantic error graphs containing a prompt and a set of increasingly erroneous images. These allow us to rigorously judge whether a given prompt faithfulness metric can correctly order images with respect to their objective error count. We find that the state-of-the-art VLM-based metrics fail to significantly outperform simple (and supposedly worse) feature-based metrics like CLIPScore.
arXiv Detail & Related papers (2024-04-05T17:57:16Z)
Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation [150.57983348059528]
PRISM is an algorithm that automatically identifies human-interpretable and transferable prompts. It can effectively generate desired concepts given only black-box access to T2I models. Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, styles and images.
arXiv Detail & Related papers (2024-03-28T02:35:53Z)
GuardT2I: Defending Text-to-Image Models from Adversarial Prompts [16.317849859000074]
GuardT2I is a novel moderation framework that adopts a generative approach to enhance T2I models' robustness against adversarial prompts. Our experiments reveal that GuardT2I outperforms leading commercial solutions like OpenAI-Moderation and Microsoft Azure Moderator.
arXiv Detail & Related papers (2024-03-03T09:04:34Z)
Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation [19.06501699814924]
We build the Adversarial Nibbler Challenge, a red-teaming methodology for crowdsourcing implicitly adversarial prompts. The challenge is run in consecutive rounds to enable a sustained discovery and analysis of safety pitfalls in T2I models. We find that 14% of images that humans consider harmful are mislabeled as "safe" by machines.
arXiv Detail & Related papers (2024-02-14T22:21:12Z)
Harm Amplification in Text-to-Image Models [5.397559484007124]
Text-to-image (T2I) models have emerged as a significant advancement in generative AI. There exist safety concerns regarding their potential to produce harmful image outputs even when users input seemingly safe prompts. This phenomenon, where T2I models generate harmful representations that were not explicit in the input prompt, poses a potentially greater risk than adversarial prompts.
arXiv Detail & Related papers (2024-02-01T23:12:57Z)
Navigating the OverKill in Large Language Models [84.62340510027042]
We investigate the factors for overkill by exploring how models handle and determine the safety of queries. Our findings reveal the presence of shortcuts within models, leading to over-attention to harmful words like 'kill', and show that prompts emphasizing safety exacerbate overkill. We introduce Self-Contrastive Decoding (Self-CD), a training-free and model-agnostic strategy, to alleviate this phenomenon.
arXiv Detail & Related papers (2024-01-31T07:26:47Z)
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models [34.75181539924584]
We introduce a new test suite called XSTest to identify such eXaggerated Safety behaviours. We describe XSTest's creation and composition, and then use the test suite to highlight systematic failure modes in state-of-the-art language models.
arXiv Detail & Related papers (2023-08-02T16:30:40Z)
If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection [53.320946030761796]
Diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt. We show that large T2I diffusion models are more faithful than usually assumed, and can generate images faithful to even complex prompts. We introduce a pipeline that generates candidate images for a text prompt and picks the best one according to an automatic scoring system.
arXiv Detail & Related papers (2023-05-22T17:59:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.