Related papers: SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation

URL: http://arxiv.org/abs/2511.11014v1
Date: Fri, 14 Nov 2025 07:04:06 GMT
Title: SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation
Authors: Sumin Yu, Taesup Moon
Abstract summary: Diffusion-based T2I models have achieved remarkable image generation quality. They also enable easy creation of harmful content. Our method, SP-Guard, addresses these limitations by estimating prompt harmfulness and applying a selective guidance mask.
Score: 21.845417608250035
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While diffusion-based text-to-image (T2I) models have achieved remarkable image generation quality, they also enable easy creation of harmful content, raising social concerns and highlighting the need for safer generation. Existing inference-time guiding methods lack both adaptivity (adjusting guidance strength based on the prompt) and selectivity (targeting only unsafe regions of the image). Our method, SP-Guard, addresses these limitations by estimating prompt harmfulness and applying a selective guidance mask to guide only unsafe areas. Experiments show that SP-Guard generates safer images than existing methods while minimizing unintended content alteration. Beyond improving safety, our findings highlight the importance of transparency and controllability in image generation.
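The adaptivity and selectivity ideas above can be illustrated with a minimal NumPy sketch of a single guided denoising step. It assumes a classifier-free-guidance setup with an extra noise prediction conditioned on an unsafe concept, in the style of negative or safety guidance. The harmfulness score `harm_score`, the agreement-based mask in `selective_mask`, and the parameters `w`, `s_max`, and `tau` are hypothetical illustrations, not SP-Guard's actual harmfulness estimator or mask.

```python
import numpy as np


def selective_mask(eps_prompt, eps_uncond, eps_unsafe, tau=0.0):
    """Hypothetical spatial mask of shape (H, W): 1 where the prompt's guidance
    direction agrees with the unsafe-concept direction, 0 elsewhere.

    All noise predictions are arrays of shape (C, H, W).
    """
    d_prompt = eps_prompt - eps_uncond  # direction the prompt pushes toward
    d_unsafe = eps_unsafe - eps_uncond  # direction the unsafe concept pushes toward
    # Channel-wise dot product gives a per-location agreement score.
    agreement = np.sum(d_prompt * d_unsafe, axis=0)
    return (agreement > tau).astype(eps_prompt.dtype)


def guided_noise(eps_prompt, eps_uncond, eps_unsafe, harm_score,
                 w=7.5, s_max=5.0, tau=0.0):
    """Sketch of one guided noise estimate: classifier-free guidance plus a
    safety term whose strength scales with a (hypothetical) harmfulness score
    in [0, 1] and which is applied only inside the selective mask."""
    cfg = eps_uncond + w * (eps_prompt - eps_uncond)
    mask = selective_mask(eps_prompt, eps_uncond, eps_unsafe, tau)
    strength = s_max * float(np.clip(harm_score, 0.0, 1.0))  # adaptivity
    # mask[None] broadcasts the (H, W) mask over the channel axis (selectivity).
    return cfg - strength * mask[None, :, :] * (eps_unsafe - eps_uncond)
```

In this sketch, `harm_score=0` reduces the step to plain classifier-free guidance, and locations where the mask is 0 are never altered. That is the sense in which the guidance is adaptive (strength depends on the prompt) and selective (only flagged regions are changed).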

Related papers

Self-Guard: Defending Large Reasoning Models via enhanced self-reflection [54.775612141528164]
Self-Guard is a lightweight safety defense framework for Large Reasoning Models. It bridges the awareness-compliance gap, achieving robust safety performance without compromising model utility. Self-Guard exhibits strong generalization across diverse unseen risks and varying model scales.
(2026-01-31T13:06:11Z)
SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models [67.84174763413178]
We introduce SafeRedir, a lightweight inference-time framework for robust unlearning via prompt embedding redirection. We show that SafeRedir achieves effective unlearning capability, high semantic and perceptual preservation, robust image quality, and enhanced resistance to adversarial attacks.
(2026-01-13T15:01:38Z)
SafeVision: Efficient Image Guardrail with Robust Policy Adherence and Explainability [49.074914896839466]
We introduce SafeVision, a novel image guardrail that integrates human-like reasoning to enhance adaptability and transparency. Our approach incorporates an effective data collection and generation framework, a policy-following training pipeline, and a customized loss function. We show that SafeVision achieves state-of-the-art performance on different benchmarks.
(2025-10-28T00:35:59Z)
SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation [5.313750874857107]
We introduce SafetyPairs, a framework for generating counterfactual pairs of images that differ only in the features relevant to the given safety policy. Using SafetyPairs, we construct a new safety benchmark, which serves as a powerful source of evaluation data. We release a benchmark containing over 3,020 SafetyPair images spanning a diverse taxonomy of 9 safety categories.
(2025-10-24T03:19:48Z)
SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models [74.11062256255387]
Text-to-image models are highly vulnerable to adversarial prompts, which can bypass safety measures and produce harmful content. We introduce SafeGuider, a two-step framework designed for robust safety control without compromising generation quality. SafeGuider demonstrates exceptional effectiveness in minimizing attack success rates, achieving a maximum rate of only 5.48% across various attack scenarios.
(2025-10-05T10:24:48Z)
SafeCtrl: Region-Based Safety Control for Text-to-Image Diffusion via Detect-Then-Suppress [48.20360860166279]
We introduce SafeCtrl, a lightweight, non-intrusive plugin that first precisely localizes unsafe content. Instead of performing a hard A-to-B substitution, SafeCtrl then suppresses the harmful semantics, allowing the generative process to naturally and coherently resolve into a safe, context-aware alternative.
(2025-08-16T04:28:52Z)
PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation [30.2092299298228]
Text-to-image (T2I) models are vulnerable to producing not-safe-for-work (NSFW) content, such as violent or explicit imagery. We propose PromptSafe, a gated prompt tuning framework that combines a lightweight, text-only supervised soft embedding with an inference-time gated control network. We show that PromptSafe achieves a SOTA unsafe generation rate (2.36%) while preserving high benign fidelity.
(2025-08-02T09:09:40Z)
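The PromptSafe summary does not specify the gate's form. As a purely hypothetical illustration, a gate could map a pooled prompt embedding to a scalar in (0, 1) that controls how much of a learned safety soft embedding is added to the prompt tokens. The names `gated_prompt`, `gate_w`, and `gate_b` and the additive combination are assumptions made for this sketch; they do not describe PromptSafe's actual architecture.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_prompt(prompt_emb, soft_emb, gate_w, gate_b):
    """Hypothetical gated combination of a prompt and a safety soft embedding.

    prompt_emb:     (n_tokens, d) prompt token embeddings
    soft_emb:       (d,) learned safety soft embedding (assumed already trained)
    gate_w, gate_b: parameters of a tiny linear gate (assumed already trained)
    Returns the conditioned embeddings and the gate value g in (0, 1).
    """
    pooled = prompt_emb.mean(axis=0)               # (d,) summary of the prompt
    g = float(sigmoid(pooled @ gate_w + gate_b))   # near 1 => treated as unsafe
    # Add the safety embedding to every token, weighted by the gate.
    return prompt_emb + g * soft_emb[None, :], g
```

When the gate is near 0, the prompt passes through essentially unchanged. That is one way a gated design could keep the safety intervention from affecting benign prompts.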
PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models [38.45239843869313]
Text-to-image (T2I) models have exhibited remarkable performance in generating high-quality images from text descriptions. T2I models are vulnerable to misuse, particularly generating not-safe-for-work (NSFW) content. We present PromptGuard, a novel content moderation technique that draws inspiration from the system prompt mechanism in large language models.
(2025-01-07T05:39:21Z)
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation [65.30207993362595]
Unlearning/editing-based methods for safe generation remove harmful concepts from models but face several challenges. We propose SAFREE, a training-free approach for safe T2I and T2V. We detect a subspace corresponding to a set of toxic concepts in the text embedding space and steer prompt embeddings away from this subspace.
(2024-10-16T17:32:23Z)
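The subspace-steering step described for SAFREE can be sketched as an orthogonal projection. This minimal NumPy version assumes the toxic subspace is spanned by embeddings of toxic concept prompts. It obtains an orthonormal basis for that subspace via SVD and removes each token embedding's component inside it. The names `toxic_basis` and `steer_away` are hypothetical, and the full method may involve additional adaptive steps that this sketch does not show.

```python
import numpy as np


def toxic_basis(concepts, rank=None):
    """Orthonormal basis (d, r) for the span of toxic concept embeddings.

    concepts: (k, d) array; each row is the embedding of a toxic concept
    prompt (hypothetical input for this sketch).
    """
    _, s, vt = np.linalg.svd(concepts, full_matrices=False)
    # Keep directions with non-negligible singular values unless rank is given.
    r = rank if rank is not None else int(np.sum(s > 1e-8 * s.max()))
    return vt[:r].T  # columns are orthonormal and span the toxic subspace


def steer_away(tokens, basis):
    """Project token embeddings (n, d) onto the orthogonal complement of
    the subspace spanned by `basis`, i.e. apply (I - B B^T) to each row."""
    return tokens - (tokens @ basis) @ basis.T
```

Because the projection is idempotent and leaves directions orthogonal to the toxic subspace untouched, steering of this kind can remove toxic components while keeping unrelated prompt content intact.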

This list is automatically generated from the titles and abstracts of the papers on this site.