SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation
- URL: http://arxiv.org/abs/2511.11014v1
- Date: Fri, 14 Nov 2025 07:04:06 GMT
- Title: SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation
- Authors: Sumin Yu, Taesup Moon
- Abstract summary: Diffusion-based T2I models have achieved remarkable image generation quality. They also enable easy creation of harmful content. Our method, SP-Guard, addresses these limitations by estimating prompt harmfulness and applying a selective guidance mask.
- Score: 21.845417608250035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While diffusion-based text-to-image (T2I) models have achieved remarkable image generation quality, they also enable easy creation of harmful content, raising social concerns and highlighting the need for safer generation. Existing inference-time guiding methods lack both adaptivity (adjusting guidance strength based on the prompt) and selectivity (targeting only unsafe regions of the image). Our method, SP-Guard, addresses these limitations by estimating prompt harmfulness and applying a selective guidance mask to guide only unsafe areas. Experiments show that SP-Guard generates safer images than existing methods while minimizing unintended content alteration. Beyond improving safety, our findings highlight the importance of transparency and controllability in image generation.
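The abstract names two ingredients: a guidance strength that adapts to the prompt, and a mask that restricts guidance to unsafe regions. The toy sketch below illustrates only that general idea on small 2-D grids. It is not the SP-Guard algorithm. The function name, the `harm_score` input, the thresholded mask, and the update rule are all assumptions made for illustration; the paper's actual harmfulness estimator and mask construction are not described in this listing.

```python
from typing import List

Grid = List[List[float]]  # toy stand-in for a noise-prediction map


def selective_adaptive_guidance(
    eps_prompt: Grid,
    eps_uncond: Grid,
    eps_unsafe: Grid,
    harm_score: float,
    max_scale: float = 1.0,
    mask_threshold: float = 0.5,
) -> Grid:
    """Hypothetical update: move away from an unsafe direction, only where masked."""
    # Adaptivity (assumed form): strength scales with a harmfulness score in [0, 1].
    scale = max_scale * min(max(harm_score, 0.0), 1.0)
    guided: Grid = []
    for row_p, row_0, row_u in zip(eps_prompt, eps_uncond, eps_unsafe):
        out_row = []
        for p, e0, u in zip(row_p, row_0, row_u):
            direction = u - e0  # unsafe-concept direction at this location
            # Selectivity (assumed form): guide only where the direction is strong.
            mask = 1.0 if abs(direction) > mask_threshold else 0.0
            out_row.append(p - scale * mask * direction)
        guided.append(out_row)
    return guided


if __name__ == "__main__":
    result = selective_adaptive_guidance(
        eps_prompt=[[1.0, 1.0]],
        eps_uncond=[[0.0, 0.0]],
        eps_unsafe=[[0.1, 2.0]],
        harm_score=0.5,
        max_scale=2.0,
    )
    print(result)
```

With a harmfulness score of zero the prediction is returned unchanged, which mirrors the stated goal of leaving benign prompts unaltered.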
Related papers
- Self-Guard: Defending Large Reasoning Models via enhanced self-reflection [54.775612141528164]
Self-Guard is a lightweight safety defense framework for Large Reasoning Models. It bridges the awareness-compliance gap, achieving robust safety performance without compromising model utility. Self-Guard exhibits strong generalization across diverse unseen risks and varying model scales.
arXiv Detail & Related papers (2026-01-31T13:06:11Z)
- SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models [67.84174763413178]
We introduce SafeRedir, a lightweight inference-time framework for robust unlearning via prompt embedding redirection. We show that SafeRedir achieves effective unlearning capability, high semantic and perceptual preservation, robust image quality, and enhanced resistance to adversarial attacks.
arXiv Detail & Related papers (2026-01-13T15:01:38Z)
- SafeVision: Efficient Image Guardrail with Robust Policy Adherence and Explainability [49.074914896839466]
We introduce SafeVision, a novel image guardrail that integrates human-like reasoning to enhance adaptability and transparency. Our approach incorporates an effective data collection and generation framework, a policy-following training pipeline, and a customized loss function. We show that SafeVision achieves state-of-the-art performance on different benchmarks.
arXiv Detail & Related papers (2025-10-28T00:35:59Z)
- SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation [5.313750874857107]
We introduce SafetyPairs, a framework for generating counterfactual pairs of images that differ only in the features relevant to the given safety policy. Using SafetyPairs, we construct a new safety benchmark, which serves as a powerful source of evaluation data. We release a benchmark containing over 3,020 SafetyPair images spanning a diverse taxonomy of 9 safety categories.
arXiv Detail & Related papers (2025-10-24T03:19:48Z)
- SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models [74.11062256255387]
Text-to-image models are highly vulnerable to adversarial prompts, which can bypass safety measures and produce harmful content. We introduce SafeGuider, a two-step framework designed for robust safety control without compromising generation quality. SafeGuider demonstrates exceptional effectiveness in minimizing attack success rates, achieving a maximum rate of only 5.48% across various attack scenarios.
arXiv Detail & Related papers (2025-10-05T10:24:48Z)
- SafeCtrl: Region-Based Safety Control for Text-to-Image Diffusion via Detect-Then-Suppress [48.20360860166279]
We introduce SafeCtrl, a lightweight, non-intrusive plugin that first precisely localizes unsafe content. Instead of performing a hard A-to-B substitution, SafeCtrl then suppresses the harmful semantics, allowing the generative process to naturally and coherently resolve into a safe, context-aware alternative.
arXiv Detail & Related papers (2025-08-16T04:28:52Z)
- PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation [30.2092299298228]
Text-to-image (T2I) models are vulnerable to producing not-safe-for-work (NSFW) content, such as violent or explicit imagery. We propose PromptSafe, a gated prompt tuning framework that combines a lightweight, text-only supervised soft embedding with an inference-time gated control network. We show that PromptSafe achieves a SOTA unsafe generation rate (2.36%) while preserving high benign fidelity.
arXiv Detail & Related papers (2025-08-02T09:09:40Z)
- PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models [38.45239843869313]
Text-to-image (T2I) models have exhibited remarkable performance in generating high-quality images from text descriptions. T2I models are vulnerable to misuse, particularly generating not-safe-for-work (NSFW) content. We present PromptGuard, a novel content moderation technique that draws inspiration from the system prompt mechanism in large language models.
arXiv Detail & Related papers (2025-01-07T05:39:21Z)
- SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation [65.30207993362595]
Unlearning/editing-based methods for safe generation remove harmful concepts from models but face several challenges. We propose SAFREE, a training-free approach for safe T2I and T2V. We detect a subspace corresponding to a set of toxic concepts in the text embedding space and steer prompt embeddings away from this subspace.
arXiv Detail & Related papers (2024-10-16T17:32:23Z)
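The SAFREE entry above describes steering prompt embeddings away from a subspace of toxic concepts. One generic way to picture this is orthogonal projection: remove the component of an embedding that lies in the span of some concept vectors. The sketch below shows only that textbook operation on toy vectors. It is not SAFREE's procedure; the function names, the `strength` parameter, and the use of Gram-Schmidt are assumptions for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors: Sequence[Sequence[float]], tol: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt: build an orthonormal basis for the span of `vectors`."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def steer_away(embedding: Sequence[float],
               toxic_vectors: Sequence[Sequence[float]],
               strength: float = 1.0) -> Vector:
    """Subtract (a fraction of) the embedding's component in the toxic subspace."""
    out = list(embedding)
    for b in orthonormal_basis(toxic_vectors):
        c = dot(embedding, b)
        out = [oi - strength * c * bi for oi, bi in zip(out, b)]
    return out


if __name__ == "__main__":
    print(steer_away([1.0, 2.0, 3.0], [[0.0, 2.0, 0.0]]))
```

With `strength=1.0` the result is orthogonal to every toxic vector; smaller values remove only part of that component.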
This list is automatically generated from the titles and abstracts of the papers on this site.