Towards SFW sampling for diffusion models via external conditioning
- URL: http://arxiv.org/abs/2505.08817v1
- Date: Mon, 12 May 2025 17:27:40 GMT
- Title: Towards SFW sampling for diffusion models via external conditioning
- Authors: Camilo Carvajal Reyes, Joaquín Fontbona, Felipe Tobar
- Abstract summary: This article explores the use of external sources for ensuring safe outputs in Score-based generative models (SBMs). Our safe-for-work (SFW) sampler implements a Conditional Trajectory Correction step that guides the samples away from undesired regions in the ambient space. Our experiments on the text-to-image SBM Stable Diffusion validate that the proposed SFW sampler effectively reduces the generation of explicit content.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Score-based generative models (SBMs), also known as diffusion models, are the de facto state of the art for image synthesis. Despite their unparalleled performance, SBMs have recently been in the spotlight for being tricked into creating not-safe-for-work (NSFW) content, such as violent images and non-consensual nudity. Current approaches that prevent unsafe generation rely on the models' own knowledge, and most of them require fine-tuning. This article explores the use of external sources for ensuring safe outputs in SBMs. Our safe-for-work (SFW) sampler implements a Conditional Trajectory Correction step that guides the samples away from undesired regions in the ambient space, using multimodal models as the source of conditioning. Furthermore, using Contrastive Language-Image Pre-training (CLIP), our method admits user-defined NSFW classes, which can vary across settings. Our experiments on the text-to-image SBM Stable Diffusion validate that the proposed SFW sampler effectively reduces the generation of explicit content while remaining competitive with fine-tuning-based approaches, as assessed via independent NSFW detectors. Moreover, we evaluate the impact of the SFW sampler on image quality and show that the proposed correction scheme comes at a minor cost, with negligible effect on samples not needing correction. Our study confirms the suitability of the SFW sampler for aligning SBMs and the potential of model-agnostic conditioning for the prevention of unwanted images.
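As a reading aid, the Conditional Trajectory Correction step can be pictured as a classifier-guidance-style update driven by CLIP similarity to user-defined NSFW classes. The following is a minimal sketch under that assumption; `denoise_step`, `clip_image_embed`, and the step size are illustrative placeholders, not the authors' released code.

```python
# Minimal sketch (not the authors' code) of a CLIP-guided correction step
# that nudges a diffusion sample away from user-defined NSFW concepts.
import torch

def sfw_correction(x_t, denoise_step, clip_image_embed, nsfw_text_embeds,
                   step_size=50.0, threshold=0.25):
    """One hypothetical trajectory-correction step.

    x_t:              current noisy sample, shape (C, H, W)
    denoise_step:     differentiable callable returning the clean-image estimate
    clip_image_embed: callable mapping an image to a CLIP embedding
    nsfw_text_embeds: (K, d) unit-norm CLIP embeddings of unwanted classes
    """
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoise_step(x_t)                  # model's predicted clean image
    emb = clip_image_embed(x0_hat)
    emb = emb / emb.norm()                      # unit-norm image embedding
    sims = nsfw_text_embeds @ emb               # cosine similarity per class
    unsafe = sims.max()
    if unsafe.item() < threshold:               # leave safe samples untouched
        return x_t.detach()
    grad = torch.autograd.grad(unsafe, x_t)[0]  # direction toward NSFW region
    return (x_t - step_size * grad).detach()    # step away from it
```

The early-exit threshold mirrors the abstract's claim that samples not needing correction are left essentially unchanged.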
Related papers
- Training-Free Safe Denoisers for Safe Use of Diffusion Models [49.045799120267915]
Powerful diffusion models (DMs) are often misused to generate not-safe-for-work (NSFW) content, copyrighted material, or data of individuals who wish to be forgotten. We develop a practical algorithm that produces high-quality samples while avoiding negation areas of the data distribution. These results hint at the great potential of our training-free safe denoiser for safer use of DMs.
arXiv Detail & Related papers (2025-02-11T23:14:39Z)
- CROPS: Model-Agnostic Training-Free Framework for Safe Image Synthesis with Latent Diffusion Models [13.799517170191919]
Recent research has shown that safety checkers are vulnerable to adversarial attacks, allowing attackers to generate Not Safe For Work (NSFW) images. We propose CROPS, a model-agnostic framework that defends against adversarial attacks generating NSFW images, without requiring additional training.
arXiv Detail & Related papers (2025-01-09T16:43:21Z)
- Diffusion Model Guided Sampling with Pixel-Wise Aleatoric Uncertainty Estimation [10.269485943949332]
We propose to estimate the pixel-wise aleatoric uncertainty during the sampling phase of diffusion models. The uncertainty is computed as the variance of the denoising scores, with a perturbation scheme specifically designed for diffusion models. We show that our guided approach leads to better sample generation in terms of FID scores.
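As a rough illustration of the variance-of-scores idea (the Gaussian perturbations below are an assumption; the paper uses a perturbation scheme designed specifically for diffusion models):

```python
# Rough sketch: per-pixel aleatoric uncertainty as the variance of denoising
# scores under small input perturbations. Not the paper's exact scheme.
import torch

def score_variance(score_fn, x_t, t, n_perturb=8, eps=1e-2):
    """score_fn(x, t) -> denoising score with the same shape as x."""
    scores = []
    for _ in range(n_perturb):
        x_pert = x_t + eps * torch.randn_like(x_t)  # perturbed input
        scores.append(score_fn(x_pert, t))
    scores = torch.stack(scores)                    # (n_perturb, C, H, W)
    return scores.var(dim=0)                        # pixel-wise variance
```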
arXiv Detail & Related papers (2024-11-29T19:02:08Z)
- Confidence Aware Learning for Reliable Face Anti-spoofing [52.23271636362843]
We propose a Confidence Aware Face Anti-spoofing (CA-FAS) model, which is aware of its capability boundary and estimates its confidence during the prediction of each sample. Experiments show that the proposed CA-FAS can effectively recognize samples with low prediction confidence.
arXiv Detail & Related papers (2024-11-02T14:29:02Z)
- Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models! [52.0855711767075]
EvoSeed is an evolutionary-strategy-based algorithmic framework for generating photo-realistic natural adversarial samples.
We employ CMA-ES to optimize the search for an initial seed vector which, when processed by the conditional diffusion model, yields a natural adversarial sample that is misclassified by the target model; a hypothetical version of this loop is sketched below.
Experiments show that the generated adversarial images are of high quality, raising concerns about harmful content bypassing safety classifiers.
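The sketch assumes the `cma` package and placeholder `generate` / `classifier` callables; it is not the EvoSeed release.

```python
# Hypothetical seed-search loop in the spirit of EvoSeed: CMA-ES looks for an
# initial latent that the diffusion model turns into a misclassified image.
# `generate` and `classifier` are stand-ins for the reader's own models.
import cma
import numpy as np

def find_adversarial_seed(generate, classifier, true_label, dim=64, sigma=0.5):
    es = cma.CMAEvolutionStrategy(np.zeros(dim), sigma)
    while not es.stop():
        seeds = es.ask()
        # Fitness: confidence assigned to the true label; CMA-ES minimizes it,
        # so low values mean the generated image is likely misclassified.
        fitness = [classifier(generate(np.asarray(s)))[true_label] for s in seeds]
        es.tell(seeds, fitness)
    return es.result.xbest
```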
arXiv Detail & Related papers (2024-02-07T09:39:29Z)
- Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models? [52.238883592674696]
Ring-A-Bell is a model-agnostic red-teaming tool for T2I diffusion models.
It identifies problematic prompts that lead diffusion models to generate inappropriate content.
Our results show that Ring-A-Bell, by manipulating safe-prompting benchmarks, can transform prompts originally regarded as safe into ones that evade existing safety mechanisms.
arXiv Detail & Related papers (2023-10-16T02:11:20Z)
- Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts [63.61248884015162]
Text-to-image diffusion models have shown remarkable ability in high-quality content generation.
This work proposes Prompting4Debugging (P4D) as a tool that automatically finds problematic prompts for diffusion models.
Our results show that around half of the prompts in existing safe-prompting benchmarks that were originally considered "safe" can actually be manipulated to bypass many deployed safety mechanisms.
arXiv Detail & Related papers (2023-09-12T11:19:36Z)
- Unlearnable Examples for Diffusion Models: Protect Data from Unauthorized Exploitation [25.55296442023984]
We propose a method, Unlearnable Diffusion Perturbation, to safeguard images from unauthorized exploitation.
This is significant in real-world scenarios, as it helps protect privacy and copyright against AI-generated content.
arXiv Detail & Related papers (2023-06-02T20:19:19Z)
- Masked Images Are Counterfactual Samples for Robust Fine-tuning [77.82348472169335]
Fine-tuning deep learning models can lead to a trade-off between in-distribution (ID) performance and out-of-distribution (OOD) robustness.
We propose a novel fine-tuning method that uses masked images as counterfactual samples to improve the robustness of the fine-tuned model.
arXiv Detail & Related papers (2023-03-06T11:51:28Z)
- How to Trust Your Diffusion Model: A Convex Optimization Approach to Conformal Risk Control [9.811982443156063]
We focus on image-to-image regression tasks and present a generalization of the Risk-Controlling Prediction Sets (RCPS) procedure.
Our method relies on a novel convex optimization approach that allows for multidimensional risk control while provably minimizing the mean interval length.
We illustrate our approach on two real-world image denoising problems: on natural images of faces as well as on computed tomography (CT) scans of the abdomen.
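A minimal sketch of RCPS-style calibration for per-pixel intervals; the grid search below is an illustrative simplification, and the paper's convex-optimization extension to multidimensional risk is not reproduced here.

```python
# Sketch: pick the smallest interval scale lambda whose empirical risk
# (fraction of ground-truth pixels outside the predicted interval) stays
# below alpha on a calibration set.
import numpy as np

def calibrate_lambda(preds, uncerts, targets, alpha=0.1, lambdas=None):
    """preds, uncerts, targets: arrays of shape (N, H, W)."""
    if lambdas is None:
        lambdas = np.linspace(0.0, 5.0, 101)
    for lam in lambdas:                                  # smallest lambda first
        lo, hi = preds - lam * uncerts, preds + lam * uncerts
        miss = ((targets < lo) | (targets > hi)).mean()  # empirical risk
        if miss <= alpha:
            return lam
    return lambdas[-1]
```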
arXiv Detail & Related papers (2023-02-07T23:01:16Z)
- Are Diffusion Models Vulnerable to Membership Inference Attacks? [26.35177414594631]
Diffusion-based generative models have shown great potential for image synthesis, but there is a lack of research on the security and privacy risks they may pose.
We investigate the vulnerability of diffusion models to Membership Inference Attacks (MIAs), a common privacy concern.
We propose Step-wise Error Comparing Membership Inference (SecMI), a query-based MIA that infers membership by assessing, at each timestep, how well the model matches the forward-process posterior estimate.
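An illustrative sketch in the spirit of SecMI, assuming a standard DDPM forward process; `model` and the noise-schedule constants are placeholders rather than the paper's reference implementation.

```python
# Samples with low per-timestep denoising error are flagged as likely
# training members; comparing errors across timesteps drives the attack.
import torch

def stepwise_error(model, x0, alphas_cumprod, timesteps):
    """Mean squared noise-prediction error across selected timesteps."""
    errors = []
    for t in timesteps:
        a_bar = alphas_cumprod[t]
        noise = torch.randn_like(x0)
        x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # forward process
        eps_hat = model(x_t, t)                               # predicted noise
        errors.append(((eps_hat - noise) ** 2).mean())
    return torch.stack(errors).mean()   # low error -> likely a member
```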
arXiv Detail & Related papers (2023-02-02T18:43:16Z)