Related papers: NDM: A Noise-driven Detection and Mitigation Framework against Implicit Sexual Intentions in Text-to-Image Generation

NDM: A Noise-driven Detection and Mitigation Framework against Implicit Sexual Intentions in Text-to-Image Generation

URL: http://arxiv.org/abs/2510.15752v1
Date: Fri, 17 Oct 2025 15:37:02 GMT
Title: NDM: A Noise-driven Detection and Mitigation Framework against Implicit Sexual Intentions in Text-to-Image Generation
Authors: Yitong Sun, Yao Huang, Ruochen Zhang, Huanran Chen, Shouwei Ruan, Ranjie Duan, Xingxing Wei,
Abstract summary: Text-to-image (T2I) models are vulnerable to generating inappropriate content.<n> implicit sexual prompts, often disguised as seemingly benign terms, can unexpectedly trigger sexual content.<n>We propose NDM, the first noise-driven detection and mitigation framework.
Score: 41.058425895887616
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Despite the impressive generative capabilities of text-to-image (T2I) diffusion models, they remain vulnerable to generating inappropriate content, especially when confronted with implicit sexual prompts. Unlike explicit harmful prompts, these subtle cues, often disguised as seemingly benign terms, can unexpectedly trigger sexual content due to underlying model biases, raising significant ethical concerns. However, existing detection methods are primarily designed to identify explicit sexual content and therefore struggle to detect these implicit cues. Fine-tuning approaches, while effective to some extent, risk degrading the model's generative quality, creating an undesirable trade-off. To address this, we propose NDM, the first noise-driven detection and mitigation framework, which could detect and mitigate implicit malicious intention in T2I generation while preserving the model's original generative capabilities. Specifically, we introduce two key innovations: first, we leverage the separability of early-stage predicted noise to develop a noise-based detection method that could identify malicious content with high accuracy and efficiency; second, we propose a noise-enhanced adaptive negative guidance mechanism that could optimize the initial noise by suppressing the prominent region's attention, thereby enhancing the effectiveness of adaptive negative guidance for sexual mitigation. Experimentally, we validate NDM on both natural and adversarial datasets, demonstrating its superior performance over existing SOTA methods, including SLD, UCE, and RECE, etc. Code and resources are available at https://github.com/lorraine021/NDM.

Related papers

Dual Attention Guided Defense Against Malicious Edits [70.17363183107604]
We propose a Dual Attention-Guided Noise Perturbation (DANP) immunization method that adds imperceptible perturbations to disrupt the model's semantic understanding and generation process.<n>Our method exhibits impressive immunity against malicious edits, and extensive experiments confirm that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-16T12:01:28Z)
Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance [54.88271057438763]
Noise Awareness Guidance (NAG) is a correction method that explicitly steers sampling trajectories to remain consistent with the pre-defined noise schedule.<n>NAG consistently mitigates noise shift and substantially improves the generation quality of mainstream diffusion models.
arXiv Detail & Related papers (2025-10-14T13:31:34Z)
Unveiling Hidden Vulnerabilities in Digital Human Generation via Adversarial Attacks [14.356235723912564]
We propose a novel framework designed to generate adversarial examples capable of effectively compromising any digital human generation model.<n>Our approach introduces a textbf Dual Heterogeneous Noise Generator (DHNG), which leverages Variational Autoencoders (VAE) and ControlNet to produce diverse, targeted noise tailored to the original image features.<n>Extensive experiments demonstrate TBA's superiority, achieving a remarkable 41.0% increase in estimation error, with an average improvement of approximately 17.0%.
arXiv Detail & Related papers (2025-04-24T11:42:10Z)
A Knowledge-guided Adversarial Defense for Resisting Malicious Visual Manipulation [93.28532038721816]
Malicious applications of visual manipulation have raised serious threats to the security and reputation of users in many fields.<n>We propose a knowledge-guided adversarial defense (KGAD) to actively force malicious manipulation models to output semantically confusing samples.
arXiv Detail & Related papers (2025-04-11T10:18:13Z)
MIGA: Mutual Information-Guided Attack on Denoising Models for Semantic Manipulation [39.12448251986432]
We propose Mutual Information-Guided Attack (MIGA) to directly attack deep denoising models.<n>MIGA strategically disrupts denoising models' ability to preserve semantic content via adversarial perturbations.<n>Our findings suggest that denoising models are not always robust and can introduce security risks in real-world applications.
arXiv Detail & Related papers (2025-03-10T06:26:34Z)
Generative Edge Detection with Stable Diffusion [52.870631376660924]
Edge detection is typically viewed as a pixel-level classification problem mainly addressed by discriminative methods. We propose a novel approach, named Generative Edge Detector (GED), by fully utilizing the potential of the pre-trained stable diffusion model. We conduct extensive experiments on multiple datasets and achieve competitive performance.
arXiv Detail & Related papers (2024-10-04T01:52:23Z)
Training More Robust Classification Model via Discriminative Loss and Gaussian Noise Injection [7.535952418691443]
We introduce a loss function applied at the penultimate layer that explicitly enforces intra-class compactness.<n>We also propose a class-wise feature alignment mechanism that brings noisy data clusters closer to their clean counterparts.<n>Our approach significantly reinforces model robustness to various perturbations while maintaining high accuracy on clean data.
arXiv Detail & Related papers (2024-05-28T18:10:45Z)
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery [71.6345505427213]
DPMesh is an innovative framework for occluded human mesh recovery. It capitalizes on the profound diffusion prior about object structure and spatial relationships embedded in a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-04-01T18:59:13Z)
Stable Unlearnable Example: Enhancing the Robustness of Unlearnable Examples via Stable Error-Minimizing Noise [31.586389548657205]
Unlearnable example is proposed to significantly degrade the generalization performance of models by adding a kind of imperceptible noise to the data. We introduce stable error-minimizing noise (SEM), which trains the defensive noise against random perturbation instead of the time-consuming adversarial perturbation. SEM achieves a new state-of-the-art performance on CIFAR-10, CIFAR-100, and ImageNet Subset.
arXiv Detail & Related papers (2023-11-22T01:43:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.