SafeGen: Embedding Ethical Safeguards in Text-to-Image Generation
- URL: http://arxiv.org/abs/2512.12501v1
- Date: Sun, 14 Dec 2025 00:18:10 GMT
- Title: SafeGen: Embedding Ethical Safeguards in Text-to-Image Generation
- Authors: Dang Phuong Nam, Nguyen Kieu, Pham Thanh Hieu
- Abstract summary: This paper introduces SafeGen, a framework that embeds ethical safeguards directly into the text-to-image generation pipeline.
SafeGen integrates two complementary components: BGE-M3, a fine-tuned text classifier that filters harmful or misleading prompts, and Hyper-SD, an optimized diffusion model that produces high-fidelity, semantically aligned images.
Case studies illustrate SafeGen's practical impact in blocking unsafe prompts, generating inclusive teaching materials, and reinforcing academic integrity.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Artificial Intelligence (AI) has created unprecedented opportunities for creative expression, education, and research. Text-to-image systems such as DALL·E, Stable Diffusion, and Midjourney can now convert ideas into visuals within seconds, but they also present a dual-use dilemma, raising critical ethical concerns: amplifying societal biases, producing high-fidelity disinformation, and violating intellectual property. This paper introduces SafeGen, a framework that embeds ethical safeguards directly into the text-to-image generation pipeline, grounding its design in established principles for Trustworthy AI. SafeGen integrates two complementary components: BGE-M3, a fine-tuned text classifier that filters harmful or misleading prompts, and Hyper-SD, an optimized diffusion model that produces high-fidelity, semantically aligned images. Built on a curated multilingual (English-Vietnamese) dataset and a fairness-aware training process, SafeGen demonstrates that creative freedom and ethical responsibility can be reconciled within a single workflow. Quantitative evaluations confirm its effectiveness, with Hyper-SD achieving IS = 3.52, FID = 22.08, and SSIM = 0.79, while BGE-M3 reaches an F1-score of 0.81. An ablation study further validates the importance of domain-specific fine-tuning for both modules. Case studies illustrate SafeGen's practical impact in blocking unsafe prompts, generating inclusive teaching materials, and reinforcing academic integrity.
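The abstract describes a two-stage pipeline: a prompt-safety classifier that gates a diffusion generator. A minimal sketch of that control flow is below; the function names, the keyword heuristic standing in for the fine-tuned BGE-M3 classifier, and the string placeholder standing in for Hyper-SD image synthesis are all illustrative assumptions, not the paper's actual implementation.

```python
def classify_prompt(prompt: str, threshold: float = 0.5) -> bool:
    """Stand-in for the fine-tuned BGE-M3 safety classifier (assumption:
    in SafeGen this would be a learned model emitting a harm score).
    Returns True if the prompt is judged safe."""
    blocked_terms = {"weapon", "violence", "fake id"}
    # Trivial keyword heuristic in place of a real classifier score.
    harm_score = 1.0 if any(t in prompt.lower() for t in blocked_terms) else 0.0
    return harm_score < threshold

def generate_image(prompt: str) -> str:
    """Stand-in for the Hyper-SD diffusion step; returns a token
    instead of pixels so the sketch stays self-contained."""
    return f"<image for: {prompt}>"

def safegen(prompt: str) -> str:
    """Safety-gated generation: only prompts that pass the classifier
    reach the generator."""
    if not classify_prompt(prompt):
        return "BLOCKED: prompt rejected by safety classifier"
    return generate_image(prompt)
```

The key design point the sketch illustrates is that filtering happens before any compute is spent on diffusion, so unsafe prompts never reach the generator.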
Related papers
- Human-AI Collaboration Mechanism Study on AIGC Assisted Image Production for Special Coverage [3.3862265832319287]
Key issues involve misinformation, authenticity, semantic fidelity, and interpretability.
Most AIGC tools are opaque "black boxes," hindering the dual demands of content accuracy and semantic alignment.
This paper explores pathways for controllable image production in journalism's special coverage.
arXiv Detail & Related papers (2025-12-14T16:05:14Z)
- Value-Aligned Prompt Moderation via Zero-Shot Agentic Rewriting for Safe Image Generation [11.663809872664103]
Current defenses struggle to align outputs with human values without sacrificing generation quality or incurring high costs.
We introduce VALOR, a zero-shot agentic framework for safer and more helpful text-to-image generation.
VALOR integrates layered prompt analysis with human-aligned value reasoning.
arXiv Detail & Related papers (2025-11-12T09:52:47Z)
- SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models [74.11062256255387]
Text-to-image models are highly vulnerable to adversarial prompts, which can bypass safety measures and produce harmful content.
We introduce SafeGuider, a two-step framework designed for robust safety control without compromising generation quality.
SafeGuider demonstrates exceptional effectiveness in minimizing attack success rates, achieving a maximum rate of only 5.48% across various attack scenarios.
arXiv Detail & Related papers (2025-10-05T10:24:48Z)
- RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation [14.603824133970798]
RespoDiff is a novel framework for responsible text-to-image generation.
Our approach improves responsible and semantically coherent generation by 20% across diverse prompts.
It integrates seamlessly into large-scale models like SDXL, enhancing fairness and safety.
arXiv Detail & Related papers (2025-09-18T07:48:46Z)
- Interleaving Reasoning for Better Text-to-Image Generation [83.69082794730664]
We introduce Interleaving Reasoning Generation (IRG), a framework that alternates between text-based thinking and image synthesis.
To train IRG effectively, we propose Interleaving Reasoning Generation Learning (IRGL), which targets two sub-goals.
Experiments show SoTA performance, yielding absolute gains of 5-10 points on GenEval, WISE, TIIF, GenAI-Bench, and OneIG-EN.
arXiv Detail & Related papers (2025-09-08T17:56:23Z)
- Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction [88.18235230849554]
Training multimodal generative models on large, uncurated datasets can expose users to harmful, unsafe, controversial, or culturally inappropriate outputs.
We leverage safe embeddings and a modified diffusion process with weighted tunable summation in the latent space to generate safer images.
We identify trade-offs between safety and censorship, which presents a necessary perspective in the development of ethical AI models.
arXiv Detail & Related papers (2024-11-21T09:47:13Z)
- Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding [16.188657772178747]
We propose Embedding Sanitizer (ES), which enhances the safety of text-to-image models by sanitizing inappropriate concepts in prompt embeddings.
ES is the first interpretable safe generation framework that assigns a score to each token in the prompt to indicate its potential harmfulness.
arXiv Detail & Related papers (2024-11-15T16:29:02Z)
- Direct Unlearning Optimization for Robust and Safe Text-to-Image Models [29.866192834825572]
Unlearning techniques have been developed to remove the model's ability to generate potentially harmful content.
These methods are easily bypassed by adversarial attacks, making them unreliable for ensuring the safety of generated images.
We propose Direct Unlearning Optimization (DUO), a novel framework for removing Not Safe For Work (NSFW) content from T2I models.
arXiv Detail & Related papers (2024-07-17T08:19:11Z)
- Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models [51.69735366140249]
We introduce Ethical-Lens, a framework designed to facilitate the value-aligned usage of text-to-image tools.
Ethical-Lens ensures value alignment in text-to-image models across toxicity and bias dimensions.
Our experiments reveal that Ethical-Lens enhances alignment capabilities to levels comparable with or superior to commercial models.
arXiv Detail & Related papers (2024-04-18T11:38:25Z)
- Latent Guard: a Safety Framework for Text-to-image Generation [64.49596711025993]
Existing safety measures are either based on text blacklists, which can be easily circumvented, or harmful content classification.
We propose Latent Guard, a framework designed to improve safety measures in text-to-image generation.
Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts.
arXiv Detail & Related papers (2024-04-11T17:59:52Z)
- Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models [79.50701155336198]
Forget-Me-Not is designed to safely remove specified IDs, objects, or styles from a well-configured text-to-image model in as little as 30 seconds.
We demonstrate that Forget-Me-Not can effectively eliminate targeted concepts while maintaining the model's performance on other concepts.
It can also be adapted as a lightweight model patch for Stable Diffusion, allowing for concept manipulation and convenient distribution.
arXiv Detail & Related papers (2023-03-30T17:58:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.