SafeGen: Embedding Ethical Safeguards in Text-to-Image Generation
- URL: http://arxiv.org/abs/2512.12501v1
- Date: Sun, 14 Dec 2025 00:18:10 GMT
- Title: SafeGen: Embedding Ethical Safeguards in Text-to-Image Generation
- Authors: Dang Phuong Nam, Nguyen Kieu, Pham Thanh Hieu
- Abstract summary: This paper introduces SafeGen, a framework that embeds ethical safeguards directly into the text-to-image generation pipeline.
SafeGen integrates two complementary components: BGE-M3, a fine-tuned text classifier that filters harmful or misleading prompts, and Hyper-SD, an optimized diffusion model that produces high-fidelity, semantically aligned images.
Case studies illustrate SafeGen's practical impact in blocking unsafe prompts, generating inclusive teaching materials, and reinforcing academic integrity.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Artificial Intelligence (AI) has created unprecedented opportunities for creative expression, education, and research. Text-to-image systems such as DALL·E, Stable Diffusion, and Midjourney can now convert ideas into visuals within seconds, but they also present a dual-use dilemma, raising critical ethical concerns: amplifying societal biases, producing high-fidelity disinformation, and violating intellectual property. This paper introduces SafeGen, a framework that embeds ethical safeguards directly into the text-to-image generation pipeline, grounding its design in established principles for Trustworthy AI. SafeGen integrates two complementary components: BGE-M3, a fine-tuned text classifier that filters harmful or misleading prompts, and Hyper-SD, an optimized diffusion model that produces high-fidelity, semantically aligned images. Built on a curated multilingual (English-Vietnamese) dataset and a fairness-aware training process, SafeGen demonstrates that creative freedom and ethical responsibility can be reconciled within a single workflow. Quantitative evaluations confirm its effectiveness, with Hyper-SD achieving IS = 3.52, FID = 22.08, and SSIM = 0.79, while BGE-M3 reaches an F1-score of 0.81. An ablation study further validates the importance of domain-specific fine-tuning for both modules. Case studies illustrate SafeGen's practical impact in blocking unsafe prompts, generating inclusive teaching materials, and reinforcing academic integrity.
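The abstract describes a two-stage pipeline: a prompt-safety classifier that gates a diffusion generator. A minimal sketch of that control flow is below; the function names, the keyword heuristic standing in for the fine-tuned BGE-M3 classifier, and the string placeholder standing in for Hyper-SD image synthesis are all illustrative assumptions, not the paper's actual implementation.

```python
def classify_prompt(prompt: str, threshold: float = 0.5) -> bool:
    """Stand-in for the fine-tuned BGE-M3 safety classifier (assumption:
    in SafeGen this would be a learned model emitting a harm score).
    Returns True if the prompt is judged safe."""
    blocked_terms = {"weapon", "violence", "fake id"}
    # Trivial keyword heuristic in place of a real classifier score.
    harm_score = 1.0 if any(t in prompt.lower() for t in blocked_terms) else 0.0
    return harm_score < threshold

def generate_image(prompt: str) -> str:
    """Stand-in for the Hyper-SD diffusion step; returns a token
    instead of pixels so the sketch stays self-contained."""
    return f"<image for: {prompt}>"

def safegen(prompt: str) -> str:
    """Safety-gated generation: only prompts that pass the classifier
    reach the generator."""
    if not classify_prompt(prompt):
        return "BLOCKED: prompt rejected by safety classifier"
    return generate_image(prompt)
```

The key design point the sketch illustrates is that filtering happens before any compute is spent on diffusion, so unsafe prompts never reach the generator.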
Related papers
- Human-AI Collaboration Mechanism Study on AIGC Assisted Image Production for Special Coverage [3.3862265832319287]
Key issues involve misinformation, authenticity, semantic fidelity, and interpretability.
Most AIGC tools are opaque "black boxes," hindering the dual demands of content accuracy and semantic alignment.
This paper explores pathways for controllable image production in journalism's special coverage.
arXiv Detail & Related papers (2025-12-14T16:05:14Z)
- Value-Aligned Prompt Moderation via Zero-Shot Agentic Rewriting for Safe Image Generation [11.663809872664103]
Current defenses struggle to align outputs with human values without sacrificing generation quality or incurring high costs.
We introduce VALOR, a zero-shot agentic framework for safer and more helpful text-to-image generation.
VALOR integrates layered prompt analysis with human-aligned value reasoning.
arXiv Detail & Related papers (2025-11-12T09:52:47Z)
- SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models [74.11062256255387]
Text-to-image models are highly vulnerable to adversarial prompts, which can bypass safety measures and produce harmful content.
We introduce SafeGuider, a two-step framework designed for robust safety control without compromising generation quality.
SafeGuider demonstrates exceptional effectiveness in minimizing attack success rates, achieving a maximum rate of only 5.48% across various attack scenarios.
arXiv Detail & Related papers (2025-10-05T10:24:48Z)
- RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation [14.603824133970798]
RespoDiff is a novel framework for responsible text-to-image generation.
Our approach improves responsible and semantically coherent generation by 20% across diverse prompts.
It integrates seamlessly into large-scale models like SDXL, enhancing fairness and safety.
arXiv Detail & Related papers (2025-09-18T07:48:46Z)
- Interleaving Reasoning for Better Text-to-Image Generation [83.69082794730664]
We introduce Interleaving Reasoning Generation (IRG), a framework that alternates between text-based thinking and image synthesis.
To train IRG effectively, we propose Interleaving Reasoning Generation Learning (IRGL), which targets two sub-goals.
Experiments show SoTA performance, yielding absolute gains of 5-10 points on GenEval, WISE, TIIF, GenAI-Bench, and OneIG-EN.
arXiv Detail & Related papers (2025-09-08T17:56:23Z)
- Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction [88.18235230849554]
Training multimodal generative models on large, uncurated datasets can expose users to harmful, unsafe, controversial, or culturally inappropriate outputs.
We leverage safe embeddings and a modified diffusion process with weighted tunable summation in the latent space to generate safer images.
We identify trade-offs between safety and censorship, which presents a necessary perspective in the development of ethical AI models.
arXiv Detail & Related papers (2024-11-21T09:47:13Z)
- Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding [16.188657772178747]
We propose Embedding Sanitizer (ES), which enhances the safety of text-to-image models by sanitizing inappropriate concepts in prompt embeddings.
ES is the first interpretable safe generation framework that assigns a score to each token in the prompt to indicate its potential harmfulness.
arXiv Detail & Related papers (2024-11-15T16:29:02Z)
- Direct Unlearning Optimization for Robust and Safe Text-to-Image Models [29.866192834825572]
Unlearning techniques have been developed to remove the model's ability to generate potentially harmful content.
These methods are easily bypassed by adversarial attacks, making them unreliable for ensuring the safety of generated images.
We propose Direct Unlearning Optimization (DUO), a novel framework for removing Not Safe For Work (NSFW) content from T2I models.
arXiv Detail & Related papers (2024-07-17T08:19:11Z)
- Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models [51.69735366140249]
We introduce Ethical-Lens, a framework designed to facilitate the value-aligned usage of text-to-image tools.
Ethical-Lens ensures value alignment in text-to-image models across toxicity and bias dimensions.
Our experiments reveal that Ethical-Lens enhances alignment capabilities to levels comparable with or superior to commercial models.
arXiv Detail & Related papers (2024-04-18T11:38:25Z)
- Latent Guard: a Safety Framework for Text-to-image Generation [64.49596711025993]
Existing safety measures are either based on text blacklists, which can be easily circumvented, or harmful content classification.
We propose Latent Guard, a framework designed to improve safety measures in text-to-image generation.
Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts.
arXiv Detail & Related papers (2024-04-11T17:59:52Z)
- Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models [79.50701155336198]
Forget-Me-Not is designed to safely remove specified IDs, objects, or styles from a well-configured text-to-image model in as little as 30 seconds.
We demonstrate that Forget-Me-Not can effectively eliminate targeted concepts while maintaining the model's performance on other concepts.
It can also be adapted as a lightweight model patch for Stable Diffusion, allowing for concept manipulation and convenient distribution.
arXiv Detail & Related papers (2023-03-30T17:58:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.