Related papers: Latent Guard: a Safety Framework for Text-to-image Generation

Latent Guard: a Safety Framework for Text-to-image Generation

URL: http://arxiv.org/abs/2404.08031v1
Date: Thu, 11 Apr 2024 17:59:52 GMT
Title: Latent Guard: a Safety Framework for Text-to-image Generation
Authors: Runtao Liu, Ashkan Khakzar, Jindong Gu, Qifeng Chen, Philip Torr, Fabio Pizzati,
Abstract summary: Latent Guard is a framework designed to improve safety measures in text-to-image generation. Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder. Our proposed framework is composed of a data generation pipeline specific to the task.
Score: 64.49596711025993
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the ability to generate high-quality images, text-to-image (T2I) models can be exploited for creating inappropriate content. To prevent misuse, existing safety measures are either based on text blacklists, which can be easily circumvented, or harmful content classification, requiring large datasets for training and offering low flexibility. Hence, we propose Latent Guard, a framework designed to improve safety measures in text-to-image generation. Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts in the input text embeddings. Our proposed framework is composed of a data generation pipeline specific to the task using large language models, ad-hoc architectural components, and a contrastive learning strategy to benefit from the generated data. The effectiveness of our method is verified on three datasets and against four baselines. Code and data will be shared at https://github.com/rt219/LatentGuard.

Related papers

T2UE: Generating Unlearnable Examples from Text Descriptions [60.111026156038264]
Unlearnable Examples (UEs) have emerged as a promising countermeasure against unauthorized model training.<n>We introduce textbfText-to-Unlearnable Example (T2UE), a novel framework that enables users to generate UEs using only text descriptions.
arXiv Detail & Related papers (2025-08-05T05:10:14Z)
GenBreak: Red Teaming Text-to-Image Generators Using Large Language Models [65.91565607573786]
Text-to-image (T2I) models can be misused to generate harmful content, including nudity or violence.<n>Recent research on red-teaming and adversarial attacks against T2I models has notable limitations.<n>We propose GenBreak, a framework that fine-tunes a red-team large language model (LLM) to systematically explore underlying vulnerabilities.
arXiv Detail & Related papers (2025-06-11T09:09:12Z)
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation [68.07258248467309]
Text-to-image (T2I) models have become widespread, but their limited safety guardrails expose end users to harmful content and potentially allow for model misuse. Current safety measures are typically limited to text-based filtering or concept removal strategies, able to remove just a few concepts from the model's generative capabilities. We introduce SafetyDPO, a method for safety alignment of T2I models through Direct Preference Optimization (DPO) We train safety experts, in the form of low-rank adaptation (LoRA) matrices, able to guide the generation process away from specific safety-related
arXiv Detail & Related papers (2024-12-13T18:59:52Z)
Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction [49.60774626839712]
Training multimodal generative models can expose users to harmful, unsafe and controversial or culturally-inappropriate outputs. We propose a modular, dynamic solution that leverages safety-context embeddings and a dual reconstruction process to generate safer images. We achieve state-of-the-art results on safe image generation benchmarks, while offering controllable variation of model safety.
arXiv Detail & Related papers (2024-11-21T09:47:13Z)
Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding [13.481343482138888]
We propose a vision-agnostic safe generation framework, Embedding Sanitizer (ES) ES focuses on erasing inappropriate concepts from prompt embeddings and uses the sanitized embeddings to guide the model for safe generation. ES significantly outperforms existing safeguards in terms of interpretability and controllability while maintaining generation quality.
arXiv Detail & Related papers (2024-11-15T16:29:02Z)
TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images [84.08181780666698]
TextDestroyer is the first training- and annotation-free method for scene text destruction. Our method scrambles text areas in the latent start code using a Gaussian distribution before reconstruction. The advantages of TextDestroyer include: (1) it eliminates labor-intensive data annotation and resource-intensive training; (2) it achieves more thorough text destruction, preventing recognizable traces; and (3) it demonstrates better generalization capabilities, performing well on both real-world scenes and generated images.
arXiv Detail & Related papers (2024-11-01T04:41:00Z)
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models [29.866192834825572]
Unlearning techniques have been developed to remove the model's ability to generate potentially harmful content. These methods are easily bypassed by adversarial attacks, making them unreliable for ensuring the safety of generated images. We propose Direct Unlearning Optimization (DUO), a novel framework for removing Not Safe For Work (NSFW) content from T2I models.
arXiv Detail & Related papers (2024-07-17T08:19:11Z)
GuardT2I: Defending Text-to-Image Models from Adversarial Prompts [16.317849859000074]
GuardT2I is a novel moderation framework that adopts a generative approach to enhance T2I models' robustness against adversarial prompts. Our experiments reveal that GuardT2I outperforms leading commercial solutions like OpenAI-Moderation and Microsoft Azure Moderator.
arXiv Detail & Related papers (2024-03-03T09:04:34Z)
Universal Prompt Optimizer for Safe Text-to-Image Generation [27.32589928097192]
We propose the first universal prompt for safe T2I (POSI) generation in black-box scenario. Our approach can effectively reduce the likelihood of various T2I models in generating inappropriate images.
arXiv Detail & Related papers (2024-02-16T18:36:36Z)
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation [121.45667242282721]
We propose a coarse-to-fine paradigm to achieve layout planning and image generation. Our proposed method outperforms the state-of-the-art models in terms of photorealistic layout and image generation.
arXiv Detail & Related papers (2023-08-09T17:45:04Z)
Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models [79.50701155336198]
textbfForget-Me-Not is designed to safely remove specified IDs, objects, or styles from a well-configured text-to-image model in as little as 30 seconds. We demonstrate that Forget-Me-Not can effectively eliminate targeted concepts while maintaining the model's performance on other concepts. It can also be adapted as a lightweight model patch for Stable Diffusion, allowing for concept manipulation and convenient distribution.
arXiv Detail & Related papers (2023-03-30T17:58:11Z)
CoBIT: A Contrastive Bi-directional Image-Text Generation Model [72.1700346308106]
CoBIT employs a novel unicoder-decoder structure, which attempts to unify three pre-training objectives in one framework. CoBIT achieves superior performance in image understanding, image-text understanding (Retrieval, Captioning, VQA, SNLI-VE) and text-based content creation, particularly in zero-shot scenarios.
arXiv Detail & Related papers (2023-03-23T17:24:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.