GuardT2I: Defending Text-to-Image Models from Adversarial Prompts
- URL: http://arxiv.org/abs/2403.01446v1
- Date: Sun, 3 Mar 2024 09:04:34 GMT
- Title: GuardT2I: Defending Text-to-Image Models from Adversarial Prompts
- Authors: Yijun Yang, Ruiyuan Gao, Xiao Yang, Jianyuan Zhong, Qiang Xu
- Abstract summary: GuardT2I is a generative approach to enhance T2I models' robustness against adversarial prompts.
Our experiments reveal that GuardT2I outperforms leading commercial solutions like OpenAI-Moderation and Microsoft Azure Moderator.
- Score: 17.50653920106002
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advancements in Text-to-Image (T2I) models have raised significant
safety concerns about their potential misuse for generating inappropriate or
Not-Safe-For-Work (NSFW) content, despite existing countermeasures such as
NSFW classifiers or model fine-tuning for inappropriate concept removal.
Addressing this challenge, our study unveils GuardT2I, a novel moderation
framework that adopts a generative approach to enhance T2I models' robustness
against adversarial prompts. Instead of making a binary classification,
GuardT2I utilizes a Large Language Model (LLM) to conditionally transform text
guidance embeddings within the T2I models into natural language for effective
adversarial prompt detection, without compromising the models' inherent
performance. Our extensive experiments reveal that GuardT2I outperforms leading
commercial solutions like OpenAI-Moderation and Microsoft Azure Moderator by a
significant margin across diverse adversarial scenarios.
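The abstract describes the core mechanism: rather than binary classification, an LLM decodes the T2I model's text-guidance embedding back into natural language, and a prompt is flagged when its decoded interpretation diverges from the prompt itself. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation: the `decode_embedding` lookup is a toy stand-in for the LLM decoder, and the string-similarity check stands in for whatever divergence measure the real system uses.

```python
# Hypothetical sketch of GuardT2I-style generative moderation (not the
# paper's code). The real system verbalizes the text-encoder embedding
# with an LLM; a toy "decoder" and a similarity check stand in here.
from difflib import SequenceMatcher

def decode_embedding(prompt: str) -> str:
    """Stand-in for the LLM that verbalizes the text-guidance embedding.

    A benign prompt round-trips to similar wording; an adversarial prompt
    (obfuscated tokens hiding an NSFW concept) decodes to its true intent.
    This toy version simulates that behavior with a lookup table.
    """
    simulated = {
        "a photo of a cat": "a photo of a cat",
        "grponyui nude xqz": "a nude person",  # hidden intent surfaces
    }
    return simulated.get(prompt, prompt)

def is_adversarial(prompt: str, threshold: float = 0.5) -> bool:
    """Flag the prompt if its decoded interpretation diverges from it."""
    interpretation = decode_embedding(prompt)
    similarity = SequenceMatcher(
        None, prompt.lower(), interpretation.lower()
    ).ratio()
    return similarity < threshold
```

The design intuition is that adversarial prompts look innocuous (or like gibberish) on the surface but steer the embedding toward a harmful concept, so round-tripping the embedding through language exposes the mismatch without touching the T2I model itself.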
Related papers
- ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users [18.3621509910395]
We propose a novel Automatic Red-Teaming framework, ART, to evaluate the safety risks of text-to-image models.
With our comprehensive experiments, we reveal the toxicity of the popular open-source text-to-image models.
We also introduce three large-scale red-teaming datasets for studying the safety risks associated with text-to-image models.
arXiv Detail & Related papers (2024-05-24T07:44:27Z)
- Latent Guard: a Safety Framework for Text-to-image Generation [64.49596711025993]
Latent Guard is a framework designed to improve safety measures in text-to-image generation.
Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder.
Our proposed framework is composed of a data generation pipeline specific to the task.
arXiv Detail & Related papers (2024-04-11T17:59:52Z)
- Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation [150.57983348059528]
PRISM is an algorithm that automatically identifies human-interpretable and transferable prompts.
It can effectively generate desired concepts given only black-box access to T2I models.
Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, styles and images.
arXiv Detail & Related papers (2024-03-28T02:35:53Z)
- Improving Text-to-Image Consistency via Automatic Prompt Optimization [26.2587505265501]
We introduce a T2I optimization-by-prompting framework, OPT2I, to improve prompt-image consistency in T2I models.
Our framework starts from a user prompt and iteratively generates revised prompts with the goal of maximizing a consistency score.
arXiv Detail & Related papers (2024-03-26T15:42:01Z)
- Discriminative Probing and Tuning for Text-to-Image Generation [129.39674951747412]
Text-to-image generation (T2I) often faces text-image misalignment problems such as relation confusion in generated images.
We propose bolstering the discriminative abilities of T2I models to achieve more precise text-to-image alignment for generation.
We present a discriminative adapter built on T2I models to probe their discriminative abilities on two representative tasks and leverage discriminative fine-tuning to improve their text-image alignment.
arXiv Detail & Related papers (2024-03-07T08:37:33Z)
- Position: Towards Implicit Prompt For Text-To-Image Models [57.00716011456852]
This paper highlights the current state of text-to-image (T2I) models toward implicit prompts.
We present a benchmark named ImplicitBench and conduct an investigation on the performance and impacts of implicit prompts.
Experiment results show that T2I models are able to accurately create various target symbols indicated by implicit prompts.
arXiv Detail & Related papers (2024-03-04T15:21:51Z)
- Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation [19.06501699814924]
We build the Adversarial Nibbler Challenge, a red-teaming methodology for crowdsourcing implicitly adversarial prompts.
The challenge is run in consecutive rounds to enable a sustained discovery and analysis of safety pitfalls in T2I models.
We find that 14% of images that humans consider harmful are mislabeled as "safe" by machines.
arXiv Detail & Related papers (2024-02-14T22:21:12Z)
- MMA-Diffusion: MultiModal Attack on Diffusion Models [32.67807098568781]
MMA-Diffusion presents a significant and realistic threat to the security of T2I models.
It circumvents current defensive measures in both open-source models and commercial online services.
arXiv Detail & Related papers (2023-11-29T10:39:53Z)
- Adversarial Prompt Tuning for Vision-Language Models [90.89469048482249]
Adversarial Prompt Tuning (AdvPT) is a technique to enhance the adversarial robustness of image encoders in Vision-Language Models (VLMs).
We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques.
arXiv Detail & Related papers (2023-11-19T07:47:43Z)
- Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models [71.49054220807983]
A prevalent limitation persists in the effective communication with T2I models, such as Stable Diffusion, using natural language descriptions.
Inspired by the recently released DALLE3, we revisit existing T2I systems' efforts to align with human intent and introduce a new task: interactive text to image (iT2I).
We present a simple approach that augments LLMs for iT2I with prompting techniques and off-the-shelf T2I models.
arXiv Detail & Related papers (2023-10-11T16:53:40Z)
- Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models [6.475537049815622]
Adversarial Nibbler is a data-centric challenge, part of the DataPerf challenge suite, organized and supported by Kaggle and MLCommons.
arXiv Detail & Related papers (2023-05-22T15:02:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.