Groot: Adversarial Testing for Generative Text-to-Image Models with
Tree-based Semantic Transformation
- URL: http://arxiv.org/abs/2402.12100v1
- Date: Mon, 19 Feb 2024 12:31:56 GMT
- Title: Groot: Adversarial Testing for Generative Text-to-Image Models with
Tree-based Semantic Transformation
- Authors: Yi Liu, Guowei Yang, Gelei Deng, Feiyue Chen, Yuqi Chen, Ling Shi,
Tianwei Zhang, and Yang Liu
- Abstract summary: adversarial testing techniques have been developed to probe whether such models can be prompted to produce Not-Safe-For-Work (NSFW) content.
We introduce Groot, the first automated framework leveraging tree-based semantic transformation for adversarial testing of text-to-image models.
Our comprehensive evaluation confirms the efficacy of Groot, which not only exceeds the performance of current state-of-the-art approaches but also achieves a remarkable success rate (93.66%) on leading text-to-image models.
- Score: 16.79414725225863
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the prevalence of text-to-image generative models, their safety becomes
a critical concern. adversarial testing techniques have been developed to probe
whether such models can be prompted to produce Not-Safe-For-Work (NSFW)
content. However, existing solutions face several challenges, including low
success rate and inefficiency. We introduce Groot, the first automated
framework leveraging tree-based semantic transformation for adversarial testing
of text-to-image models. Groot employs semantic decomposition and sensitive
element drowning strategies in conjunction with LLMs to systematically refine
adversarial prompts. Our comprehensive evaluation confirms the efficacy of
Groot, which not only exceeds the performance of current state-of-the-art
approaches but also achieves a remarkable success rate (93.66%) on leading
text-to-image models such as DALL-E 3 and Midjourney.
Related papers
- DILLEMA: Diffusion and Large Language Models for Multi-Modal Augmentation [0.13124513975412253]
We present a novel framework for testing vision neural networks that leverages Large Language Models and control-conditioned Diffusion Models.
Our approach begins by translating images into detailed textual descriptions using a captioning model.
These descriptions are then used to produce new test images through a text-to-image diffusion process.
arXiv Detail & Related papers (2025-02-05T16:35:42Z) - Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step [77.86514804787622]
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks.
We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation.
We propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
arXiv Detail & Related papers (2025-01-23T18:59:43Z) - An indicator for effectiveness of text-to-image guardrails utilizing the Single-Turn Crescendo Attack (STCA) [0.0]
Single-Turn Crescendo Attack (STCA) is an innovative method designed to bypass the ethical safeguards of text-to-text AI models.
This study provides a framework for researchers to rigorously evaluate the robustness of guardrails in text-to-image models.
arXiv Detail & Related papers (2024-11-27T19:09:16Z) - SteerDiff: Steering towards Safe Text-to-Image Diffusion Models [5.781285400461636]
Text-to-image (T2I) diffusion models can be misused to produce inappropriate content.
We introduce SteerDiff, a lightweight adaptor module designed to act as an intermediary between user input and the diffusion model.
We conduct extensive experiments across various concept unlearning tasks to evaluate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-03T17:34:55Z) - MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z) - ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users [18.3621509910395]
We propose a novel Automatic Red-Teaming framework, ART, to evaluate the safety risks of text-to-image models.
With our comprehensive experiments, we reveal the toxicity of the popular open-source text-to-image models.
We also introduce three large-scale red-teaming datasets for studying the safety risks associated with text-to-image models.
arXiv Detail & Related papers (2024-05-24T07:44:27Z) - SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution [21.93748586123046]
We develop and exhibit the first prompt attacks on Midjourney, resulting in the production of abundant NSFW images.
Our framework, SurrogatePrompt, systematically generates attack prompts, utilizing large language models, image-to-text, and image-to-image modules.
Results disclose an 88% success rate in bypassing Midjourney's proprietary safety filter with our attack prompts.
arXiv Detail & Related papers (2023-09-25T13:20:15Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM)
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z) - DreamArtist++: Controllable One-Shot Text-to-Image Generation via Positive-Negative Adapter [63.622879199281705]
Some example-based image generation approaches have been proposed, emphi.e. generating new concepts based on absorbing the salient features of a few input references.
We propose a simple yet effective framework, namely DreamArtist, which adopts a novel positive-negative prompt-tuning learning strategy on the pre-trained diffusion model.
We have conducted extensive experiments and evaluated the proposed method from image similarity (fidelity) and diversity, generation controllability, and style cloning.
arXiv Detail & Related papers (2022-11-21T10:37:56Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of
Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.