Prompt Stealing Attacks Against Text-to-Image Generation Models
- URL: http://arxiv.org/abs/2302.09923v2
- Date: Mon, 15 Apr 2024 17:40:04 GMT
- Title: Prompt Stealing Attacks Against Text-to-Image Generation Models
- Authors: Xinyue Shen, Yiting Qu, Michael Backes, Yang Zhang,
- Abstract summary: A trend of trading high-quality prompts on specialized marketplaces has emerged.
Successful prompt stealing attacks directly violate the intellectual property of prompt engineers.
We propose a simple yet effective prompt stealing attack, PromptStealer.
- Score: 27.7826502104361
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-Image generation models have revolutionized the artwork design process and enabled anyone to create high-quality images by entering text descriptions called prompts. Creating a high-quality prompt that consists of a subject and several modifiers can be time-consuming and costly. In consequence, a trend of trading high-quality prompts on specialized marketplaces has emerged. In this paper, we perform the first study on understanding the threat of a novel attack, namely prompt stealing attack, which aims to steal prompts from generated images by text-to-image generation models. Successful prompt stealing attacks directly violate the intellectual property of prompt engineers and jeopardize the business model of prompt marketplaces. We first perform a systematic analysis on a dataset collected by ourselves and show that a successful prompt stealing attack should consider a prompt's subject as well as its modifiers. Based on this observation, we propose a simple yet effective prompt stealing attack, PromptStealer. It consists of two modules: a subject generator trained to infer the subject and a modifier detector for identifying the modifiers within the generated image. Experimental results demonstrate that PromptStealer is superior over three baseline methods, both quantitatively and qualitatively. We also make some initial attempts to defend PromptStealer. In general, our study uncovers a new attack vector within the ecosystem established by the popular text-to-image generation models. We hope our results can contribute to understanding and mitigating this emerging threat.
Related papers
- Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation [52.72682366640554]
Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author or by someone else.
It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style, or by imitating the style of another author.
arXiv Detail & Related papers (2024-03-17T16:36:26Z) - Prompt Stealing Attacks Against Large Language Models [5.421974542780941]
We propose a novel attack against large language models (LLMs)
Our proposed prompt stealing attack aims to steal these well-designed prompts based on the generated answers.
Our experimental results show the remarkable performance of our proposed attacks.
arXiv Detail & Related papers (2024-02-20T12:25:26Z) - On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts [38.63253101205306]
Previous studies have successfully demonstrated that manipulated prompts can elicit text-to-image models to generate unsafe images.
We propose two poisoning attacks: a basic attack and a utility-preserving attack.
Our findings underscore the potential risks of adopting text-to-image models in real-world scenarios.
arXiv Detail & Related papers (2023-10-25T13:10:44Z) - SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution [21.93748586123046]
We develop and exhibit the first prompt attacks on Midjourney, resulting in the production of abundant NSFW images.
Our framework, SurrogatePrompt, systematically generates attack prompts, utilizing large language models, image-to-text, and image-to-image modules.
Results disclose an 88% success rate in bypassing Midjourney's proprietary safety filter with our attack prompts.
arXiv Detail & Related papers (2023-09-25T13:20:15Z) - BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM)
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z) - Effective Prompt Extraction from Language Models [70.00099540536382]
We present a framework for measuring the effectiveness of prompt extraction attacks.
In experiments with 3 different sources of prompts and 11 underlying large language models, we find that simple text-based attacks can in fact reveal prompts with high probability.
Our framework determines with high precision whether an extracted prompt is the actual secret prompt, rather than a model hallucination.
arXiv Detail & Related papers (2023-07-13T16:15:08Z) - Word-Level Explanations for Analyzing Bias in Text-to-Image Models [72.71184730702086]
Text-to-image (T2I) models can generate images that underrepresent minorities based on race and sex.
This paper investigates which word in the input prompt is responsible for bias in generated images.
arXiv Detail & Related papers (2023-06-03T21:39:07Z) - Two-in-One: A Model Hijacking Attack Against Text Generation Models [19.826236952700256]
We propose a new model hijacking attack, Ditto, that can hijack different text classification tasks into multiple generation ones.
Our results show that by using Ditto, an adversary can successfully hijack text generation models without jeopardizing their utility.
arXiv Detail & Related papers (2023-05-12T12:13:27Z) - SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with
Large Language Models [56.88192537044364]
We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach can make text-to-image diffusion models easier to use with better user experience.
arXiv Detail & Related papers (2023-05-09T05:48:38Z) - Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image
Diffusion Models [103.61066310897928]
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.
While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt.
We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt.
We introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness
arXiv Detail & Related papers (2023-01-31T18:10:38Z) - Rickrolling the Artist: Injecting Backdoors into Text Encoders for
Text-to-Image Synthesis [16.421253324649555]
We introduce backdoor attacks against text-guided generative models.
Our attacks only slightly alter an encoder so that no suspicious model behavior is apparent for image generations with clean prompts.
arXiv Detail & Related papers (2022-11-04T12:36:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.