Towards Effective Prompt Stealing Attack against Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2508.06837v1
- Date: Sat, 09 Aug 2025 05:38:38 GMT
- Title: Towards Effective Prompt Stealing Attack against Text-to-Image Diffusion Models
- Authors: Shiqian Zhao, Chong Wang, Yiming Li, Yihao Huang, Wenjie Qu, Siew-Kei Lam, Yi Xie, Kangjie Chen, Jie Zhang, Tianwei Zhang
- Abstract summary: Prometheus is a training-free, proxy-in-the-loop, search-based prompt-stealing attack. It reverse-engineers the valuable prompts of showcases by interacting with a local proxy model. Prometheus successfully extracts prompts from popular platforms like PromptBase and AIFrog.
- Score: 31.159162126762975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-Image (T2I) models, represented by DALL$\cdot$E and Midjourney, have gained huge popularity for creating realistic images. The quality of these images relies on the carefully engineered prompts, which have become valuable intellectual property. While skilled prompters showcase their AI-generated art on markets to attract buyers, this business incidentally exposes them to \textit{prompt stealing attacks}. Existing state-of-the-art attack techniques reconstruct the prompts from a fixed set of modifiers (i.e., style descriptions) with model-specific training, which exhibit restricted adaptability and effectiveness to diverse showcases (i.e., target images) and diffusion models. To alleviate these limitations, we propose Prometheus, a training-free, proxy-in-the-loop, search-based prompt-stealing attack, which reverse-engineers the valuable prompts of the showcases by interacting with a local proxy model. It consists of three innovative designs. First, we introduce dynamic modifiers, as a supplement to static modifiers used in prior works. These dynamic modifiers provide more details specific to the showcases, and we exploit NLP analysis to generate them on the fly. Second, we design a contextual matching algorithm to sort both dynamic and static modifiers. This offline process helps reduce the search space of the subsequent step. Third, we interact with a local proxy model to invert the prompts with a greedy search algorithm. Based on the feedback guidance, we refine the prompt to achieve higher fidelity. The evaluation results show that Prometheus successfully extracts prompts from popular platforms like PromptBase and AIFrog against diverse victim models, including Midjourney, Leonardo.ai, and DALL$\cdot$E, with an ASR improvement of 25.0\%. We also validate that Prometheus is resistant to extensive potential defenses, further highlighting its severity in practice.
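The abstract outlines a three-step pipeline: generate showcase-specific dynamic modifiers via NLP analysis, sort dynamic and static modifiers offline with a contextual matching step, then greedily refine the prompt using feedback from a local proxy model. The sketch below illustrates only the final proxy-in-the-loop greedy search under assumed interfaces; the helper names (`generate_with_proxy`, `image_similarity`) and the stopping criterion are hypothetical and not taken from the paper.

```python
# Minimal sketch of a proxy-in-the-loop greedy prompt search in the spirit of
# Prometheus. All helper callables are assumptions supplied by the caller
# (e.g., a local Stable Diffusion proxy and a CLIP image-image similarity),
# not the paper's actual implementation.
from typing import Callable, Iterable


def greedy_prompt_search(
    target_image,                        # the showcase image to reverse-engineer
    base_caption: str,                   # initial caption of the showcase
    candidate_modifiers: Iterable[str],  # dynamic + static modifiers, pre-sorted offline
    generate_with_proxy: Callable[[str], object],        # prompt -> image via local proxy T2I model
    image_similarity: Callable[[object, object], float], # fidelity score between two images
    max_modifiers: int = 8,
) -> str:
    """Greedily append modifiers, keeping only those that raise proxy-image fidelity."""
    prompt = base_caption
    best_score = image_similarity(generate_with_proxy(prompt), target_image)
    accepted = 0

    for modifier in candidate_modifiers:
        trial_prompt = f"{prompt}, {modifier}"
        trial_score = image_similarity(generate_with_proxy(trial_prompt), target_image)
        # Feedback guidance: keep the modifier only if the proxy output moves
        # closer to the showcase image.
        if trial_score > best_score:
            prompt, best_score = trial_prompt, trial_score
            accepted += 1
        if accepted >= max_modifiers:
            break
    return prompt
```

Because the loop only queries a local proxy rather than the victim model, the number of paid or rate-limited API calls stays constant regardless of how many candidate modifiers are evaluated; this is the design rationale the abstract attributes to the proxy-in-the-loop setup.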
Related papers
- VEIL: Jailbreaking Text-to-Video Models via Visual Exploitation from Implicit Language [25.38940067963429]
Prior attacks on text-to-video (T2V) models typically add adversarial perturbations to obviously unsafe prompts. We show that benign-looking prompts containing rich, implicit cues can induce T2V models to generate semantically unsafe videos. We propose VEIL, a jailbreak framework that leverages T2V models' cross-modal associative patterns via a modular prompt design.
arXiv Detail & Related papers (2025-11-17T08:31:43Z) - PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting [31.35160142315478]
We introduce PromptEnhancer, a novel and universal prompt rewriting framework for text-to-image (T2I) models. Unlike prior methods that rely on model-specific fine-tuning or implicit reward signals like image-reward scores, our framework decouples the rewriter from the generator. Experiments on the HunyuanImage 2.1 model demonstrate that PromptEnhancer significantly improves image-text alignment across a wide range of semantic and compositional challenges.
arXiv Detail & Related papers (2025-09-04T16:46:10Z) - PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting [25.24109316946351]
We propose PromptFlare, a novel adversarial protection method designed to protect images from malicious modifications facilitated by diffusion-based inpainting models. Our approach exploits the intrinsic properties of prompt embeddings and injects adversarial noise to suppress the sampling process. Experiments on the EditBench dataset demonstrate that our method achieves state-of-the-art performance across various metrics.
arXiv Detail & Related papers (2025-08-22T08:42:46Z) - RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning [88.14234949860105]
RePrompt is a novel reprompting framework that introduces explicit reasoning into the prompt enhancement process via reinforcement learning. Our approach enables end-to-end training without human-annotated data.
arXiv Detail & Related papers (2025-05-23T06:44:26Z) - HTS-Attack: Heuristic Token Search for Jailbreaking Text-to-Image Models [28.28898114141277]
Text-to-Image (T2I) models have achieved remarkable success in image generation and editing. These models still have many potential issues, particularly in generating inappropriate or Not-Safe-For-Work (NSFW) content. We propose HTS-Attack, a heuristic token search attack method.
arXiv Detail & Related papers (2024-08-25T17:33:40Z) - An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape [11.45988746286973]
Deepfake or synthetic images produced using deep generative models pose serious risks to online platforms.
We study 8 state-of-the-art detectors and argue that they are far from being ready for deployment.
arXiv Detail & Related papers (2024-04-24T21:21:50Z) - Dynamic Prompt Optimizing for Text-to-Image Generation [63.775458908172176]
We introduce the Prompt Auto-Editing (PAE) method to improve text-to-image generative models.
We employ an online reinforcement learning strategy to explore the weights and injection time steps of each word, leading to the dynamic fine-control prompts.
arXiv Detail & Related papers (2024-04-05T13:44:39Z) - Unsegment Anything by Simulating Deformation [67.10966838805132]
"Anything Unsegmentable" is a task to grant any image "the right to be unsegmented"
We aim to achieve transferable adversarial attacks against all prompt-based segmentation models.
Our approach focuses on disrupting image encoder features to achieve prompt-agnostic attacks.
arXiv Detail & Related papers (2024-04-03T09:09:42Z) - Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in semantic segmentation task on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z) - Adversarial Prompt Tuning for Vision-Language Models [86.5543597406173]
Adversarial Prompt Tuning (AdvPT) is a technique to enhance the adversarial robustness of image encoders in Vision-Language Models (VLMs).
We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques.
arXiv Detail & Related papers (2023-11-19T07:47:43Z) - Prompt Stealing Attacks Against Text-to-Image Generation Models [27.7826502104361]
A trend of trading high-quality prompts on specialized marketplaces has emerged.
Successful prompt stealing attacks directly violate the intellectual property of prompt engineers.
We propose a simple yet effective prompt stealing attack, PromptStealer.
arXiv Detail & Related papers (2023-02-20T11:37:28Z) - Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models [103.61066310897928]
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.
While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt.
We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt.
We introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness of the generated images.
arXiv Detail & Related papers (2023-01-31T18:10:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.