Reinforcement Learning-Based Prompt Template Stealing for Text-to-Image Models
- URL: http://arxiv.org/abs/2510.00046v1
- Date: Sat, 27 Sep 2025 12:29:50 GMT
- Title: Reinforcement Learning-Based Prompt Template Stealing for Text-to-Image Models
- Authors: Xiaotian Zou,
- Abstract summary: We present RLStealer, a reinforcement learning framework that recovers a prompt template from only a small set of example images. RLStealer achieves state-of-the-art performance while reducing the total attack cost to under 13% of that required by existing baselines. Our study highlights an urgent security threat inherent in prompt trading and lays the groundwork for developing protective standards.
- Score: 0.913755431537592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal Large Language Models (MLLMs) have transformed text-to-image workflows, allowing designers to create novel visual concepts with unprecedented speed. This progress has given rise to a thriving prompt trading market, where curated prompts that induce trademark styles are bought and sold. Although commercially attractive, prompt trading also introduces a largely unexamined security risk: the prompts themselves can be stolen. In this paper, we expose this vulnerability and present RLStealer, a reinforcement-learning-based prompt inversion framework that recovers a prompt template from only a small set of example images. RLStealer treats template stealing as a sequential decision-making problem and employs multiple similarity-based feedback signals as reward functions to effectively explore the prompt space. Comprehensive experiments on publicly available benchmarks demonstrate that RLStealer achieves state-of-the-art performance while reducing the total attack cost to under 13% of that required by existing baselines. Our further analysis confirms that RLStealer generalizes across different image styles and can efficiently steal unseen prompt templates. Our study highlights an urgent security threat inherent in prompt trading and lays the groundwork for developing protective standards in the emerging MLLM marketplace.
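The abstract mentions "multiple similarity-based feedback signals" used as rewards but does not specify them here. The following is only a minimal sketch of how several similarity signals could be folded into one scalar reward; the signal names (`style`, `content`), the weights, and the function names are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_reward(candidate_feats, target_feats, weights):
    """Weighted sum of per-signal cosine similarities (hypothetical reward shape).

    candidate_feats / target_feats map a signal name to a feature vector,
    e.g. embeddings of an image generated from a candidate template and of
    an example image. How such features are obtained is outside this sketch.
    """
    return sum(
        w * cosine(candidate_feats[name], target_feats[name])
        for name, w in weights.items()
    )


# Toy usage with made-up 2-d "features".
weights = {"style": 0.5, "content": 0.5}
target = {"style": [1.0, 0.0], "content": [0.0, 2.0]}
close = {"style": [1.0, 0.0], "content": [0.0, 2.0]}
far = {"style": [0.0, 1.0], "content": [3.0, 0.0]}

reward_close = combined_reward(close, target, weights)
reward_far = combined_reward(far, target, weights)
```

In a sequential decision-making setup, a scalar like this would be what the agent receives after proposing a candidate template; the actual reward design in RLStealer may differ.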
Related papers
- AMCR: A Framework for Assessing and Mitigating Copyright Risks in Generative Models [14.928831547948326]
This paper introduces Assessing and Mitigating Copyright Risks (AMCR).
AMCR builds upon prompt-based strategies by systematically restructuring risky prompts into safe and non-sensitive forms.
Experiments validate the effectiveness of AMCR in revealing and mitigating latent copyright risks.
arXiv Detail & Related papers (2025-08-31T00:00:03Z) - SoK: Large Language Model Copyright Auditing via Fingerprinting [69.14570598973195]
We introduce a unified framework and formal taxonomy that categorizes existing methods into white-box and black-box approaches.
We propose LeaFBench, the first systematic benchmark for evaluating LLM fingerprinting under realistic deployment scenarios.
arXiv Detail & Related papers (2025-08-27T12:56:57Z) - Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models [20.99874786089634]
Previous jailbreak attacks often inject malicious instructions from text into less aligned modalities, such as vision.
We propose a novel implicit jailbreak framework termed IJA that stealthily embeds malicious instructions into images via least significant bit steganography.
On commercial models like GPT-4o and Gemini-1.5 Pro, our method achieves attack success rates of over 90% using an average of only 3 queries.
arXiv Detail & Related papers (2025-05-22T09:34:47Z) - MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks [85.3303135160762]
MIRAGE is a novel framework that exploits narrative-driven context and role immersion to circumvent safety mechanisms in Multimodal Large Language Models.
It achieves state-of-the-art performance, improving attack success rates by up to 17.5% over the best baselines.
We demonstrate that role immersion and structured semantic reconstruction can activate inherent model biases, facilitating the model's spontaneous violation of ethical safeguards.
arXiv Detail & Related papers (2025-03-24T20:38:42Z) - CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models [58.58208005178676]
We propose CopyJudge, a novel automated infringement identification framework.
We employ an abstraction-filtration-comparison test framework to assess the likelihood of infringement.
We introduce a general LVLM-based mitigation strategy that automatically optimizes infringing prompts.
arXiv Detail & Related papers (2025-02-21T08:09:07Z) - Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach [16.619255714170222]
We introduce Prism, a benchmark consisting of 50 templates and 450 images, organized into Easy and Hard difficulty levels.
We propose EvoStealer, a novel template stealing method that operates without model fine-tuning.
Our evaluation shows that EvoStealer's stolen templates can reproduce images highly similar to originals and effectively generalize to other subjects.
arXiv Detail & Related papers (2025-02-20T05:52:10Z) - White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z) - MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models [41.708401515627784]
We observe that Multimodal Large Language Models (MLLMs) can be easily compromised by query-relevant images.
We introduce MM-SafetyBench, a framework designed for conducting safety-critical evaluations of MLLMs against such image-based manipulations.
Our work underscores the need for a concerted effort to strengthen and enhance the safety measures of open-source MLLMs against potential malicious exploits.
arXiv Detail & Related papers (2023-11-29T12:49:45Z) - Adversarial Prompt Tuning for Vision-Language Models [86.5543597406173]
Adversarial Prompt Tuning (AdvPT) is a technique to enhance the adversarial robustness of image encoders in Vision-Language Models (VLMs).
We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques.
arXiv Detail & Related papers (2023-11-19T07:47:43Z) - SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [56.88192537044364]
We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach can make text-to-image diffusion models easier to use with better user experience.
arXiv Detail & Related papers (2023-05-09T05:48:38Z) - Prompt Stealing Attacks Against Text-to-Image Generation Models [27.7826502104361]
A trend of trading high-quality prompts on specialized marketplaces has emerged.
Successful prompt stealing attacks directly violate the intellectual property of prompt engineers.
We propose a simple yet effective prompt stealing attack, PromptStealer.
arXiv Detail & Related papers (2023-02-20T11:37:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.