Related papers: Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach

URL: http://arxiv.org/abs/2502.14285v2
Date: Sat, 17 May 2025 10:50:48 GMT
Title: Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach
Authors: Yurong Wu, Fangwen Mu, Qiuhong Zhang, Jinjing Zhao, Xinrun Xu, Lingrui Mei, Yang Wu, Lin Shi, Junjie Wang, Zhiming Ding, Yiwei Wang
Abstract summary: We introduce Prism, a benchmark consisting of 50 templates and 450 images, organized into Easy and Hard difficulty levels. We propose EvoStealer, a novel template stealing method that operates without model fine-tuning. Our evaluation shows that EvoStealer's stolen templates can reproduce images highly similar to originals and effectively generalize to other subjects.
Score: 16.619255714170222
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Prompt trading has emerged as a significant intellectual property concern in recent years, where vendors entice users by showcasing sample images before selling prompt templates that can generate similar images. This work investigates a critical security vulnerability: attackers can steal prompt templates using only a limited number of sample images. To investigate this threat, we introduce Prism, a prompt-stealing benchmark consisting of 50 templates and 450 images, organized into Easy and Hard difficulty levels. To identify the vulnerability of VLMs to prompt stealing, we propose EvoStealer, a novel template stealing method that operates without model fine-tuning by leveraging differential evolution algorithms. The system first initializes population sets using multimodal large language models (MLLMs) based on predefined patterns, then iteratively generates enhanced offspring through MLLMs. During evolution, EvoStealer identifies common features across offspring to derive generalized templates. Our comprehensive evaluation conducted across open-source (INTERNVL2-26B) and closed-source models (GPT-4o and GPT-4o-mini) demonstrates that EvoStealer's stolen templates can reproduce images highly similar to originals and effectively generalize to other subjects, significantly outperforming baseline methods with an average improvement of over 10%. Moreover, our cost analysis reveals that EvoStealer achieves template stealing with negligible computational expenses. Our code and dataset are available at https://github.com/whitepagewu/evostealer.
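
The initialize, generate-offspring, select loop mentioned in the abstract has the general shape of differential evolution. As a rough illustration only, here is a minimal sketch of the classic numeric variant (DE/rand/1/bin) minimizing a toy objective. It is not EvoStealer: per the abstract, EvoStealer uses MLLMs to initialize the population and produce offspring over prompt text, whereas this sketch operates on real-valued vectors. The function names, parameters (`f`, `cr`, `pop_size`) and the `sphere` objective are assumptions chosen for the example.

```python
import random


def differential_evolution(fitness, dim, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Classic DE/rand/1/bin minimizer (illustrative; not the paper's method)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Initialization: random population inside the bounds.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, none of them the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    # Mutation: base vector plus a scaled difference vector.
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip back into the bounds
                else:
                    # Crossover: keep the gene from the current individual.
                    v = pop[i][j]
                trial.append(v)
            # Selection: the offspring replaces its parent only if not worse.
            trial_score = fitness(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Toy objective with its minimum at the origin (hypothetical stand-in)."""
    return sum(v * v for v in x)


best, score = differential_evolution(sphere, dim=3, bounds=(-5.0, 5.0))
```

In a text-based setting such as the one the abstract describes, the arithmetic mutation and crossover steps would presumably be replaced by model-generated rewrites and the fitness by an image-similarity measure; those details are not given here, so they are left out of the sketch.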

Related papers

Reinforcement Learning-Based Prompt Template Stealing for Text-to-Image Models [0.913755431537592]
We present RLStealer, a reinforcement learning framework that recovers a prompt template from only a small set of example images. RLStealer achieves state-of-the-art performance while reducing the total attack cost to under 13% of that required by existing baselines. Our study highlights an urgent security threat inherent in prompt trading and lays the groundwork for developing protective standards.
arXiv Detail & Related papers (2025-09-27T12:29:50Z)
Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models [20.99874786089634]
Previous jailbreak attacks often inject malicious instructions from text into less aligned modalities, such as vision. We propose a novel implicit jailbreak framework termed IJA that stealthily embeds malicious instructions into images via least significant bit steganography. On commercial models like GPT-4o and Gemini-1.5 Pro, our method achieves attack success rates of over 90% using an average of only 3 queries.
arXiv Detail & Related papers (2025-05-22T09:34:47Z)
No Query, No Access [50.18709429731724]
We introduce the Victim Data-based Adversarial Attack (VDBA), which operates using only victim texts. To prevent access to the victim model, we create a shadow dataset with publicly available pre-trained models and clustering methods. Experiments on the Emotion and SST5 datasets show that VDBA outperforms state-of-the-art methods, achieving an ASR improvement of 52.08%.
arXiv Detail & Related papers (2025-05-12T06:19:59Z)
Where's the liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content [53.93606081932928]
We introduce a novel black-box detection framework that requires only API access. We measure the likelihood that the image was generated by the model itself. For black-box models that do not support masked image inputs, we incorporate a cost-efficient surrogate model trained to align with the target model distribution.
arXiv Detail & Related papers (2025-05-02T05:11:35Z)
AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection [26.066755429896926]
Methods select Out-of-Distribution (OoD) data as backdoor watermarks and retrain the original model for copyright protection. Existing methods are susceptible to malicious detection and forgery by adversaries, resulting in watermark evasion. We propose Model-agnostic Black-box Backdoor Watermarking Framework (AGATE) to address stealthiness and robustness challenges in multimodal model copyright protection.
arXiv Detail & Related papers (2025-04-28T14:52:01Z)
Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images [9.351260848685229]
Large vision-language models (LVLMs) have demonstrated remarkable image understanding and dialogue capabilities. Their widespread availability raises concerns about unauthorized usage and copyright infringement. We propose a novel method called Parameter Learning Attack (PLA) for tracking the copyright of LVLMs without modifying the original model.
arXiv Detail & Related papers (2025-02-23T14:49:34Z)
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework. Unlike discretization-based methods, MMAR takes in continuous-valued image tokens to avoid information loss. We show that MMAR demonstrates superior performance compared with other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z)
Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training [51.87027943520492]
We present a novel paradigm Diffusion-ReID to efficiently augment and generate diverse images based on known identities. Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset Diff-Person, which consists of over 777K images from 5,183 identities.
arXiv Detail & Related papers (2024-06-10T06:26:03Z)
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception [66.88792390480343]
We propose DEEM, a simple but effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder. DEEM exhibits enhanced robustness and a superior capacity to alleviate model hallucinations while utilizing fewer trainable parameters, less pre-training data, and a smaller base model size.
arXiv Detail & Related papers (2024-05-24T05:46:04Z)
AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA [67.68750063537482]
Diffusion models have achieved remarkable success in generating high-quality images. Recent works aim to let SD models output watermarked content for post-hoc forensics. We propose AquaLoRA as the first implementation under this scenario.
arXiv Detail & Related papers (2024-05-18T01:25:47Z)
Shake to Leak: Fine-tuning Diffusion Models Can Amplify the Generative Privacy Risk [60.36852134501251]
We reveal a new privacy risk, Shake-to-Leak (S2L), that fine-tuning the pre-trained models with manipulated data can amplify the existing privacy risks. In the worst case, S2L can amplify the state-of-the-art membership inference attack (MIA) on diffusion models by 5.4% AUC. This discovery underscores that the privacy risk with diffusion models is even more severe than previously recognized.
arXiv Detail & Related papers (2024-03-14T14:48:37Z)
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models [41.708401515627784]
We observe that Multimodal Large Language Models (MLLMs) can be easily compromised by query-relevant images. We introduce MM-SafetyBench, a framework designed for conducting safety-critical evaluations of MLLMs against such image-based manipulations. Our work underscores the need for a concerted effort to strengthen and enhance the safety measures of open-source MLLMs against potential malicious exploits.
arXiv Detail & Related papers (2023-11-29T12:49:45Z)
TARGET: Template-Transferable Backdoor Attack Against Prompt-based NLP Models via GPT4 [15.015584291919817]
We propose a novel approach, TARGET (Template-trAnsfeRable backdoor attack aGainst prompt-basEd NLP models via GPT4). Specifically, we first utilize GPT4 to reformulate manual templates to generate tone-strong and normal templates, and the former are injected into the model as a backdoor trigger in the pre-training phase. Then, we not only directly employ the above templates in the downstream task, but also use GPT4 to generate templates with similar tone to the above templates to carry out transferable attacks.
arXiv Detail & Related papers (2023-11-29T08:12:09Z)
Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service [19.916419258812077]
We propose a robust embedding watermarking method for VLPs called VLPMarker. To enhance the watermark, we propose a collaborative copyright verification strategy based on both backdoor trigger and embedding distribution.
arXiv Detail & Related papers (2023-11-10T04:27:27Z)
Decision-based iterative fragile watermarking for model integrity verification [33.42076236847454]
Foundation models are typically hosted on cloud servers to meet the high demand for their services. This exposes them to security risks, as attackers can modify them after uploading to the cloud or transferring from a local system. We propose an iterative decision-based fragile watermarking algorithm that transforms normal training samples into fragile samples that are sensitive to model changes.
arXiv Detail & Related papers (2023-05-13T10:36:11Z)
Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks [86.55317144826179]
Previous methods always leverage the transferable adversarial examples as the model fingerprint. We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC). SAC successfully defends against various model stealing attacks, even including adversarial training or transfer learning.
arXiv Detail & Related papers (2022-10-21T02:07:50Z)
DeepHider: A Multi-module and Invisibility Watermarking Scheme for Language Model [0.0]
This paper proposes a new threat of replacing the model classification module and performing global fine-tuning of the model. We use the properties of blockchain, such as tamper-proofing and traceability, to prevent thieves from falsely claiming ownership. Experiments show that the proposed scheme successfully verifies ownership with 100% watermark verification accuracy.
arXiv Detail & Related papers (2022-08-09T11:53:24Z)
Defending against Model Stealing via Verifying Embedded External Features [90.29429679125508]
Adversaries can steal deployed models even when they have no training samples and cannot get access to the model parameters or structures. We explore the defense from another angle by verifying whether a suspicious model contains the knowledge of defender-specified external features. Our method is effective in detecting different types of model stealing simultaneously, even if the stolen model is obtained via a multi-stage stealing process.
arXiv Detail & Related papers (2021-12-07T03:51:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.