Effective Prompt Extraction from Language Models
- URL: http://arxiv.org/abs/2307.06865v2
- Date: Sat, 17 Feb 2024 23:44:05 GMT
- Title: Effective Prompt Extraction from Language Models
- Authors: Yiming Zhang and Nicholas Carlini and Daphne Ippolito
- Abstract summary: We present a framework for measuring the effectiveness of prompt extraction attacks.
In experiments with 3 different sources of prompts and 11 underlying large language models, we find that simple text-based attacks can in fact reveal prompts with high probability.
Our framework determines with high precision whether an extracted prompt is the actual secret prompt, rather than a model hallucination.
- Score: 78.67410369494623
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The text generated by large language models is commonly controlled by
prompting, where a prompt prepended to a user's query guides the model's
output. The prompts used by companies to guide their models are often treated
as secrets, to be hidden from the user making the query. They have even been
treated as commodities to be bought and sold. However, anecdotal reports have
shown adversarial users employing prompt extraction attacks to recover these
prompts. In this paper, we present a framework for systematically measuring the
effectiveness of these attacks. In experiments with 3 different sources of
prompts and 11 underlying large language models, we find that simple text-based
attacks can in fact reveal prompts with high probability. Our framework
determines with high precision whether an extracted prompt is the actual secret
prompt, rather than a model hallucination. Prompt extraction experiments on
real systems such as Bing Chat and ChatGPT suggest that system prompts can be
revealed by an adversary despite existing defenses in place.
Related papers
- Plug and Play with Prompts: A Prompt Tuning Approach for Controlling Text Generation [16.49758711633611]
Large Language Models (LLMs) have shown exceptional language generation capabilities in response to text-based prompts.
In this work, we explore the use of Prompt Tuning to achieve controlled language generation.
We demonstrate the efficacy of our method towards mitigating harmful, toxic, and biased text generated by language models.
arXiv Detail & Related papers (2024-04-08T01:54:28Z) - DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM
Jailbreakers [80.18953043605696]
We introduce an automatic prompt textbfDecomposition and textbfReconstruction framework for jailbreak textbfAttack (DrAttack)
DrAttack includes three key components: (a) Decomposition' of the original prompt into sub-prompts, (b) Reconstruction' of these sub-prompts implicitly by in-context learning with semantically similar but harmless reassembling demo, and (c) a Synonym Search' of sub-prompts, aiming to find sub-prompts' synonyms that maintain the original intent while
arXiv Detail & Related papers (2024-02-25T17:43:29Z) - Prompt Stealing Attacks Against Large Language Models [5.421974542780941]
We propose a novel attack against large language models (LLMs)
Our proposed prompt stealing attack aims to steal these well-designed prompts based on the generated answers.
Our experimental results show the remarkable performance of our proposed attacks.
arXiv Detail & Related papers (2024-02-20T12:25:26Z) - Can discrete information extraction prompts generalize across language
models? [36.85568212975316]
We study whether automatically-induced prompts can also be used, out-of-the-box, to probe other language models for the same information.
We introduce a way to induce prompts by mixing language models at training time that results in prompts that generalize well across models.
arXiv Detail & Related papers (2023-02-20T09:56:51Z) - Prompting Large Language Model for Machine Translation: A Case Study [87.88120385000666]
We offer a systematic study on prompting strategies for machine translation.
We examine factors for prompt template and demonstration example selection.
We explore the use of monolingual data and the feasibility of cross-lingual, cross-domain, and sentence-to-document transfer learning.
arXiv Detail & Related papers (2023-01-17T18:32:06Z) - Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good
movie, and a good prompt too? [84.91689960190054]
Large language models can perform new tasks in a zero-shot fashion, given natural language prompts.
It is underexplored what factors make the prompts effective, especially when the prompts are natural language.
arXiv Detail & Related papers (2022-12-20T18:47:13Z) - Optimizing Prompts for Text-to-Image Generation [97.61295501273288]
Well-designed prompts can guide text-to-image models to generate amazing images.
But the performant prompts are often model-specific and misaligned with user input.
We propose prompt adaptation, a framework that automatically adapts original user input to model-preferred prompts.
arXiv Detail & Related papers (2022-12-19T16:50:41Z) - Demystifying Prompts in Language Models via Perplexity Estimation [100.43627541756524]
Performance of a prompt is coupled with the extent to which the model is familiar with the language it contains.
We show that the lower the perplexity of the prompt is, the better the prompt is able to perform the task.
arXiv Detail & Related papers (2022-12-08T02:21:47Z) - Ignore Previous Prompt: Attack Techniques For Language Models [0.0]
We propose PromptInject, a framework for mask-based adversarial prompt composition.
We show how GPT-3, the most widely deployed language model in production, can be easily misaligned by simple handcrafted inputs.
arXiv Detail & Related papers (2022-11-17T13:43:20Z) - Do Prompts Solve NLP Tasks Using Natural Language? [18.611748762251494]
In this work, we empirically compare the three types of prompts under both few-shot and fully-supervised settings.
Our experimental results show that schema prompts are the most effective in general.
arXiv Detail & Related papers (2022-03-02T07:20:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.