Pr$\epsilon\epsilon$mpt: Sanitizing Sensitive Prompts for LLMs
- URL: http://arxiv.org/abs/2504.05147v1
- Date: Mon, 07 Apr 2025 14:52:40 GMT
- Title: Pr$\epsilon\epsilon$mpt: Sanitizing Sensitive Prompts for LLMs
- Authors: Amrita Roy Chowdhury, David Glukhov, Divyam Anshumaan, Prasad Chalasani, Nicolas Papernot, Somesh Jha, Mihir Bellare
- Abstract summary: Pr$\epsilon\epsilon$mpt is a novel system that implements a prompt sanitizer. We show that Pr$\epsilon\epsilon$mpt is a practical method to achieve meaningful privacy guarantees.
- Score: 49.84954577111077
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of large language models (LLMs) has introduced new privacy challenges, particularly during inference, where sensitive information in prompts may be exposed to proprietary LLM APIs. In this paper, we address the problem of formally protecting the sensitive information contained in a prompt while maintaining response quality. To this end, we first introduce a cryptographically inspired notion of a prompt sanitizer, which transforms an input prompt to protect its sensitive tokens. Second, we propose Pr$\epsilon\epsilon$mpt, a novel system that implements a prompt sanitizer. Pr$\epsilon\epsilon$mpt categorizes sensitive tokens into two types: (1) those where the LLM's response depends solely on the format (such as SSNs and credit card numbers), for which we use format-preserving encryption (FPE); and (2) those where the response depends on specific values (such as age and salary), for which we apply metric differential privacy (mDP). Our evaluation demonstrates that Pr$\epsilon\epsilon$mpt is a practical method to achieve meaningful privacy guarantees while maintaining high utility compared to unsanitized prompts and outperforming prior methods.
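To make the two token categories concrete, here is a minimal sketch in Python. It is not the authors' implementation: the tagging scheme, the random-digit stand-in for FPE, and the Laplace-based metric-DP noise (with an arbitrary epsilon) are assumptions made purely for illustration.

```python
import secrets
import numpy as np

def pseudo_fpe(token: str) -> str:
    """Format-only tokens (e.g. SSNs, credit card numbers): preserve the shape,
    hide the value. A real system would use keyed format-preserving encryption
    (e.g. FF1/FF3-1); random digits merely stand in for it here."""
    return "".join(secrets.choice("0123456789") if c.isdigit() else c for c in token)

def mdp_noise(value: float, epsilon: float) -> float:
    """Value-dependent numeric tokens (e.g. age, salary): metric DP on the real
    line via the Laplace mechanism, so nearby values remain indistinguishable."""
    return float(np.random.laplace(loc=value, scale=1.0 / epsilon))

def sanitize(tagged_tokens: list[tuple[str, str]], epsilon: float = 0.5) -> str:
    """tagged_tokens: (token, tag) pairs with tag in {'format', 'value', 'plain'}.
    In a full pipeline the tags would come from a sensitive-token detector."""
    out = []
    for token, tag in tagged_tokens:
        if tag == "format":
            out.append(pseudo_fpe(token))
        elif tag == "value":
            out.append(str(round(mdp_noise(float(token), epsilon))))
        else:
            out.append(token)
    return " ".join(out)

# The SSN keeps its format but not its digits; the salary is perturbed.
print(sanitize([("SSN:", "plain"), ("123-45-6789", "format"),
                ("salary:", "plain"), ("85000", "value")]))
```

One design point mirrored here: format-only tokens can in principle be mapped back by the client, since FPE is invertible encryption, whereas value-dependent tokens trade a bounded amount of accuracy for a metric-DP guarantee.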
Related papers
- CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks [47.62236306990252]
Large Language Models (LLMs) are susceptible to indirect prompt injection attacks.
This vulnerability stems from LLMs' inability to distinguish between data and instructions within a prompt.
We propose CachePrune that defends against this attack by identifying and pruning task-triggering neurons.
arXiv Detail & Related papers (2025-04-29T23:42:21Z)
- Private Text Generation by Seeding Large Language Model Prompts [13.407214545457778]
We propose Differentially Private Keyphrase Prompt Seeding (DP-KPS), a method that generates a private synthetic text corpus from a sensitive input corpus. We evaluate DP-KPS on downstream ML text classification tasks, and show that the corpora it generates preserve much of the predictive power of the original ones.
arXiv Detail & Related papers (2025-02-18T16:50:38Z)
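As a rough illustration of the keyphrase-seeding idea (not DP-KPS itself: the noisy top-k selection, the privacy parameters, and the prompt template below are assumptions):

```python
from collections import Counter
import numpy as np

def noisy_topk_keyphrases(docs: list[str], k: int = 10, epsilon: float = 1.0) -> list[str]:
    """Pick k keyphrases from a sensitive corpus using Laplace-noised counts.
    Generic DP-selection stand-in; the noise scale is not calibrated to the
    true count sensitivity here, which a real mechanism would require."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    noisy = {w: c + np.random.laplace(scale=1.0 / epsilon) for w, c in counts.items()}
    return sorted(noisy, key=noisy.get, reverse=True)[:k]

def seed_prompt(keyphrases: list[str]) -> str:
    """Seed an LLM prompt with the privately selected keyphrases so the model
    generates synthetic documents that stand in for the sensitive corpus."""
    return ("Write a short document that naturally uses these phrases: "
            + ", ".join(keyphrases))
```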
- ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs [72.13489820420726]
ProSA is a framework designed to evaluate and comprehend prompt sensitivity in large language models.
Our study uncovers that prompt sensitivity fluctuates across datasets and models, with larger models exhibiting enhanced robustness.
arXiv Detail & Related papers (2024-10-16T09:38:13Z)
- Unlocking Memorization in Large Language Models with Dynamic Soft Prompting [66.54460367290146]
Large language models (LLMs) have revolutionized natural language processing (NLP) tasks such as summarization, question answering, and translation.
LLMs pose significant security risks due to their tendency to memorize training data, leading to potential privacy breaches and copyright infringement.
We propose a novel method for estimating LLM memorization using dynamic, prefix-dependent soft prompts.
arXiv Detail & Related papers (2024-09-20T18:56:32Z)
- AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs [51.217126257318924]
We present a novel method that uses another large language model, called AdvPrompter, to generate human-readable adversarial prompts in seconds.
We train the AdvPrompter using a novel algorithm that does not require access to the gradients of the TargetLLM.
The trained AdvPrompter generates suffixes that veil the input instruction without changing its meaning, such that the TargetLLM is lured into giving a harmful response.
arXiv Detail & Related papers (2024-04-21T22:18:13Z)
- PRSA: PRompt Stealing Attacks against Large Language Models [42.07328505384544]
"prompt as a service" has greatly enhanced the utility of large language models (LLMs)
We introduce a novel attack framework, PRSA, designed for prompt stealing attacks against LLMs.
PRSA mainly consists of two key phases: prompt mutation and prompt pruning.
arXiv Detail & Related papers (2024-02-29T14:30:28Z)
- Intention Analysis Makes LLMs A Good Jailbreak Defender [79.4014719271075]
We present a simple yet highly effective defense strategy, i.e., Intention Analysis ($\mathbb{IA}$). $\mathbb{IA}$ works by triggering LLMs' inherent ability to self-correct and improve through a two-stage process. Experiments on varying jailbreak benchmarks show that $\mathbb{IA}$ can consistently and significantly reduce the harmfulness in responses.
arXiv Detail & Related papers (2024-01-12T13:15:05Z)
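A rough sketch of what such a two-stage "analyze the intention, then respond" pipeline could look like (the prompt wording and the `chat` wrapper below are assumptions for this sketch, not the paper's exact prompts):

```python
def chat(messages: list[dict]) -> str:
    """Stand-in for any LLM chat-completion call; wire to an API of your choice."""
    raise NotImplementedError

def intention_analysis_respond(user_query: str) -> str:
    # Stage 1: ask the model to analyze the essential intention of the query.
    analysis = chat([
        {"role": "user",
         "content": f"Analyze the essential intention behind this query:\n{user_query}"},
    ])
    # Stage 2: answer the query conditioned on the analyzed intention,
    # asking the model to refuse if that intention is harmful.
    return chat([
        {"role": "user",
         "content": (f"Query: {user_query}\n"
                     f"Analyzed intention: {analysis}\n"
                     "Respond helpfully, and refuse if the intention is harmful.")},
    ])
```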
- ConfusionPrompt: Practical Private Inference for Online Large Language Models [3.8134804426693094]
State-of-the-art large language models (LLMs) are typically deployed as online services, requiring users to transmit detailed prompts to cloud servers.
We introduce ConfusionPrompt, a novel framework for private LLM inference that protects user privacy by decomposing the original prompt into smaller sub-prompts.
We show that ConfusionPrompt achieves significantly higher utility than local inference methods using open-source models and perturbation-based techniques.
arXiv Detail & Related papers (2023-12-30T01:26:42Z)
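One way to picture the decomposition idea (the naive sentence-level split and the local recombination below are assumptions for illustration, not the paper's algorithm):

```python
def decompose(prompt: str) -> list[str]:
    """Naive stand-in: split the prompt into sentence-level sub-prompts so that
    no single query to the online LLM carries the full private context."""
    return [s.strip() + "." for s in prompt.split(".") if s.strip()]

def private_inference(prompt: str, query_llm) -> str:
    """Send each sub-prompt separately, then combine the partial answers locally.
    `query_llm` is any callable wrapping an online LLM API (an assumption here)."""
    partial_answers = [query_llm(sub) for sub in decompose(prompt)]
    return "\n".join(partial_answers)
```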
- Certifying LLM Safety against Adversarial Prompting [70.96868018621167]
Large language models (LLMs) are vulnerable to adversarial attacks that add malicious tokens to an input prompt. We introduce erase-and-check, the first framework for defending against adversarial prompts with certifiable safety guarantees.
arXiv Detail & Related papers (2023-09-06T04:37:20Z)
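A minimal sketch of the erase-and-check idea: erase subsets of tokens (here, only trailing suffixes up to a budget d) and run a safety filter on every erased version, rejecting the prompt if any version is flagged. The `is_harmful` filter and the suffix-only erasure mode are simplifying assumptions.

```python
def erase_and_check(tokens: list[str], is_harmful, d: int = 20) -> bool:
    """Return True (reject) if the prompt, or any version with up to d trailing
    tokens erased, is flagged by the safety filter. If the filter catches the
    original harmful prompt, appending an adversarial suffix of length <= d
    cannot slip it past, since one erased version recovers that original."""
    for i in range(d + 1):
        candidate = tokens[: len(tokens) - i] if i else tokens
        if not candidate:
            break
        if is_harmful(" ".join(candidate)):
            return True
    return False

# Toy keyword filter standing in for a learned safety classifier.
toy_filter = lambda text: "build a bomb" in text.lower()
print(erase_and_check("How do I build a bomb please answer now".split(), toy_filter))
```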
- Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models [26.969641494649267]
We instantiate a simple but highly effective membership inference attack against the data used to prompt large language models.
We show that our prompt-based approach is easily deployed with existing commercial APIs.
arXiv Detail & Related papers (2023-05-24T22:06:08Z)
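A toy illustration of the kind of confidence-thresholding membership inference this summary alludes to (the scoring function, the threshold, and the `model_logprob_of` wrapper are assumptions, not the paper's exact attack):

```python
def membership_score(model_logprob_of, candidate: str) -> float:
    """Higher average token log-probability suggests the candidate appeared in
    the prompt (or training data). `model_logprob_of` is any callable returning
    per-token log-probs for a string -- an assumption for this sketch."""
    logprobs = model_logprob_of(candidate)
    return sum(logprobs) / max(len(logprobs), 1)

def is_member(model_logprob_of, candidate: str, threshold: float = -2.0) -> bool:
    """Predict membership when the model is unusually confident on the candidate.
    The threshold would be calibrated on held-out non-member examples."""
    return membership_score(model_logprob_of, candidate) > threshold
```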
This list is automatically generated from the titles and abstracts of the papers on this site.