EmojiPrompt: Generative Prompt Obfuscation for Privacy-Preserving Communication with Cloud-based LLMs
- URL: http://arxiv.org/abs/2402.05868v3
- Date: Thu, 20 Mar 2025 20:15:22 GMT
- Title: EmojiPrompt: Generative Prompt Obfuscation for Privacy-Preserving Communication with Cloud-based LLMs
- Authors: Sam Lin, Wenyue Hua, Zhenting Wang, Mingyu Jin, Lizhou Fan, Yongfeng Zhang
- Abstract summary: EmojiPrompt performs generative transformation, obfuscating private data within prompts with linguistic and non-linguistic elements. We evaluate EmojiPrompt's performance across 8 datasets from various domains. EmojiPrompt's atomic-level obfuscation allows it to function exclusively with cloud-based LLMs.
- Score: 34.77734655124251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cloud-based Large Language Models (LLMs) such as ChatGPT have become increasingly integral to daily operations. Nevertheless, they also introduce privacy concerns: firstly, numerous studies underscore the risks to user privacy posed by jailbreaking cloud-based LLMs; secondly, the LLM service providers have access to all user data, which deters individuals from confidently utilizing such services. To address such concerns, we propose a simple yet effective paradigm, EmojiPrompt, to protect user privacy. At its core, EmojiPrompt performs generative transformation, obfuscating private data within prompts with linguistic and non-linguistic elements before submitting them to cloud-based LLMs. We evaluate EmojiPrompt's performance across 8 datasets from various domains. We also propose simulated inference attacks to assess EmojiPrompt's ability to preserve user privacy. The results demonstrate that EmojiPrompt effectively obfuscates user private data, while largely maintaining, or even enhancing, performance compared to the unobfuscated version. Furthermore, EmojiPrompt's atomic-level obfuscation allows it to function exclusively with cloud-based LLMs. For source code, please refer to: https://github.com/agiresearch/EmojiCrypt.
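The abstract describes an obfuscate-query-restore workflow: private spans in the prompt are rewritten into linguistic and non-linguistic surrogates before the prompt leaves the user's device, and the cloud model's reply is then interpreted locally. The minimal sketch below illustrates that general pattern only; it is not the authors' implementation (EmojiPrompt derives its transformations generatively with an LLM rather than from a fixed table), and the substitution table, example prompt, and `cloud_llm` placeholder are illustrative assumptions.

```python
# Minimal sketch of an obfuscate-query-restore workflow (illustrative only).
# EmojiPrompt generates its obfuscations with an LLM; here a hard-coded toy
# table stands in for that generative step.

OBFUSCATION_MAP = {
    "Alice Johnson": "👤1",    # personal name -> non-linguistic surrogate
    "diabetes": "🍬⚠️",        # medical condition -> emoji surrogate
    "Acme Corp": "🏢A",        # employer -> abstract symbol
}

def obfuscate(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace private spans with surrogates; return the obfuscated prompt
    and the inverse map needed to interpret the reply locally."""
    inverse = {}
    for secret, surrogate in OBFUSCATION_MAP.items():
        if secret in prompt:
            prompt = prompt.replace(secret, surrogate)
            inverse[surrogate] = secret
    return prompt, inverse

def deobfuscate(reply: str, inverse: dict[str, str]) -> str:
    """Restore surrogates in the model's reply before showing it to the user."""
    for surrogate, secret in inverse.items():
        reply = reply.replace(surrogate, secret)
    return reply

def cloud_llm(prompt: str) -> str:
    """Placeholder for a call to a cloud-based LLM chat API (assumption)."""
    return f"Plan for {prompt.split('for ')[-1]}"

user_prompt = "Draft a weekly wellness plan for Alice Johnson, who manages diabetes."
safe_prompt, inverse = obfuscate(user_prompt)
reply = cloud_llm(safe_prompt)        # only the obfuscated text leaves the device
print(deobfuscate(reply, inverse))    # surrogates resolved locally
```

Because the obfuscated prompt remains a single ordinary text string, nothing needs to change on the provider's side, which is consistent with the abstract's claim that the method functions exclusively with cloud-based LLMs.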
Related papers
- A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation [0.6699777383856287]
ChatGPT services leverage cloud-based large language models (LLMs).
Privacy concerns arise as prompts are transmitted and processed by the model providers.
We propose a general pseudonymization framework applicable to cloud-based LLMs.
arXiv Detail & Related papers (2025-02-21T06:15:53Z)
- PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models [10.050972891318324]
We propose a pipeline that protects user privacy and sensitive information during interactions between users and large language models.
We construct SensitiveQA, the first privacy-focused open-ended question-answering dataset.
Our proposed solution employs a multi-stage strategy aimed at preemptively securing user information while simultaneously preserving the response quality of cloud-based LLMs.
arXiv Detail & Related papers (2025-02-19T09:17:07Z)
- Confidential Prompting: Protecting User Prompts from Cloud LLM Providers [0.688204255655161]
We introduce Secure Multi-party Decoding (SMD) to confine user prompts to a trusted execution environment.
We also introduce a novel cryptographic method, Prompt Obfuscation (PO) to ensure robustness against reconstruction attacks.
Our solution can enable privacy-preserving cloud LLM services that handle sensitive prompts, such as clinical records, financial data, and personal information.
arXiv Detail & Related papers (2024-09-27T20:32:42Z)
- Ciphertext-Only Attack on a Secure $k$-NN Computation on Cloud [0.0]
Encryption can prevent unauthorized access, data breaches, and the resulting financial loss, reputation damage, and legal issues.
Sanyashi et al. proposed an encryption scheme to facilitate privacy-preserving $k$-NN computation on the cloud.
We give an efficient algorithm and empirically demonstrate that their encryption scheme is vulnerable to a ciphertext-only attack (COA).
arXiv Detail & Related papers (2024-03-14T03:53:01Z)
- Differentially Private Synthetic Data via Foundation Model APIs 2: Text [56.13240830670327]
A lot of high-quality text data generated in the real world is private and cannot be shared or used freely due to privacy concerns.
We propose an augmented PE (Private Evolution) algorithm, named Aug-PE, that applies to the complex setting of text.
Our results demonstrate that Aug-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines.
arXiv Detail & Related papers (2024-03-04T05:57:50Z)
- CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models [49.60006012946767]
We propose CodeChameleon, a novel jailbreak framework based on personalized encryption tactics.
We conduct extensive experiments on 7 Large Language Models, achieving state-of-the-art average Attack Success Rate (ASR).
Remarkably, our method achieves an 86.6% ASR on GPT-4-1106.
arXiv Detail & Related papers (2024-02-26T16:35:59Z)
- dabih -- encrypted data storage and sharing platform [0.0]
dabih is an open-source web application designed to facilitate user-friendly encrypted data management.
Its approach to data security involves a two-stage envelope encryption process; a generic sketch of this pattern appears after this list.
The private key necessary for decrypting the data remains exclusively on the owner's device.
arXiv Detail & Related papers (2024-01-16T12:57:35Z)
- ConfusionPrompt: Practical Private Inference for Online Large Language Models [3.8134804426693094]
State-of-the-art large language models (LLMs) are typically deployed as online services, requiring users to transmit detailed prompts to cloud servers.
We introduce ConfusionPrompt, a novel framework for private LLM inference that protects user privacy by decomposing the original prompt into smaller sub-prompts.
We show that ConfusionPrompt achieves significantly higher utility than local inference methods using open-source models and perturbation-based techniques.
arXiv Detail & Related papers (2023-12-30T01:26:42Z)
- Simple client-side encryption of personal information with Web Assembly [0.0]
A simple method is proposed to encrypt the data on the client side, using Web Assembly.
The method has been developed for a semantic medical database, and allows accessing personal data using an additional password.
arXiv Detail & Related papers (2023-12-29T17:10:57Z)
- Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models [63.91178922306669]
We introduce Silent Guardian, a text protection mechanism against large language models (LLMs).
By carefully modifying the text to be protected, the resulting truncation protection examples (TPE) can induce LLMs to sample the end token first, thus directly terminating the interaction.
We show that SG can effectively protect the target text under various configurations and achieve almost 100% protection success rate in some cases.
arXiv Detail & Related papers (2023-12-15T10:30:36Z)
- DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer [57.04801796205638]
Large Language Models (LLMs) have emerged as dominant tools for various tasks.
However, concerns surrounding data privacy present obstacles due to the tuned prompts' dependency on sensitive private information.
We present Differentially-Private Offsite Prompt Tuning (DP-OPT) to address this challenge.
arXiv Detail & Related papers (2023-11-27T02:01:10Z)
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models, GPT-4 and ChatGPT, reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
- Hide and Seek (HaS): A Lightweight Framework for Prompt Privacy Protection [6.201275002179716]
We introduce the HaS framework, where "H(ide)" and "S(eek)" represent its two core processes: hiding private entities for anonymization and seeking private entities for de-anonymization.
To quantitatively assess HaS's privacy protection performance, we propose both black-box and white-box adversarial models.
arXiv Detail & Related papers (2023-09-06T14:54:11Z)
- GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher [85.18213923151717]
Experimental results show that certain ciphers succeed almost 100% of the time in bypassing the safety alignment of GPT-4 in several safety domains.
We propose a novel SelfCipher that uses only role play and several demonstrations in natural language to evoke this capability.
arXiv Detail & Related papers (2023-08-12T04:05:57Z)
- Privacy Implications of Retrieval-Based Language Models [26.87950501433784]
We present the first study of privacy risks in retrieval-based LMs, particularly $k$NN-LMs.
We find that $k$NN-LMs are more susceptible to leaking private information from their private datastore than parametric models.
arXiv Detail & Related papers (2023-05-24T08:37:27Z)
- THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption [112.02441503951297]
Privacy-preserving inference of transformer models is in demand among cloud service users.
We introduce THE-X, an approximation approach for transformers, which enables privacy-preserving inference of pre-trained models.
arXiv Detail & Related papers (2022-06-01T03:49:18Z)
- Reinforcement Learning on Encrypted Data [58.39270571778521]
We present a preliminary, experimental study of how a DQN agent trained on encrypted states performs in environments with discrete and continuous state spaces.
Our results highlight that the agent is still capable of learning in small state spaces even in the presence of non-deterministic encryption, but performance collapses in more complex environments.
arXiv Detail & Related papers (2021-09-16T21:59:37Z)
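Several entries above rely on keeping key material or plaintext on the user's side; the dabih entry in particular describes a two-stage envelope encryption process in which the decryption key never leaves the owner's device. The sketch below illustrates that generic pattern with the Python `cryptography` package under assumed parameters (RSA-2048 with OAEP wrapping a Fernet data key); it is a textbook illustration, not dabih's actual implementation.

```python
# Generic envelope-encryption sketch (illustrative; not dabih's implementation).
# Stage 1: encrypt the payload with a fresh symmetric data key.
# Stage 2: encrypt (wrap) that data key with the owner's public key, so only
# the private key -- which stays on the owner's device -- can recover it.

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Owner's key pair; in practice the private key never leaves the owner's device.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# --- Uploader side ---
data_key = Fernet.generate_key()                   # stage 1: symmetric data key
ciphertext = Fernet(data_key).encrypt(b"sensitive record ...")
wrapped_key = public_key.encrypt(data_key, oaep)   # stage 2: wrap the data key

# --- Owner side (holds the private key) ---
unwrapped_key = private_key.decrypt(wrapped_key, oaep)
plaintext = Fernet(unwrapped_key).decrypt(ciphertext)
assert plaintext == b"sensitive record ..."
```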