Fugu-MT Paper Translation (Abstract): Protecting User Prompts Via Character-Level Differential Privacy

Paper overview: Protecting User Prompts Via Character-Level Differential Privacy

arxiv url: http://arxiv.org/abs/2603.26032v1
Date: Fri, 27 Mar 2026 03:02:05 GMT
Status: Translation complete
Last updated in system: 2026-03-30 21:49:48.336669
Title: Protecting User Prompts Via Character-Level Differential Privacy
Authors: Shashie Dilhara Batan Arachchige, Hassan Jameel Asghar, Benjamin Zi Hao Zhao, Dinusha Vatsalan, Dali Kaafar
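Abstract summary: We propose a new method to sanitize user prompts. Our mechanism uses the randomized response mechanism of differential privacy to randomly and independently perturb each character in a word. The restoration step can reconstruct non-sensitive words even when they are perturbed, thanks to cues from the context and the fact that these words are often very common.
Reference score (independently computed attention measure): 2.986027976506785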
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) generate responses based on user prompts. Often, these prompts may contain highly sensitive information, including personally identifiable information (PII), which could be exposed to third parties hosting these models. In this work, we propose a new method to sanitize user prompts. Our mechanism uses the randomized response mechanism of differential privacy to randomly and independently perturb each character in a word. The perturbed text is then sent to a remote LLM, which first performs a prompt restoration and subsequently performs the intended downstream task. The idea is that the restoration will be able to reconstruct non-sensitive words even when they are perturbed due to cues from the context, as well as the fact that these words are often very common. On the other hand, perturbation would make reconstruction of sensitive words difficult because they are rare. We experimentally validate our method on two datasets, i2b2/UTHealth and Enron, using two LLMs: Llama-3.1 8B Instruct and GPT-4o mini. We also compare our approach with a word-level differentially private mechanism, and with a rule-based PII redaction baseline, using a unified privacy-utility evaluation. Our results show that sensitive PII tagged in these datasets are reconstructed at a rate close to the theoretical rate of reconstructing completely random words, whereas non-sensitive words are reconstructed at a much higher rate. Our method has the advantage that it can be applied without explicitly identifying sensitive pieces of information in the prompt, while showing a good privacy-utility tradeoff for downstream tasks.
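The abstract describes the perturbation step only at a high level. As a rough illustration, the following is a minimal sketch of generalized (k-ary) randomized response applied independently to each character. It is not the authors' implementation: the alphabet (lowercase letters and digits), the lowercasing step, the pass-through of other symbols such as spaces, and the names `ALPHABET`, `keep_probability`, `perturb_char`, and `perturb_text` are all assumptions made here for illustration.

```python
import math
import random
import string
from typing import Optional

# Assumed alphabet: the abstract does not say which symbols are perturbed.
ALPHABET = string.ascii_lowercase + string.digits


def keep_probability(epsilon: float, k: int = len(ALPHABET)) -> float:
    """Probability that k-ary randomized response reports the true symbol.

    Each other symbol is reported with probability 1 / (e^eps + k - 1),
    so the ratio between any two output probabilities is at most e^eps.
    """
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def perturb_char(ch: str, epsilon: float, rng: random.Random) -> str:
    """Perturb a single character independently of all others."""
    if ch not in ALPHABET:
        # Simplification assumed here: spaces and punctuation pass through.
        return ch
    if rng.random() < keep_probability(epsilon):
        return ch
    # Otherwise report a uniformly random *different* symbol.
    others = [c for c in ALPHABET if c != ch]
    return rng.choice(others)


def perturb_text(text: str, epsilon: float, seed: Optional[int] = None) -> str:
    """Apply per-character randomized response to a whole prompt."""
    rng = random.Random(seed)
    # Lowercasing is an assumption that keeps the alphabet small.
    return "".join(perturb_char(c, epsilon, rng) for c in text.lower())


if __name__ == "__main__":
    prompt = "Patient John Smith was admitted on 12 March"
    for eps in (0.5, 2.0, 5.0):
        print(eps, perturb_text(prompt, eps, seed=0))
```

Under these assumptions, a smaller `epsilon` leaves each character intact with lower probability, so rare strings such as names become hard to recover, while a restoration model may still guess common words from context. In the pipeline the abstract describes, the output of `perturb_text` would be what is sent to the remote LLM.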
