Tight and Practical Privacy Auditing for Differentially Private In-Context Learning
- URL: http://arxiv.org/abs/2511.13502v1
- Date: Mon, 17 Nov 2025 15:39:54 GMT
- Title: Tight and Practical Privacy Auditing for Differentially Private In-Context Learning
- Authors: Yuyang Xia, Ruixuan Liu, Li Xiong
- Abstract summary: Large language models (LLMs) perform in-context learning (ICL) by adapting to tasks from prompt demonstrations, which in practice often contain private or proprietary data. We present a tight and efficient privacy auditing framework for DP-ICL systems that runs membership inference attacks and translates their success rates into empirical privacy guarantees using Gaussian DP. Experiments on standard text classification and generation benchmarks show that our empirical leakage estimates closely match theoretical DP budgets on classification tasks and are consistently lower on generation tasks due to conservative embedding-sensitivity bounds.
- Score: 11.394805414546903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) perform in-context learning (ICL) by adapting to tasks from prompt demonstrations, which in practice often contain private or proprietary data. Although differential privacy (DP) with private voting is a pragmatic mitigation, DP-ICL implementations are error-prone, and worst-case DP bounds may substantially overestimate actual leakage, calling for practical auditing tools. We present a tight and efficient privacy auditing framework for DP-ICL systems that runs membership inference attacks and translates their success rates into empirical privacy guarantees using Gaussian DP. Our analysis of the private voting mechanism identifies vote configurations that maximize the auditing signal, guiding the design of audit queries that reliably reveal whether a canary demonstration is present in the context. The framework supports both black-box (API-only) and white-box (internal vote) threat models, and unifies auditing for classification and generation by reducing both to a binary decision problem. Experiments on standard text classification and generation benchmarks show that our empirical leakage estimates closely match theoretical DP budgets on classification tasks and are consistently lower on generation tasks due to conservative embedding-sensitivity bounds, making our framework a practical privacy auditor and verifier for real-world DP-ICL deployments.
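The translation at the heart of the framework, from membership-inference success rate to an empirical privacy guarantee, rests on standard Gaussian DP (GDP) results. Below is a minimal sketch of that conversion, assuming a balanced canary-present/canary-absent audit; the function names and numbers are illustrative rather than the paper's released code, and a real auditor would also attach confidence intervals to the measured attack accuracy.

```python
import math
from scipy.stats import norm

def empirical_mu(attack_accuracy: float) -> float:
    # Under mu-GDP, a membership inference attack with balanced priors
    # achieves accuracy at most Phi(mu / 2); inverting this bound turns
    # a measured accuracy into an empirical estimate of mu.
    return 2.0 * norm.ppf(attack_accuracy)

def gdp_delta(mu: float, eps: float) -> float:
    # The (eps, delta)-DP guarantee implied by mu-GDP (Dong, Roth & Su):
    #   delta(eps) = Phi(-eps/mu + mu/2) - e^eps * Phi(-eps/mu - mu/2)
    return norm.cdf(-eps / mu + mu / 2.0) - math.exp(eps) * norm.cdf(-eps / mu - mu / 2.0)

# Hypothetical audit outcome: the attack guesses canary membership
# correctly in 75% of trials, giving mu ~= 1.35.
mu_hat = empirical_mu(0.75)
print(f"empirical mu = {mu_hat:.2f}")
print(f"implied delta at eps = 1.0: {gdp_delta(mu_hat, 1.0):.4f}")
```

Because a concrete attack can only under-perform the optimal distinguisher, the estimate is a lower bound: an empirical mu well below the theoretical budget indicates slack, while an estimate above it would signal an implementation bug.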
Related papers
- IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation [49.796717294455796]
We present IMMACULATE, a practical auditing framework that detects economically motivated deviations. IMMACULATE selectively audits a small fraction of requests using verifiable computation, achieving strong detection guarantees while amortizing cryptographic overhead.
arXiv Detail & Related papers (2026-02-26T07:21:02Z) - Privacy in Theory, Bugs in Practice: Grey-Box Auditing of Differential Privacy Libraries [11.924357290256374]
We introduce Re:cord-play, a gray-box auditing paradigm that inspects the internal state of DP algorithms. By running an instrumented algorithm on neighboring datasets with identical randomness, Re:cord-play directly checks for data-dependent control flow (an illustrative sketch of this check appears after this list). We show that our novel testing approach is both effective and necessary by auditing 12 open-source libraries.
arXiv Detail & Related papers (2026-02-19T15:18:00Z) - Sequential Auditing for f-Differential Privacy [5.7992233755396505]
We present new auditors to assess Differential Privacy (DP) of an algorithm based on output samples. We shift the focus to the highly expressive privacy concept of $f$-DP, in which the entire privacy behavior is captured by a single tradeoff curve (the standard definitions are reproduced after this list).
arXiv Detail & Related papers (2026-02-06T09:22:24Z) - NeuroFilter: Privacy Guardrails for Conversational LLM Agents [50.75206727081996]
This work addresses the computational challenge of enforcing privacy for agentic Large Language Models (LLMs). NeuroFilter is a guardrail framework that operationalizes contextual integrity by mapping norm violations to simple directions in the model's activation space. A comprehensive evaluation across over 150,000 interactions, covering models from 7B to 70B parameters, illustrates the strong performance of NeuroFilter.
arXiv Detail & Related papers (2026-01-21T05:16:50Z) - SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking [58.475471437150674]
We propose sequential watermarking for soft prompts (SWAP). SWAP encodes watermarks through a specific order of defender-specified out-of-distribution classes. Experiments on 11 datasets demonstrate SWAP's effectiveness, harmlessness, and robustness against potential adaptive attacks.
arXiv Detail & Related papers (2025-11-05T13:48:48Z) - VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference [3.8760740008451156]
We introduce VeriLLM, a publicly verifiable protocol for decentralized large language model (LLM) inference. VeriLLM combines lightweight empirical rerunning with cryptographic commitments, allowing verifiers to validate results at approximately 1% of the underlying inference cost. We show that VeriLLM achieves reliable public verifiability with minimal overhead.
arXiv Detail & Related papers (2025-09-29T04:07:32Z) - AVEC: Bootstrapping Privacy for Local LLMs [0.0]
AVEC is a framework for bootstrapping privacy for local language models. It enforces privacy at the edge with explicit verifiability for delegated queries.
arXiv Detail & Related papers (2025-09-10T07:59:01Z) - KV-Auditor: Auditing Local Differential Privacy for Correlated Key-Value Estimation [3.1960143210470973]
We propose KV-Auditor, a framework for auditing LDP-based key-value estimation mechanisms. We classify state-of-the-art LDP key-value mechanisms into interactive and non-interactive types. For interactive mechanisms, we design a segmentation strategy to capture incremental privacy leakage across iterations.
arXiv Detail & Related papers (2025-08-15T14:17:24Z) - Urania: Differentially Private Insights into AI Use [102.27238986985698]
Urania provides end-to-end privacy protection by leveraging DP tools such as clustering, partition selection, and histogram-based summarization. Results show the framework's ability to extract meaningful conversational insights while maintaining stringent user privacy.
arXiv Detail & Related papers (2025-06-05T07:00:31Z) - Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms are tight, but only under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
arXiv Detail & Related papers (2023-02-15T21:40:33Z) - Antipodes of Label Differential Privacy: PATE and ALIBI [2.2761657094500682]
We consider the privacy-preserving machine learning (ML) setting where the trained model must satisfy differential privacy (DP).
We propose two novel approaches based on, respectively, the Laplace mechanism and the PATE framework.
We show how to achieve very strong privacy levels in some regimes with our adaptation of the PATE framework.
arXiv Detail & Related papers (2021-06-07T08:14:32Z)
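As a companion to the Re:cord-play entry above, here is an illustrative sketch of the record-and-replay check it describes: run an instrumented mechanism on neighboring datasets with identical randomness and flag data-dependent control flow. Everything here, the toy mechanism included, is hypothetical and not the paper's actual API.

```python
import random

def instrumented_mechanism(data, rng, trace):
    # A toy DP-style routine whose branch decisions are logged to `trace`.
    total = sum(data)
    if total > 10:                       # data-dependent branch: recorded
        trace.append("branch:high")
    else:
        trace.append("branch:low")
    return total + rng.gauss(0, 1.0)     # noise from the shared randomness

def audit_neighbors(d1, d2, seed=0):
    # Replay the mechanism on neighboring datasets with identical
    # randomness; diverging traces expose data-dependent control flow,
    # a common source of DP implementation bugs.
    t1, t2 = [], []
    instrumented_mechanism(d1, random.Random(seed), t1)
    instrumented_mechanism(d2, random.Random(seed), t2)
    return t1 == t2

print(audit_neighbors([1, 2, 3], [1, 2, 9]))  # False: the branch leaks data
```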
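For the $f$-DP entry above, these are the standard definitions behind the "single tradeoff curve" (the usual formulations from the $f$-DP literature, not notation specific to that paper):

```latex
\[
T(P,Q)(\alpha) \;=\; \inf_{\phi}\bigl\{\, \beta_{\phi} \;:\; \alpha_{\phi} \le \alpha \,\bigr\}
\qquad\text{(least type II error at type I error } \le \alpha\text{)}
\]
% A mechanism M is f-DP if T(M(D), M(D')) >= f pointwise on [0, 1]
% for all neighboring datasets D, D'. Gaussian DP is the special case
% f = G_mu, with
\[
G_{\mu}(\alpha) \;=\; \Phi\bigl(\Phi^{-1}(1-\alpha) - \mu\bigr).
\]
```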