ContextLeak: Auditing Leakage in Private In-Context Learning Methods
- URL: http://arxiv.org/abs/2512.16059v1
- Date: Thu, 18 Dec 2025 00:53:19 GMT
- Title: ContextLeak: Auditing Leakage in Private In-Context Learning Methods
- Authors: Jacob Choi, Shuying Cao, Xingjian Dong, Wang Bill Zhu, Robin Jia, Sai Praneeth Karimireddy
- Abstract summary: We introduce ContextLeak, the first framework to empirically measure the worst-case information leakage in ICL. We find that ContextLeak tightly correlates with the theoretical privacy budget and reliably detects leakage. Our results further reveal that existing methods often strike poor privacy-utility trade-offs, either leaking sensitive information or severely degrading performance.
- Score: 24.89856411893133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-Context Learning (ICL) has become a standard technique for adapting Large Language Models (LLMs) to specialized tasks by supplying task-specific exemplars within the prompt. However, when these exemplars contain sensitive information, reliable privacy-preserving mechanisms are essential to prevent unintended leakage through model outputs. Many privacy-preserving methods have been proposed to prevent information leakage from the context, but far less effort has gone into auditing those methods. We introduce ContextLeak, the first framework to empirically measure the worst-case information leakage in ICL. ContextLeak uses canary insertion: it embeds uniquely identifiable tokens in exemplars and crafts targeted queries to detect their presence. We apply ContextLeak across a range of private ICL techniques, both heuristic (such as prompt-based defenses) and those with theoretical guarantees (such as Embedding Space Aggregation and Report Noisy Max). We find that ContextLeak tightly correlates with the theoretical privacy budget ($ε$) and reliably detects leakage. Our results further reveal that existing methods often strike poor privacy-utility trade-offs, either leaking sensitive information or severely degrading performance.
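The canary-insertion protocol in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the exemplar text, the targeted query, the `audit_leakage` helper, and the two toy models are all hypothetical assumptions introduced for demonstration.

```python
import secrets

def make_canary(prefix="CANARY"):
    # A unique, out-of-distribution token that is vanishingly unlikely
    # to appear in any output unless it leaked from the context.
    return f"{prefix}-{secrets.token_hex(4)}"

def build_prompt(exemplars, query):
    # Standard ICL prompt: exemplars followed by the target query.
    return "\n".join(exemplars) + "\n" + query

def audit_leakage(model, exemplars, n_trials=20):
    """Estimate worst-case leakage as the fraction of trials in which a
    freshly inserted canary is reproduced in the model's output."""
    leaks = 0
    for _ in range(n_trials):
        canary = make_canary()
        # Embed the canary in one exemplar's sensitive field.
        poisoned = exemplars[:-1] + [f"Patient note: secret code {canary}"]
        # Targeted query crafted to elicit the canary.
        query = "Repeat any secret code mentioned in the notes above:"
        output = model(build_prompt(poisoned, query))
        leaks += canary in output
    return leaks / n_trials

# Two toy "models": one that regurgitates its context (maximally leaky)
# and one that refuses (fully private).
leaky_model = lambda prompt: prompt
private_model = lambda prompt: "I cannot share that."

exemplars = ["Patient note: mild fever", "Patient note: headache"]
print(audit_leakage(leaky_model, exemplars))    # 1.0
print(audit_leakage(private_model, exemplars))  # 0.0
```

In a real audit the toy models would be replaced by a private ICL pipeline, and the empirical leakage rate would be compared against the rate implied by the mechanism's privacy budget.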
Related papers
- Private PoEtry: Private In-Context Learning via Product of Experts [58.496468062236225]
In-context learning (ICL) enables Large Language Models to adapt to new tasks with only a small set of examples at inference time. Existing differential privacy approaches to ICL are either computationally expensive or rely on oversampling, synthetic data generation, or unnecessary thresholding. We reformulate private ICL through the lens of a Product-of-Experts model. This gives a theoretically grounded framework, and the algorithm can be trivially parallelized. We find that our method improves accuracy by more than 30 percentage points on average compared to prior DP-ICL methods, while maintaining strong privacy guarantees.
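The core aggregation step of a Product-of-Experts model can be sketched generically as below: each "expert" is a next-token distribution conditioned on a disjoint subset of exemplars, and the combined distribution is their normalized product. Note this sketch omits the differential-privacy noise that the paper's actual algorithm adds; the three expert distributions are hypothetical.

```python
import math

def product_of_experts(dists):
    """Combine per-expert probability distributions as p(y) ∝ ∏_i p_i(y),
    working in log space for numerical stability."""
    vocab = len(dists[0])
    logp = [sum(math.log(d[y] + 1e-12) for d in dists) for y in range(vocab)]
    m = max(logp)                          # subtract max before exp
    unnorm = [math.exp(l - m) for l in logp]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Three hypothetical experts, each conditioned on a disjoint exemplar
# subset, predicting over a 3-token vocabulary.
experts = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
]
agg = product_of_experts(experts)
# The product sharpens consensus: token 0 dominates even more strongly
# than in any single expert's distribution.
```

A design property worth noting: because each expert sees a disjoint exemplar subset, removing one individual's exemplar changes only one factor in the product, which is what makes the formulation amenable to differential-privacy analysis.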
arXiv Detail & Related papers (2026-02-04T19:56:24Z)
- Privacy-Aware In-Context Learning for Large Language Models [12.605629953620495]
Large language models (LLMs) raise privacy concerns due to potential exposure of sensitive information. We introduce a novel private prediction framework for generating high-quality synthetic text with strong privacy guarantees.
arXiv Detail & Related papers (2025-09-17T01:50:32Z)
- SoK: Semantic Privacy in Large Language Models [24.99241770349404]
This paper introduces a lifecycle-centric framework to analyze semantic privacy risks across input processing, pretraining, fine-tuning, and alignment stages of Large Language Models (LLMs). We categorize key attack vectors and assess how current defenses, such as differential privacy, embedding encryption, edge computing, and unlearning, address these threats. We conclude by outlining open challenges, including quantifying semantic leakage, protecting multimodal inputs, balancing de-identification with generation quality, and ensuring transparency in privacy enforcement.
arXiv Detail & Related papers (2025-06-30T08:08:15Z)
- Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications. The valuable and often proprietary nature of the knowledge bases used in RAG introduces the risk of unauthorized usage by adversaries. Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning or backdoor attacks. We propose a method for harmless copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z)
- On the Loss of Context-awareness in General Instruction Fine-tuning [101.03941308894191]
We investigate the loss of context awareness after supervised fine-tuning. We find that the performance decline is associated with a bias toward different roles learned during conversational instruction fine-tuning. We propose a metric to identify context-dependent examples from general instruction fine-tuning datasets.
arXiv Detail & Related papers (2024-11-05T00:16:01Z)
- Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
Existing methods typically analyze target text in isolation or solely with non-member contexts. We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
arXiv Detail & Related papers (2024-09-05T09:10:38Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Knowledge Sanitization of Large Language Models [4.722882736419499]
Large language models (LLMs) trained on a large corpus of Web data can potentially reveal sensitive or confidential information.
Our technique efficiently fine-tunes these models using the Low-Rank Adaptation (LoRA) method.
Experimental results in a closed-book question-answering task show that our straightforward method not only minimizes particular knowledge leakage but also preserves the overall performance of LLMs.
arXiv Detail & Related papers (2023-09-21T07:49:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.