Differentially Private In-Context Learning with Nearest Neighbor Search
- URL: http://arxiv.org/abs/2511.04332v1
- Date: Thu, 06 Nov 2025 13:06:37 GMT
- Title: Differentially Private In-Context Learning with Nearest Neighbor Search
- Authors: Antti Koskela, Tejas Kulkarni, Laith Zumot
- Abstract summary: We introduce a DP framework for in-context learning that integrates nearest neighbor search of relevant examples in a privacy-aware manner. Our method outperforms existing baselines by a substantial margin across all evaluated benchmarks.
- Score: 5.932575574212546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially private in-context learning (DP-ICL) has recently become an active research topic due to the inherent privacy risks of in-context learning. However, existing approaches overlook a critical component of modern large language model (LLM) pipelines: the similarity search used to retrieve relevant context data. In this work, we introduce a DP framework for in-context learning that integrates nearest neighbor search of relevant examples in a privacy-aware manner. Our method outperforms existing baselines by a substantial margin across all evaluated benchmarks, achieving more favorable privacy-utility trade-offs. To achieve this, we employ nearest neighbor retrieval from a database of context data, combined with a privacy filter that tracks the cumulative privacy cost of selected samples to ensure adherence to a central differential privacy budget. Experimental results on text classification and document question answering show a clear advantage of the proposed method over existing baselines.
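The retrieval-plus-budget idea described in the abstract can be made concrete with a minimal sketch. This is an illustration of the general pattern, not the paper's exact algorithm: the `PrivacyFilter` class, the fixed per-example epsilon cost, and all parameter values below are assumptions for demonstration. Nearest neighbors are selected in distance order, and each selected example is charged against a central privacy budget until that budget is exhausted:

```python
import numpy as np

class PrivacyFilter:
    """Tracks cumulative epsilon spend against a fixed central budget."""
    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def try_spend(self, epsilon):
        """Charge `epsilon` if it fits in the remaining budget."""
        if self.spent + epsilon > self.total_epsilon:
            return False  # budget exhausted; stop selecting examples
        self.spent += epsilon
        return True

def retrieve_private_context(query_emb, db_embs, k, per_use_eps, filt):
    """Return indices of up to k nearest neighbors whose cost fits the budget."""
    dists = np.linalg.norm(db_embs - query_emb, axis=1)
    selected = []
    for idx in np.argsort(dists):  # visit database examples nearest-first
        if len(selected) == k:
            break
        if not filt.try_spend(per_use_eps):
            break
        selected.append(int(idx))
    return selected

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 8))   # toy database of context embeddings
q = rng.normal(size=8)           # toy query embedding
filt = PrivacyFilter(total_epsilon=1.0)
picked = retrieve_private_context(q, db, k=4, per_use_eps=0.3, filt=filt)
# With a budget of 1.0 and a cost of 0.3 per example, only 3 of the
# requested 4 neighbors can be selected before the filter halts.
```

In a real DP-ICL pipeline the per-example cost would come from a proper accountant and the downstream ICL answers would themselves be privatized; this sketch only shows how a filter caps cumulative spend during retrieval.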
Related papers
- Private PoEtry: Private In-Context Learning via Product of Experts [58.496468062236225]
In-context learning (ICL) enables Large Language Models to adapt to new tasks with only a small set of examples at inference time. Existing differential privacy approaches to ICL are either computationally expensive or rely on oversampling, synthetic data generation, or unnecessary thresholding. We reformulate private ICL through the lens of a Product-of-Experts model. This gives a theoretically grounded framework, and the algorithm can be trivially parallelized. We find that our method improves accuracy by more than 30 percentage points on average compared to prior DP-ICL methods, while maintaining strong privacy guarantees.
arXiv Detail & Related papers (2026-02-04T19:56:24Z) - RAC: Retrieval-Augmented Clarification for Faithful Conversational Search [7.0486278653981245]
We introduce RAC (Retrieval-Augmented Clarification), a framework for generating corpus-faithful clarification questions. After comparing several indexing strategies for retrieval, we fine-tune a large language model to make optimal use of research context. We then apply contrastive preference optimization to favor questions supported by retrieved passages over ungrounded alternatives.
arXiv Detail & Related papers (2026-01-16T19:16:38Z) - Towards Context-aware Reasoning-enhanced Generative Searching in E-commerce [61.03081096959132]
We propose a context-aware reasoning-enhanced generative search framework for better understanding of the complicated context. Our approach achieves superior performance compared with strong baselines, validating its effectiveness for search-based recommendation.
arXiv Detail & Related papers (2025-10-19T16:46:11Z) - Urania: Differentially Private Insights into AI Use [102.27238986985698]
Urania provides end-to-end privacy protection by leveraging DP tools such as clustering, partition selection, and histogram-based summarization. Results show the framework's ability to extract meaningful conversational insights while maintaining stringent user privacy.
arXiv Detail & Related papers (2025-06-05T07:00:31Z) - Towards Split Learning-based Privacy-Preserving Record Linkage [49.1574468325115]
Split Learning has been introduced to facilitate applications where user data privacy is a requirement.
In this paper, we investigate the potentials of Split Learning for Privacy-Preserving Record Matching.
arXiv Detail & Related papers (2024-09-02T09:17:05Z) - A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off [45.07650884598811]
We compare seven different algorithms for achieving word-level Differential Privacy.
We provide an in-depth analysis of the results with a focus on the privacy-utility trade-off.
We suggest concrete steps forward for the research field.
arXiv Detail & Related papers (2024-04-04T09:48:14Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - From Contextual Data to Newsvendor Decisions: On the Actual Performance of Data-Driven Algorithms [8.89658755359509]
We study how the relevance/quality and quantity of past data influence performance by analyzing a contextual Newsvendor problem. We analyze the performance of data-driven algorithms through a notion of context-dependent worst-case expected regret.
arXiv Detail & Related papers (2023-02-16T17:03:39Z) - Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies [74.01792675564218]
We develop a data augmentation framework based on ensembling retriever models that captures relevant text segments from unlabeled policy documents.
To improve the diversity and quality of the augmented data, we leverage multiple pre-trained language models (LMs) and cascade them with noise reduction filter models.
Using our augmented data on the PrivacyQA benchmark, we elevate the existing baseline by a large margin (10% F1) and achieve a new state-of-the-art F1 score of 50%.
arXiv Detail & Related papers (2022-04-19T15:45:23Z) - SPEED: Secure, PrivatE, and Efficient Deep learning [2.283665431721732]
We introduce a deep learning framework able to deal with strong privacy constraints.
Based on collaborative learning, differential privacy and homomorphic encryption, the proposed approach advances the state of the art.
arXiv Detail & Related papers (2020-06-16T19:31:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.