Synthetic Query Generation for Privacy-Preserving Deep Retrieval Systems using Differentially Private Language Models
- URL: http://arxiv.org/abs/2305.05973v3
- Date: Thu, 23 May 2024 05:53:39 GMT
- Title: Synthetic Query Generation for Privacy-Preserving Deep Retrieval Systems using Differentially Private Language Models
- Authors: Aldo Gael Carranza, Rezsa Farahani, Natalia Ponomareva, Alex Kurakin, Matthew Jagielski, Milad Nasr
- Abstract summary: We propose an approach that prioritizes ensuring query privacy prior to training a deep retrieval system.
Our method employs DP language models (LMs) to generate private synthetic queries representative of the original data.
- Score: 21.66239227367523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the challenge of ensuring differential privacy (DP) guarantees in training deep retrieval systems. Training these systems often involves the use of contrastive-style losses, which are typically non-per-example decomposable, making them difficult to directly DP-train with since common techniques require per-example gradients. To address this issue, we propose an approach that prioritizes ensuring query privacy prior to training a deep retrieval system. Our method employs DP language models (LMs) to generate private synthetic queries representative of the original data. These synthetic queries can be used in downstream retrieval system training without compromising privacy. Our approach demonstrates a significant enhancement in retrieval quality compared to direct DP-training, all while maintaining query-level privacy guarantees. This work highlights the potential of harnessing LMs to overcome limitations in standard DP-training methods.
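To make the method concrete, the sketch below illustrates the two stages the abstract describes: DP-SGD fine-tuning of a language model on private queries, which is feasible because the next-token loss (unlike a contrastive retrieval loss with in-batch negatives) decomposes per example, followed by sampling synthetic queries for downstream retrieval training. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy GRU model, the microbatch loop, and the `clip_norm` and `sigma` values are placeholders chosen for readability, and a real system would use a DP library such as Opacus or TensorFlow Privacy for efficient per-example gradients and formal privacy accounting.

```python
# Minimal sketch of the query-privatization pipeline (assumptions: toy GRU LM,
# illustrative hyperparameters; not the paper's actual models or settings).
import torch
import torch.nn as nn

class TinyQueryLM(nn.Module):
    """Toy autoregressive LM over a small query-token vocabulary."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):                   # tokens: (batch, seq_len)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)                 # logits: (batch, seq_len, vocab)

def dp_sgd_step(model, optimizer, batch, clip_norm=1.0, sigma=1.0):
    """One DP-SGD step. Per-example gradients exist here because the
    next-token loss decomposes per query -- exactly the property that a
    contrastive retrieval loss with in-batch negatives lacks."""
    loss_fn = nn.CrossEntropyLoss()
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for query in batch:                          # microbatch of one example
        optimizer.zero_grad()
        inputs, targets = query[:-1].unsqueeze(0), query[1:].unsqueeze(0)
        logits = model(inputs)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        loss.backward()
        # Clip this example's gradient to L2 norm <= clip_norm.
        norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in params))
        scale = (clip_norm / (norm + 1e-6)).clamp(max=1.0)
        for acc, p in zip(summed, params):
            acc += p.grad * scale
    optimizer.zero_grad()
    # Add Gaussian noise calibrated to the clipping norm, then average.
    for acc, p in zip(summed, params):
        noise = torch.randn_like(acc) * sigma * clip_norm
        p.grad = (acc + noise) / len(batch)
    optimizer.step()

@torch.no_grad()
def sample_synthetic_query(model, bos=0, eos=1, max_len=20):
    """Sample one synthetic query from the DP-trained LM."""
    tokens = [bos]
    for _ in range(max_len):
        logits = model(torch.tensor([tokens]))[0, -1]
        nxt = torch.multinomial(torch.softmax(logits, dim=-1), 1).item()
        if nxt == eos:
            break
        tokens.append(nxt)
    return tokens[1:]

# Demo on stand-in data (hypothetical token IDs, not real queries).
model = TinyQueryLM()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
private_queries = [torch.randint(2, 1000, (12,)) for _ in range(8)]
dp_sgd_step(model, optimizer, private_queries)
print(sample_synthetic_query(model))
```

Because differential privacy is closed under post-processing, queries sampled from the DP-trained LM inherit its privacy guarantee, so the downstream retrieval model can be trained on them with ordinary, non-private contrastive losses.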
Related papers
- DP-DocLDM: Differentially Private Document Image Generation using Latent Diffusion Models [5.247930659596986]
We aim to address the challenges within the context of document image classification by substituting real private data with a synthetic counterpart. In particular, we propose to use conditional latent diffusion models (LDMs) in combination with differential privacy (DP) to generate class-specific synthetic document images. We show that our approach achieves substantial performance improvements in downstream evaluations on small-scale datasets.
arXiv Detail & Related papers (2025-08-06T08:43:08Z) - Differentially Private Relational Learning with Entity-level Privacy Guarantees [17.567309430451616]
This work presents a principled framework for relational learning with formal entity-level DP guarantees. We introduce an adaptive gradient clipping scheme that modulates clipping thresholds based on entity occurrence frequency. These contributions lead to a tailored DP-SGD variant for relational data with provable privacy guarantees.
arXiv Detail & Related papers (2025-06-10T02:03:43Z) - Privacy-preserving Prompt Personalization in Federated Learning for Multimodal Large Language Models [12.406403248205285]
Federated prompt personalization (FPP) is developed to address data heterogeneity and local overfitting. We propose SecFPP, a secure FPP protocol harmonizing personalization and privacy guarantees. We show SecFPP significantly outperforms both non-private and privacy-preserving baselines.
arXiv Detail & Related papers (2025-05-28T15:09:56Z) - Forward Learning with Differential Privacy [27.164507868291913]
We propose a privatized forward-learning algorithm, Differentially Private Unified Likelihood Ratio (DP-ULR).
Our experiments indicate that DP-ULR achieves competitive performance compared to traditional differential privacy training algorithms based on backpropagation.
arXiv Detail & Related papers (2025-04-01T04:14:53Z) - Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy (DP), which allows for controlling sensitive regions where differential privacy is applied.
Our method operates selectively on data and allows for defining non-sensitive spatio-temporal regions without DP application, or for combining differential privacy with other privacy techniques within data samples.
arXiv Detail & Related papers (2024-10-22T15:22:53Z) - Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models [8.92538797216985]
We present our characterization of private RecSys training using DP-SGD, root-causing several of its performance bottlenecks.
We propose LazyDP, an algorithm-software co-design that addresses the compute and memory challenges of training RecSys with DP-SGD.
Compared to a state-of-the-art DP-SGD training system, we demonstrate that LazyDP provides an average 119x training throughput improvement.
arXiv Detail & Related papers (2024-04-12T23:32:06Z) - Provable Privacy with Non-Private Pre-Processing [56.770023668379615]
We propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms.
Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions.
arXiv Detail & Related papers (2024-03-19T17:54:49Z) - LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls sample generation with a low privacy cost.
We theoretically analyze our model's privacy protection and empirically verify our model.
arXiv Detail & Related papers (2024-02-26T11:52:55Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z) - Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control [3.8811062755861956]
ε-PrivateSMOTE is a technique for safeguarding against re-identification and linkage attacks.
Our proposal combines synthetic data generation via noise-induced interpolation with differential privacy principles to obfuscate high-risk cases.
arXiv Detail & Related papers (2022-12-01T13:20:37Z) - PEARL: Data Synthesis via Private Embeddings and Adversarial Reconstruction Learning [1.8692254863855962]
We propose a new framework of data synthesis using deep generative models in a differentially private manner.
Within our framework, sensitive data are sanitized with rigorous privacy guarantees in a one-shot fashion.
Our proposal has theoretical guarantees of performance, and empirical evaluations on multiple datasets show that our approach outperforms other methods at reasonable levels of privacy.
arXiv Detail & Related papers (2021-06-08T18:00:01Z) - Federated Intrusion Detection for IoT with Heterogeneous Cohort Privacy [0.0]
Internet of Things (IoT) devices are becoming increasingly popular and are influencing many application domains such as healthcare and transportation.
In this work, we look at differentially private (DP) neural network (NN) based network intrusion detection systems (NIDS) to detect intrusion attacks on networks of such IoT devices.
Existing NN training solutions in this domain either ignore privacy considerations or assume that the privacy requirements are homogeneous across all users.
We show that the performance of existing differentially private methods degrades for clients with non-identical data distributions when clients' privacy requirements are heterogeneous.
arXiv Detail & Related papers (2021-01-25T03:33:27Z) - User-Level Privacy-Preserving Federated Learning: Analysis and Performance Optimization [77.43075255745389]
Federated learning (FL) is capable of preserving private data from mobile terminals (MTs) while training the data into useful models.
From an information-theoretic viewpoint, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm by adding artificial noise to the shared models before uploading them to servers.
arXiv Detail & Related papers (2020-02-29T10:13:39Z)