Differential Privacy for Transformer Embeddings of Text with Nonparametric Variational Information Bottleneck
- URL: http://arxiv.org/abs/2601.02307v2
- Date: Wed, 14 Jan 2026 15:01:54 GMT
- Title: Differential Privacy for Transformer Embeddings of Text with Nonparametric Variational Information Bottleneck
- Authors: Dina El Zein, James Henderson,
- Abstract summary: We propose a privacy-preserving method for sharing text data by sharing noisy versions of their transformer embeddings.<n>Nonparametric Variational Differential Privacy (NVDP) ensures both useful data sharing and strong privacy protection.<n>We test NVDP on the General Language Understanding Evaluation (GLUE) benchmark and show that varying the noise level gives us a useful trade-off between privacy and accuracy.
- Score: 5.4924740564381125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a privacy-preserving method for sharing text data by sharing noisy versions of their transformer embeddings. It has been shown that hidden representations learned by deep models can encode sensitive information from the input, making it possible for adversaries to recover the input data with considerable accuracy. This problem is exacerbated in transformer embeddings because they consist of multiple vectors, one per token. To mitigate this risk, we propose Nonparametric Variational Differential Privacy (NVDP), which ensures both useful data sharing and strong privacy protection. We take a differential privacy (DP) approach, integrating a nonparametric variational information bottleneck (NVIB) layer into the transformer architecture to inject noise into its multivector embeddings and thereby hide information, and measuring privacy protection with Rényi Divergence (RD) and its corresponding Bayesian Differential Privacy (BDP) guarantee. Training the NVIB layer calibrates the noise level according to the utility of the downstream task. We test NVDP on the General Language Understanding Evaluation (GLUE) benchmark and show that varying the noise level gives us a useful trade-off between privacy and accuracy. With lower noise levels, our model maintains high accuracy while offering strong privacy guarantees, effectively balancing privacy and utility.
Related papers
- Synthetic Data Generation and Differential Privacy using Tensor Networks' Matrix Product States (MPS) [33.032422801043495]
We propose a method for generating privacy-preserving high-quality synthetic data using Matrix Product States (MPS)<n>We benchmark the MPS-based generative model against state-of-the-art models such as CTGAN, VAE, and PrivBayes.<n>Our results show that MPS outperforms classical models, particularly under strict privacy constraints.
arXiv Detail & Related papers (2025-08-08T12:14:57Z) - Privacy-Aware Decoding: Mitigating Privacy Leakage of Large Language Models in Retrieval-Augmented Generation [26.573578326262307]
Privacy-Aware Decoding (PAD) is a lightweight, inference-time defense that adaptively injects calibrated Gaussian noise into token logits during generation.<n>PAD integrates confidence-based screening to selectively protect high-risk tokens, efficient sensitivity estimation to minimize unnecessary noise, and context-aware noise calibration to balance privacy with generation quality.<n>Our work takes an important step toward mitigating privacy risks in RAG via decoding strategies, paving the way for universal and scalable privacy solutions in sensitive domains.
arXiv Detail & Related papers (2025-08-05T05:22:13Z) - Improving Noise Efficiency in Privacy-preserving Dataset Distillation [59.57846442477106]
We introduce a novel framework that decouples sampling from optimization for better convergence and improves signal quality.<n>On CIFAR-10, our method achieves a textbf10.0% improvement with 50 images per class and textbf8.3% increase with just textbfone-fifth the distilled set size of previous state-of-the-art methods.
arXiv Detail & Related papers (2025-08-03T13:15:52Z) - Federated Learning with Differential Privacy: An Utility-Enhanced Approach [12.614480013684759]
Federated learning has emerged as an attractive approach to protect data privacy by eliminating the need for sharing clients' data.<n>Recent studies have shown that federated learning alone does not guarantee privacy, as private data may still be inferred from the uploaded parameters to the central server.<n>We present a modification to these vanilla differentially private algorithms based on a Haar wavelet transformation step and a novel noise injection scheme that significantly lowers the bound of the noise variance.
arXiv Detail & Related papers (2025-03-27T04:48:29Z) - Activity Recognition on Avatar-Anonymized Datasets with Masked Differential Privacy [64.32494202656801]
Privacy-preserving computer vision is an important emerging problem in machine learning and artificial intelligence.<n>We present anonymization pipeline that replaces sensitive human subjects in video datasets with synthetic avatars within context.<n>We also proposeMaskDP to protect non-anonymized but privacy sensitive background information.
arXiv Detail & Related papers (2024-10-22T15:22:53Z) - Explainable Differential Privacy-Hyperdimensional Computing for Balancing Privacy and Transparency in Additive Manufacturing Monitoring [5.282482641822561]
This study introduces the Differential Privacy-Hyperdimensional Computing (DP-HD) framework.<n>DP-HD enables precise tuning of DP noise levels, ensuring an optimal balance between privacy and performance.<n>It holds substantial promise for broader applications in privacy-sensitive domains such as healthcare, financial services, and government data management.
arXiv Detail & Related papers (2024-07-09T17:42:26Z) - A Randomized Approach for Tight Privacy Accounting [63.67296945525791]
We propose a new differential privacy paradigm called estimate-verify-release (EVR)
EVR paradigm first estimates the privacy parameter of a mechanism, then verifies whether it meets this guarantee, and finally releases the query output.
Our empirical evaluation shows the newly proposed EVR paradigm improves the utility-privacy tradeoff for privacy-preserving machine learning.
arXiv Detail & Related papers (2023-04-17T00:38:01Z) - Breaking the Communication-Privacy-Accuracy Tradeoff with
$f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability.
We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP)
More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z) - How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z) - Assessing Differentially Private Variational Autoencoders under
Membership Inference [26.480694390462617]
We quantify and compare the privacy-accuracy trade-off for differentially private Variational Autoencoders.
We do rarely observe favorable privacy-accuracy trade-off for Variational Autoencoders, and identify a case where LDP outperforms CDP.
arXiv Detail & Related papers (2022-04-16T21:53:09Z) - ADePT: Auto-encoder based Differentially Private Text Transformation [22.068984615657463]
We provide a utility-preserving differentially private text transformation algorithm using auto-encoders.
Our algorithm transforms text to offer robustness against attacks and produces transformations with high semantic quality.
Our results show that the proposed model performs better against MIA attacks while offering lower to no degradation in the utility of the underlying transformation process.
arXiv Detail & Related papers (2021-01-29T23:15:24Z) - Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks, trained with differential privacy, in some settings might be even more vulnerable in comparison to non-private versions.
We study how the main ingredients of differentially private neural networks training, such as gradient clipping and noise addition, affect the robustness of the model.
arXiv Detail & Related papers (2020-12-14T18:59:24Z) - RDP-GAN: A R\'enyi-Differential Privacy based Generative Adversarial
Network [75.81653258081435]
Generative adversarial network (GAN) has attracted increasing attention recently owing to its impressive ability to generate realistic samples with high privacy protection.
However, when GANs are applied on sensitive or private training examples, such as medical or financial records, it is still probable to divulge individuals' sensitive and private information.
We propose a R'enyi-differentially private-GAN (RDP-GAN), which achieves differential privacy (DP) in a GAN by carefully adding random noises on the value of the loss function during training.
arXiv Detail & Related papers (2020-07-04T09:51:02Z) - P3GM: Private High-Dimensional Data Release via Privacy Preserving
Phased Generative Model [23.91327154831855]
This paper proposes privacy-preserving phased generative model (P3GM) for releasing sensitive data.
P3GM employs the two-phase learning process to make it robust against the noise, and to increase learning efficiency.
Compared with the state-of-the-art methods, our generated samples look fewer noises and closer to the original data in terms of data diversity.
arXiv Detail & Related papers (2020-06-22T09:47:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.