Private Language Models via Truncated Laplacian Mechanism
- URL: http://arxiv.org/abs/2410.08027v1
- Date: Thu, 10 Oct 2024 15:25:02 GMT
- Title: Private Language Models via Truncated Laplacian Mechanism
- Authors: Tianhao Huang, Tao Yang, Ivan Habernal, Lijie Hu, Di Wang,
- Abstract summary: We propose a novel private embedding method called the high dimensional truncated Laplacian mechanism.
We show that our method has a lower variance compared to the previous private word embedding methods.
Remarkably, even in the high privacy regime, our approach only incurs a slight decrease in utility compared to the non-private scenario.
- Score: 18.77713904999236
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models for NLP tasks are prone to variants of privacy attacks. To prevent privacy leakage, researchers have investigated word-level perturbations, relying on the formal guarantees of differential privacy (DP) in the embedding space. However, many existing approaches either achieve unsatisfactory performance in the high privacy regime when using the Laplacian or Gaussian mechanism, or resort to weaker relaxations of DP that are inferior to the canonical DP in terms of privacy strength. This raises the question of whether a new method for private word embedding can be designed to overcome these limitations. In this paper, we propose a novel private embedding method called the high dimensional truncated Laplacian mechanism. Specifically, we introduce a non-trivial extension of the truncated Laplacian mechanism, which was previously only investigated in one-dimensional space cases. Theoretically, we show that our method has a lower variance compared to the previous private word embedding methods. To further validate its effectiveness, we conduct comprehensive experiments on private embedding and downstream tasks using three datasets. Remarkably, even in the high privacy regime, our approach only incurs a slight decrease in utility compared to the non-private scenario.
Related papers
- Minimax Optimal Two-Sample Testing under Local Differential Privacy [3.3317825075368908]
We explore the trade-off between privacy and statistical utility in private two-sample testing under local differential privacy (LDP)
We introduce private permutation tests using practical privacy mechanisms such as Laplace, discrete Laplace, and Google's RAPPOR.
We study continuous data via binning and study its uniform separation rates under LDP over H"older and Besov smoothness classes.
arXiv Detail & Related papers (2024-11-13T22:44:25Z) - Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy (DP), which allows for controlling sensitive regions where differential privacy is applied.
Our method operates selectively on data and allows for defining non-sensitive-temporal regions without DP application or combining differential privacy with other privacy techniques within data samples.
arXiv Detail & Related papers (2024-10-22T15:22:53Z) - Approximating Two-Layer ReLU Networks for Hidden State Analysis in Differential Privacy [3.8254443661593633]
We show that it is possible to privately train convex problems with privacy-utility trade-offs comparable to those of one hidden-layer ReLU networks trained with DP-SGD.
Our experiments on benchmark classification tasks show that NoisyCGD can achieve privacy-utility trade-offs comparable to DP-SGD applied to one-hidden-layer ReLU networks.
arXiv Detail & Related papers (2024-07-05T22:43:32Z) - Uncertainty quantification by block bootstrap for differentially private stochastic gradient descent [1.0742675209112622]
Gradient Descent (SGD) is a widely used tool in machine learning.
Uncertainty quantification (UQ) for SGD by bootstrap has been addressed by several authors.
We propose a novel block bootstrap for SGD under local differential privacy.
arXiv Detail & Related papers (2024-05-21T07:47:21Z) - Bounded and Unbiased Composite Differential Privacy [25.427802467876248]
The objective of differential privacy (DP) is to protect privacy by producing an output distribution that is indistinguishable between two neighboring databases.
Existing solutions attempt to address this issue by employing post-processing or truncation techniques.
We propose a novel differentially private mechanism which uses a composite probability density function to generate bounded and unbiased outputs.
arXiv Detail & Related papers (2023-11-04T04:43:47Z) - Breaking the Communication-Privacy-Accuracy Tradeoff with
$f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability.
We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP)
More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z) - On the Statistical Complexity of Estimation and Testing under Privacy Constraints [17.04261371990489]
We show how to characterize the power of a statistical test under differential privacy in a plug-and-play fashion.
We show that maintaining privacy results in a noticeable reduction in performance only when the level of privacy protection is very high.
Finally, we demonstrate that the DP-SGLD algorithm, a private convex solver, can be employed for maximum likelihood estimation with a high degree of confidence.
arXiv Detail & Related papers (2022-10-05T12:55:53Z) - Just Fine-tune Twice: Selective Differential Privacy for Large Language
Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z) - Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z) - Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for
Private Learning [74.73901662374921]
A differentially private model degrades the utility drastically when the model comprises a large number of trainable parameters.
We propose an algorithm emphGradient Embedding Perturbation (GEP) towards training differentially private deep models with decent accuracy.
arXiv Detail & Related papers (2021-02-25T04:29:58Z) - A One-Pass Private Sketch for Most Machine Learning Tasks [48.17461258268463]
Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees.
We propose a private sketch that supports a multitude of machine learning tasks including regression, classification, density estimation, and more.
Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm.
arXiv Detail & Related papers (2020-06-16T17:47:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.