A Differentially Private Text Perturbation Method Using a Regularized
Mahalanobis Metric
- URL: http://arxiv.org/abs/2010.11947v1
- Date: Thu, 22 Oct 2020 23:06:44 GMT
- Title: A Differentially Private Text Perturbation Method Using a Regularized
Mahalanobis Metric
- Authors: Zekun Xu, Abhinav Aggarwal, Oluwaseyi Feyisetan, Nathanael Teissier
- Abstract summary: A popular approach for privacy-preserving text analysis is noise injection, in which text data is first mapped into a continuous embedding space.
We propose a text perturbation mechanism based on a carefully designed regularized variant of the Mahalanobis metric to overcome this problem.
We provide a text-perturbation algorithm based on this metric and formally prove its privacy guarantees.
- Score: 8.679020335206753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Balancing the privacy-utility tradeoff is a crucial requirement of many
practical machine learning systems that deal with sensitive customer data. A
popular approach for privacy-preserving text analysis is noise injection, in
which text data is first mapped into a continuous embedding space, perturbed by
sampling a spherical noise from an appropriate distribution, and then projected
back to the discrete vocabulary space. While this allows the perturbation to
admit the required metric differential privacy, often the utility of downstream
tasks modeled on this perturbed data is low because the spherical noise does
not account for the variability in the density around different words in the
embedding space. In particular, words in a sparse region are likely unchanged
even when the noise scale is large. Using the global sensitivity of the
mechanism can potentially add too much noise to the words in the dense regions
of the embedding space, causing a high utility loss, whereas using local
sensitivity can leak information through the scale of the noise added.
In this paper, we propose a text perturbation mechanism based on a carefully
designed regularized variant of the Mahalanobis metric to overcome this
problem. For any given noise scale, this metric adds an elliptical noise to
account for the covariance structure in the embedding space. This heterogeneity
in the noise scale along different directions helps ensure that the words in
the sparse region have sufficient likelihood of replacement without sacrificing
the overall utility. We provide a text-perturbation algorithm based on this
metric and formally prove its privacy guarantees. Additionally, we empirically
show that our mechanism improves the privacy statistics while achieving the
same level of utility as the state-of-the-art Laplace mechanism.
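The pipeline described in the abstract (embed a word, add metric-DP noise, project back to the vocabulary) can be sketched as follows. This is an illustrative sketch, not the paper's exact construction: the regularization form `lam * Sigma + (1 - lam) * I`, the trace normalization, and the Gamma-radius noise sampler are assumptions.

```python
import numpy as np

def mahalanobis_perturb(word, vocab, emb, epsilon=10.0, lam=0.2, rng=None):
    """Sketch of metric-DP text perturbation with elliptical (Mahalanobis) noise.

    Assumed regularization: Sigma_reg = lam * Sigma + (1 - lam) * I, where
    Sigma is the trace-normalized covariance of the word embeddings.
    """
    rng = rng or np.random.default_rng()
    d = emb.shape[1]
    sigma = np.cov(emb, rowvar=False)
    sigma /= np.trace(sigma) / d            # normalize so trace(Sigma) = d
    sigma_reg = lam * sigma + (1 - lam) * np.eye(d)
    root = np.linalg.cholesky(sigma_reg)    # a square root of Sigma_reg

    # Multivariate-Laplace-style noise for metric DP: uniform direction,
    # Gamma(d, 1/epsilon) radius, then elliptical reshaping via the root.
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    r = rng.gamma(shape=d, scale=1.0 / epsilon)
    z = root @ (r * u)

    noisy = emb[vocab.index(word)] + z
    # Project back to the discrete vocabulary: nearest neighbor.
    dists = np.linalg.norm(emb - noisy, axis=1)
    return vocab[int(np.argmin(dists))]
```

Because the noise is stretched along high-variance directions of the embedding cloud rather than spherical, words sitting in sparse regions get a non-trivial chance of replacement at the same privacy budget.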
Related papers
- Breaking the Communication-Privacy-Accuracy Tradeoff with
$f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability.
We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP).
More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z) - Optimizing the Noise in Self-Supervised Learning: from Importance
Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z) - Robust Inference of Manifold Density and Geometry by Doubly Stochastic
Scaling [8.271859911016719]
We develop tools for robust inference under high-dimensional noise.
We show that our approach is robust to variability in technical noise levels across cell types.
arXiv Detail & Related papers (2022-09-16T15:39:11Z) - The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the assumption that the optimal noise matches the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z) - Differential privacy for symmetric log-concave mechanisms [0.0]
Adding random noise to database query results is an important tool for achieving privacy.
We provide a necessary and sufficient condition for $(\epsilon, \delta)$-differential privacy for all symmetric and log-concave noise densities.
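The Laplace density is the canonical symmetric log-concave noise: for a query with $\ell_1$-sensitivity $\Delta$, adding Laplace$(\Delta/\epsilon)$ noise gives pure $(\epsilon, 0)$-DP. A minimal sketch of this one instance (not the paper's general condition):

```python
import math
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release value + Laplace(sensitivity / epsilon) noise: (epsilon, 0)-DP."""
    rng = rng or np.random.default_rng()
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

def laplace_pdf(x, b):
    # Symmetric, log-concave density exp(-|x| / b) / (2b).
    return math.exp(-abs(x) / b) / (2 * b)
```

The privacy guarantee follows because, at any output, the densities under two inputs differing by at most `sensitivity` have ratio bounded by `exp(epsilon)`.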
arXiv Detail & Related papers (2022-02-23T10:20:29Z) - Learning Numeric Optimal Differentially Private Truncated Additive
Mechanisms [5.079561894598125]
We introduce a tool to learn truncated noise for additive mechanisms with strong utility bounds.
For sensitivity-bounded mechanisms, we show that it is sufficient to consider symmetric noise densities that fall monotonically away from the mean.
arXiv Detail & Related papers (2021-07-27T17:22:57Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
The local exchange of estimates can allow adversaries to infer private data. Adding
perturbations chosen independently at every agent protects privacy but results in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible in the aggregate.
arXiv Detail & Related papers (2020-10-23T10:35:35Z) - Deconvoluting Kernel Density Estimation and Regression for Locally
Differentially Private Data [14.095523601311374]
Local differential privacy has become the gold-standard of privacy literature for gathering or releasing sensitive individual data points.
However, locally differentially private data can twist the probability density of the data because of the additive noise used to ensure privacy.
We develop density estimation methods using smoothing kernels to remove the effect of privacy-preserving noise.
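The distortion being corrected can be seen directly: under a Laplace-based local model, the released values follow the true density convolved with the noise density, so variances add. A minimal sketch (the unit sensitivity and the $\epsilon$ scale are illustrative assumptions):

```python
import numpy as np

def ldp_release(x, epsilon, rng=None):
    """Locally private release of scalars with unit sensitivity (Laplace noise)."""
    rng = rng or np.random.default_rng()
    return x + rng.laplace(scale=1.0 / epsilon, size=len(x))
```

A naive kernel density estimate on the released data is therefore over-smoothed; deconvoluting kernels, as in the paper above, aim to undo this convolution with the known noise density.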
arXiv Detail & Related papers (2020-08-28T03:39:17Z) - Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
arXiv Detail & Related papers (2020-06-15T18:31:02Z) - The Discrete Gaussian for Differential Privacy [26.179150185540514]
A key tool for building differentially private systems is adding Gaussian noise to the output of a function evaluated on a sensitive dataset.
Previous work has demonstrated that seemingly innocuous numerical errors can entirely destroy privacy.
We introduce and analyze the discrete Gaussian in the context of differential privacy.
arXiv Detail & Related papers (2020-03-31T18:00:00Z) - Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles [65.9694455739978]
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries.
Our analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics.
This implies that classical approaches cannot guarantee a non-trivial regret bound.
arXiv Detail & Related papers (2017-03-03T21:39:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.