A Differentially Private Text Perturbation Method Using a Regularized
Mahalanobis Metric
- URL: http://arxiv.org/abs/2010.11947v1
- Date: Thu, 22 Oct 2020 23:06:44 GMT
- Title: A Differentially Private Text Perturbation Method Using a Regularized
Mahalanobis Metric
- Authors: Zekun Xu, Abhinav Aggarwal, Oluwaseyi Feyisetan, Nathanael Teissier
- Abstract summary: A popular approach for privacy-preserving text analysis is noise injection, in which text data is first mapped into a continuous embedding space.
We propose a text perturbation mechanism based on a carefully designed regularized variant of the Mahalanobis metric to overcome this problem.
We provide a text-perturbation algorithm based on this metric and formally prove its privacy guarantees.
- Score: 8.679020335206753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Balancing the privacy-utility tradeoff is a crucial requirement of many
practical machine learning systems that deal with sensitive customer data. A
popular approach for privacy-preserving text analysis is noise injection, in
which text data is first mapped into a continuous embedding space, perturbed by
sampling a spherical noise from an appropriate distribution, and then projected
back to the discrete vocabulary space. While this allows the perturbation to
admit the required metric differential privacy, often the utility of downstream
tasks modeled on this perturbed data is low because the spherical noise does
not account for the variability in the density around different words in the
embedding space. In particular, words in a sparse region are likely unchanged
even when the noise scale is large. Using the global sensitivity of the
mechanism can potentially add too much noise to the words in the dense regions
of the embedding space, causing a high utility loss, whereas using local
sensitivity can leak information through the scale of the noise added.
In this paper, we propose a text perturbation mechanism based on a carefully
designed regularized variant of the Mahalanobis metric to overcome this
problem. For any given noise scale, this metric adds an elliptical noise to
account for the covariance structure in the embedding space. This heterogeneity
in the noise scale along different directions helps ensure that the words in
the sparse region have sufficient likelihood of replacement without sacrificing
the overall utility. We provide a text-perturbation algorithm based on this
metric and formally prove its privacy guarantees. Additionally, we empirically
show that our mechanism improves the privacy statistics while achieving the
same level of utility as the state-of-the-art Laplace mechanism.
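The pipeline described in the abstract (embed a word, add metric-DP noise, project back to the vocabulary) can be sketched as follows. This is an illustrative sketch, not the paper's exact construction: the regularization form `lam * Sigma + (1 - lam) * I`, the trace normalization, and the Gamma-radius noise sampler are assumptions.

```python
import numpy as np

def mahalanobis_perturb(word, vocab, emb, epsilon=10.0, lam=0.2, rng=None):
    """Sketch of metric-DP text perturbation with elliptical (Mahalanobis) noise.

    Assumed regularization: Sigma_reg = lam * Sigma + (1 - lam) * I, where
    Sigma is the trace-normalized covariance of the word embeddings.
    """
    rng = rng or np.random.default_rng()
    d = emb.shape[1]
    sigma = np.cov(emb, rowvar=False)
    sigma /= np.trace(sigma) / d            # normalize so trace(Sigma) = d
    sigma_reg = lam * sigma + (1 - lam) * np.eye(d)
    root = np.linalg.cholesky(sigma_reg)    # a square root of Sigma_reg

    # Multivariate-Laplace-style noise for metric DP: uniform direction,
    # Gamma(d, 1/epsilon) radius, then elliptical reshaping via the root.
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    r = rng.gamma(shape=d, scale=1.0 / epsilon)
    z = root @ (r * u)

    noisy = emb[vocab.index(word)] + z
    # Project back to the discrete vocabulary: nearest neighbor.
    dists = np.linalg.norm(emb - noisy, axis=1)
    return vocab[int(np.argmin(dists))]
```

Because the noise is stretched along high-variance directions of the embedding cloud rather than spherical, words sitting in sparse regions get a non-trivial chance of replacement at the same privacy budget.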
Related papers
- Breaking the Communication-Privacy-Accuracy Tradeoff with
$f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability.
We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP).
More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z) - Optimizing the Noise in Self-Supervised Learning: from Importance
Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z) - Robust Inference of Manifold Density and Geometry by Doubly Stochastic
Scaling [8.271859911016719]
We develop tools for robust inference under high-dimensional noise.
We show that our approach is robust to variability in technical noise levels across cell types.
arXiv Detail & Related papers (2022-09-16T15:39:11Z) - The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the assumption that the optimal noise matches the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z) - Differential privacy for symmetric log-concave mechanisms [0.0]
Adding random noise to database query results is an important tool for achieving privacy.
We provide a necessary and sufficient condition for $(\epsilon, \delta)$-differential privacy for all symmetric and log-concave noise densities.
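The Laplace density is the canonical symmetric log-concave noise: for a query with $\ell_1$-sensitivity $\Delta$, adding Laplace$(\Delta/\epsilon)$ noise gives pure $(\epsilon, 0)$-DP. A minimal sketch of this one instance (not the paper's general condition):

```python
import math
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release value + Laplace(sensitivity / epsilon) noise: (epsilon, 0)-DP."""
    rng = rng or np.random.default_rng()
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

def laplace_pdf(x, b):
    # Symmetric, log-concave density exp(-|x| / b) / (2b).
    return math.exp(-abs(x) / b) / (2 * b)
```

The privacy guarantee follows because, at any output, the densities under two inputs differing by at most `sensitivity` have ratio bounded by `exp(epsilon)`.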
arXiv Detail & Related papers (2022-02-23T10:20:29Z) - Learning Numeric Optimal Differentially Private Truncated Additive
Mechanisms [5.079561894598125]
We introduce a tool to learn truncated noise for additive mechanisms with strong utility bounds.
For sensitivity-bounded mechanisms, we show that it is sufficient to consider symmetric noise densities that fall monotonically away from the mean.
arXiv Detail & Related papers (2021-07-27T17:22:57Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
The local exchange of estimates can allow adversaries to infer private data. Adding
perturbations chosen independently at every agent protects privacy but results in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible in the aggregate.
arXiv Detail & Related papers (2020-10-23T10:35:35Z) - Deconvoluting Kernel Density Estimation and Regression for Locally
Differentially Private Data [14.095523601311374]
Local differential privacy has become the gold-standard of privacy literature for gathering or releasing sensitive individual data points.
However, locally differentially private data can twist the probability density of the data because of the additive noise used to ensure privacy.
We develop density estimation methods using smoothing kernels to remove the effect of privacy-preserving noise.
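The distortion being corrected can be seen directly: under a Laplace-based local model, the released values follow the true density convolved with the noise density, so variances add. A minimal sketch (the unit sensitivity and the $\epsilon$ scale are illustrative assumptions):

```python
import numpy as np

def ldp_release(x, epsilon, rng=None):
    """Locally private release of scalars with unit sensitivity (Laplace noise)."""
    rng = rng or np.random.default_rng()
    return x + rng.laplace(scale=1.0 / epsilon, size=len(x))
```

A naive kernel density estimate on the released data is therefore over-smoothed; deconvoluting kernels, as in the paper above, aim to undo this convolution with the known noise density.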
arXiv Detail & Related papers (2020-08-28T03:39:17Z) - Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
arXiv Detail & Related papers (2020-06-15T18:31:02Z) - The Discrete Gaussian for Differential Privacy [26.179150185540514]
A key tool for building differentially private systems is adding Gaussian noise to the output of a function evaluated on a sensitive dataset.
Previous work has demonstrated that seemingly innocuous numerical errors can entirely destroy privacy.
We introduce and analyze the discrete Gaussian in the context of differential privacy.
arXiv Detail & Related papers (2020-03-31T18:00:00Z) - Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles [65.9694455739978]
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries.
Our analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics.
This implies that classical approaches cannot guarantee a non-trivial regret bound.
arXiv Detail & Related papers (2017-03-03T21:39:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.