1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy
- URL: http://arxiv.org/abs/2405.01678v1
- Date: Thu, 2 May 2024 19:07:32 GMT
- Title: 1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy
- Authors: Stephen Meisenbacher, Maulik Chevli, Florian Matthes
- Abstract summary: $\texttt{1-Diffractor}$ is a new mechanism that boasts high speedups in comparison to previous mechanisms.
We evaluate $\texttt{1-Diffractor}$ for utility on several NLP tasks, for theoretical and task-based privacy, and for efficiency in terms of speed and memory.
- Score: 3.0177210416625124
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The study of privacy-preserving Natural Language Processing (NLP) has gained rising attention in recent years. One promising avenue studies the integration of Differential Privacy in NLP, which has brought about innovative methods in a variety of application settings. Of particular note are $\textit{word-level Metric Local Differential Privacy (MLDP)}$ mechanisms, which work to obfuscate potentially sensitive input text by performing word-by-word $\textit{perturbations}$. Although these methods have shown promising results in empirical tests, there are two major drawbacks: (1) the inevitable loss of utility due to addition of noise, and (2) the computational expensiveness of running these mechanisms on high-dimensional word embeddings. In this work, we aim to address these challenges by proposing $\texttt{1-Diffractor}$, a new mechanism that boasts high speedups in comparison to previous mechanisms, while still demonstrating strong utility- and privacy-preserving capabilities. We evaluate $\texttt{1-Diffractor}$ for utility on several NLP tasks, for theoretical and task-based privacy, and for efficiency in terms of speed and memory. $\texttt{1-Diffractor}$ shows significant improvements in efficiency, while still maintaining competitive utility and privacy scores across all conducted comparative tests against previous MLDP mechanisms. Our code is made available at: https://github.com/sjmeis/Diffractor.
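To make the idea of word-by-word perturbation concrete, here is a minimal sketch of the general style of embedding-based word-level metric DP mechanism that the abstract describes as prior work: add calibrated noise to a word's vector, then output the nearest vocabulary word. It is not the 1-Diffractor algorithm, whose details are not given on this page. The toy vocabulary, the 2-D vectors, and the names `sample_noise`, `perturb_word`, and `perturb_text` are all assumptions made for illustration.

```python
import math
import random

# Toy 2-D "embeddings"; the vocabulary and vectors are invented for illustration.
TOY_EMBEDDINGS = {
    "doctor": (0.9, 0.1),
    "nurse": (0.8, 0.2),
    "hospital": (0.7, 0.4),
    "banana": (-0.6, 0.9),
    "apple": (-0.5, 0.8),
}


def sample_noise(dim, epsilon, rng):
    """Draw noise with density proportional to exp(-epsilon * ||z||)."""
    # Uniform direction on the unit sphere via a normalized Gaussian vector.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # The norm of such noise follows Gamma(shape=dim, scale=1/epsilon).
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * x / norm for x in direction]


def perturb_word(word, epsilon, rng, embeddings=TOY_EMBEDDINGS):
    """Add noise to the word's vector, then return the nearest vocabulary word."""
    vec = embeddings[word]
    noise = sample_noise(len(vec), epsilon, rng)
    noisy = [v + n for v, n in zip(vec, noise)]
    return min(embeddings, key=lambda w: math.dist(embeddings[w], noisy))


def perturb_text(text, epsilon, seed=None):
    """Perturb each in-vocabulary token independently.

    Assumption for this sketch only: out-of-vocabulary tokens pass through
    unchanged, which gives them no privacy protection.
    """
    rng = random.Random(seed)
    return " ".join(
        perturb_word(tok, epsilon, rng) if tok in TOY_EMBEDDINGS else tok
        for tok in text.split()
    )
```

A smaller `epsilon` produces larger noise and therefore more frequent word replacements. The nearest-neighbor search over the full vocabulary in a high-dimensional space is the kind of cost the abstract refers to as computational expensiveness.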
Related papers
- RPWithPrior: Label Differential Privacy in Regression [0.0]
In this paper, we focus on regression tasks under $\varepsilon$-label differential privacy guarantees. We model both original and randomized responses as continuous random variables, avoiding discretization entirely. We prove that our algorithm, RPWithPrior, guarantees $\varepsilon$-label differential privacy.
arXiv Detail & Related papers (2026-01-30T06:27:13Z) - Spend Your Budget Wisely: Towards an Intelligent Distribution of the Privacy Budget in Differentially Private Text Rewriting [3.0177210416625124]
We construct and evaluate a toolkit of linguistics- and NLP-based methods used to allocate a privacy budget to constituent tokens in a text document.
Our work highlights the intricacies of text privatization with DP, and furthermore, it calls for further work on finding more efficient ways to maximize the privatization benefits offered by DP in text rewriting.
arXiv Detail & Related papers (2025-03-28T12:33:46Z) - Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications.
Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility.
We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
arXiv Detail & Related papers (2025-02-13T02:05:45Z) - Optimized Tradeoffs for Private Prediction with Majority Ensembling [59.99331405291337]
We introduce the Data-dependent Randomized Response Majority (DaRRM) algorithm.
DaRRM is parameterized by a data-dependent noise function $\gamma$, and enables efficient utility optimization over the class of all private algorithms.
We show that DaRRM provably enjoys a privacy gain of a factor of 2 over common baselines, with fixed utility.
arXiv Detail & Related papers (2024-11-27T00:48:48Z) - A Collocation-based Method for Addressing Challenges in Word-level Metric Differential Privacy [3.0177210416625124]
Several word-level $\textit{Metric}$ Differential Privacy approaches have been proposed.
We devise a method where composed privatized outputs have higher semantic coherence and variable length.
We evaluate our method in utility and privacy tests, which make a clear case for tokenization strategies beyond the word level.
arXiv Detail & Related papers (2024-06-30T09:37:34Z) - Perturb-and-Project: Differentially Private Similarities and Marginals [73.98880839337873]
We revisit the input perturbations framework for differential privacy where noise is added to the input $A \in \mathcal{S}$.
We first design novel efficient algorithms to privately release pair-wise cosine similarities.
We derive a novel algorithm to compute $k$-way marginal queries over $n$ features.
arXiv Detail & Related papers (2024-06-07T12:07:16Z) - A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off [45.07650884598811]
We compare seven different algorithms for achieving word-level Differential Privacy.
We provide an in-depth analysis of the results with a focus on the privacy-utility trade-off.
We suggest concrete steps forward for the research field.
arXiv Detail & Related papers (2024-04-04T09:48:14Z) - Online non-parametric likelihood-ratio estimation by Pearson-divergence
functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t sim p, x'_t sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z) - Faster Privacy Accounting via Evolving Discretization [54.32252900997422]
We introduce a new algorithm for numerical composition of privacy random variables.
Our algorithm achieves a running time and memory usage of $\mathrm{polylog}(k)$ for the task of self-composing a mechanism.
arXiv Detail & Related papers (2022-07-10T04:25:37Z) - Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot \mathrm{OPT} + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state of the art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z) - THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption [112.02441503951297]
Privacy-preserving inference of transformer models is on the demand of cloud service users.
We introduce $\textit{THE-X}$, an approximation approach for transformers, which enables privacy-preserving inference of pre-trained models.
arXiv Detail & Related papers (2022-06-01T03:49:18Z) - Differentially Private Temporal Difference Learning with Stochastic Nonconvex-Strongly-Concave Optimization [17.361143427007224]
Temporal difference (TD) learning is a widely used method to evaluate policies in reinforcement learning.
In this paper, we consider preserving privacy in TD learning with a nonlinear value function.
We show that DPTD could provide $\epsilon,n$-differential privacy (DP) guarantee for sensitive information encoded in transitions and retain the original power of TD learning.
arXiv Detail & Related papers (2022-01-25T16:48:29Z) - Certifiably Robust Interpretation via Renyi Differential Privacy [77.04377192920741]
We study the problem of interpretation robustness from a new perspective of Renyi differential privacy (RDP).
First, it can offer provable and certifiable top-$k$ robustness.
Second, our proposed method offers $\sim 10\%$ better experimental robustness than existing approaches.
Third, our method can provide a smooth tradeoff between robustness and computational efficiency.
arXiv Detail & Related papers (2021-07-04T06:58:01Z) - Oneshot Differentially Private Top-k Selection [23.88111547236874]
We introduce a fast, low-distortion, and differentially private primitive for the top-$k$ problem.
Compared with existing approaches in the literature, our algorithm adds Laplace noise to the counts and releases the top-$k$ noisy counts and their estimates in a oneshot fashion.
arXiv Detail & Related papers (2021-05-18T02:18:01Z) - On a Utilitarian Approach to Privacy Preserving Text Generation [5.123298347655088]
We propose a class of differentially private mechanisms that parameterizes the nearest neighbor selection criterion in traditional mechanisms.
Motivated by Vickrey auction, where only the second highest price is revealed and the highest price is kept private, we balance the choice between the first and the second nearest neighbors.
Experiments on real text classification datasets show up to 50% improvement in utility compared to the existing state-of-the-art with the same empirical privacy guarantee.
arXiv Detail & Related papers (2021-04-23T23:13:43Z)