A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off
- URL: http://arxiv.org/abs/2404.03324v1
- Date: Thu, 4 Apr 2024 09:48:14 GMT
- Title: A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off
- Authors: Stephen Meisenbacher, Nihildev Nandakumar, Alexandra Klymenko, Florian Matthes
- Abstract summary: We compare seven different algorithms for achieving word-level Differential Privacy.
We provide an in-depth analysis of the results with a focus on the privacy-utility trade-off.
We suggest concrete steps forward for the research field.
- Score: 45.07650884598811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The application of Differential Privacy to Natural Language Processing techniques has emerged in relevance in recent years, with an increasing number of studies published in established NLP outlets. In particular, the adaptation of Differential Privacy for use in NLP tasks has first focused on the $\textit{word-level}$, where calibrated noise is added to word embedding vectors to achieve "noisy" representations. To this end, several implementations have appeared in the literature, each presenting an alternative method of achieving word-level Differential Privacy. Although each of these includes its own evaluation, no comparative analysis has been performed to investigate the performance of such methods relative to each other. In this work, we conduct such an analysis, comparing seven different algorithms on two NLP tasks with varying hyperparameters, including the $\textit{epsilon ($\varepsilon$)}$ parameter, or privacy budget. In addition, we provide an in-depth analysis of the results with a focus on the privacy-utility trade-off, as well as open-source our implementation code for further reproduction. As a result of our analysis, we give insight into the benefits and challenges of word-level Differential Privacy, and accordingly, we suggest concrete steps forward for the research field.
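The abstract describes the general word-level mechanism these papers study: add calibrated noise to a word's embedding vector, then map the noisy vector back to a word. The sketch below illustrates one common construction from this literature, a multivariate Laplace-style mechanism with a nearest-neighbor decoding step. It is a minimal illustration, not the paper's implementation: the toy vocabulary, random 2-D embeddings, and the `privatize` function are all hypothetical stand-ins.

```python
import numpy as np

# Toy vocabulary with random 2-D "embeddings" (hypothetical stand-ins
# for real word vectors such as GloVe or word2vec).
rng = np.random.default_rng(0)
vocab = ["cat", "dog", "car", "tree", "house"]
emb = rng.normal(size=(len(vocab), 2))

def privatize(word, epsilon):
    """Noisy-replacement sketch: perturb the word's embedding with
    d-dimensional Laplace-style noise calibrated to epsilon, then map
    the noisy vector back to the nearest vocabulary word."""
    v = emb[vocab.index(word)]
    d = v.shape[0]
    # Sample noise: uniformly random direction, Gamma-distributed magnitude.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=d, scale=1.0 / epsilon)
    noisy = v + magnitude * direction
    # Decode: nearest neighbor in embedding space.
    dists = np.linalg.norm(emb - noisy, axis=1)
    return vocab[int(np.argmin(dists))]

# Small epsilon (strict privacy budget) means large noise, so the output
# word often differs from the input; large epsilon means little noise.
print(privatize("cat", epsilon=0.1))
print(privatize("cat", epsilon=100.0))
```

The privacy-utility trade-off benchmarked in the paper corresponds directly to `epsilon` here: lowering it increases the chance the output word differs from the input (privacy) at the cost of semantic fidelity (utility).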
Related papers
- Differentially Private In-Context Learning with Nearest Neighbor Search [5.932575574212546]
We introduce a DP framework for in-context learning that integrates nearest neighbor search of relevant examples in a privacy-aware manner.
Our method outperforms existing baselines by a substantial margin across all evaluated benchmarks.
arXiv Detail & Related papers (2025-11-06T13:06:37Z) - Urania: Differentially Private Insights into AI Use [104.7449031243196]
$Urania$ provides end-to-end privacy protection by leveraging DP tools such as clustering, partition selection, and histogram-based summarization.
Results show the framework's ability to extract meaningful conversational insights while maintaining stringent user privacy.
arXiv Detail & Related papers (2025-06-05T07:00:31Z) - Empirical Privacy Variance [32.41387301450962]
We show that models calibrated to the same $(\varepsilon, \delta)$-DP guarantee can exhibit significant variations in empirical privacy.
We investigate the generality of this phenomenon across multiple dimensions and discuss why it is surprising and relevant.
We propose two hypotheses, identify limitations in existing techniques like privacy auditing, and outline open questions for future research.
arXiv Detail & Related papers (2025-03-16T01:43:49Z) - Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications.
Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility.
We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
arXiv Detail & Related papers (2025-02-13T02:05:45Z) - Differentially Private Policy Gradient [48.748194765816955]
We show that it is possible to find the right trade-off between privacy noise and trust-region size to obtain a performant differentially private policy gradient algorithm.
Our results and the complexity of the tasks addressed represent a significant improvement over existing DP algorithms in online RL.
arXiv Detail & Related papers (2025-01-31T12:11:13Z) - Natural Language Processing of Privacy Policies: A Survey [2.4058538793689497]
We conduct a literature review by analyzing 109 papers at the intersection of NLP and privacy policies.
We provide a brief introduction to privacy policies and discuss various facets of associated problems.
We identify the methodologies that can be further enhanced to provide robust privacy policies.
arXiv Detail & Related papers (2025-01-17T17:47:15Z) - Optimized Tradeoffs for Private Prediction with Majority Ensembling [59.99331405291337]
We introduce the Data-dependent Randomized Response Majority (DaRRM) algorithm.
DaRRM is parameterized by a data-dependent noise function $\gamma$, and enables efficient utility optimization over the class of all private algorithms.
We show that DaRRM provably enjoys a privacy gain of a factor of 2 over common baselines, with fixed utility.
arXiv Detail & Related papers (2024-11-27T00:48:48Z) - Privacy-Preserving ECG Data Analysis with Differential Privacy: A Literature Review and A Case Study [1.1156009461711638]
We provide an overview of key concepts in differential privacy, followed by a literature review and discussion of its application to ECG analysis.
In the second part of the paper, we explore how to implement differentially private query release on an arrhythmia database using a six-step process.
arXiv Detail & Related papers (2024-06-19T23:17:16Z) - 1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy [3.0177210416625124]
$\texttt{1-Diffractor}$ is a new mechanism that offers significant speedups over previous mechanisms.
We evaluate $\texttt{1-Diffractor}$ for utility on several NLP tasks, for theoretical and task-based privacy, and for efficiency in terms of speed and memory.
arXiv Detail & Related papers (2024-05-02T19:07:32Z) - Theoretically Principled Federated Learning for Balancing Privacy and Utility [61.03993520243198]
We propose a general learning framework for protection mechanisms that protect privacy by distorting model parameters.
It can achieve personalized utility-privacy trade-off for each model parameter, on each client, at each communication round in federated learning.
arXiv Detail & Related papers (2023-05-24T13:44:02Z) - A quantitative study of NLP approaches to question difficulty estimation [0.30458514384586394]
This work quantitatively analyzes several approaches proposed in previous research and compares their performance on datasets from different educational domains.
We find that Transformer based models are the best performing across different educational domains, with DistilBERT performing almost as well as BERT.
As for the other models, hybrid models often outperform those based on a single type of feature; models based on linguistic features perform well on reading comprehension questions, while frequency-based features (TF-IDF) and word embeddings (word2vec) perform better in domain knowledge assessment.
arXiv Detail & Related papers (2023-05-17T14:26:00Z) - On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis.
We show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy.
The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
arXiv Detail & Related papers (2023-02-11T14:45:31Z) - Algorithms with More Granular Differential Privacy Guarantees [65.3684804101664]
We consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis.
In this work, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller than the best possible privacy parameter for the entire record of a person.
arXiv Detail & Related papers (2022-09-08T22:43:50Z) - Private Domain Adaptation from a Public Source [48.83724068578305]
We design differentially private discrepancy-based algorithms for adaptation from a source domain with public labeled data to a target domain with unlabeled private data.
Our solutions are based on private variants of Frank-Wolfe and Mirror-Descent algorithms.
arXiv Detail & Related papers (2022-08-12T06:52:55Z) - Semantics-Preserved Distortion for Personal Privacy Protection in Information Management [65.08939490413037]
This paper suggests a linguistically-grounded approach to distort texts while maintaining semantic integrity.
We present two distinct frameworks for semantic-preserving distortion: a generative approach and a substitutive approach.
We also explore privacy protection in a specific medical information management scenario, showing our method effectively limits sensitive data memorization.
arXiv Detail & Related papers (2022-01-04T04:01:05Z) - Differentially Private n-gram Extraction [19.401898070938593]
We revisit the problem of $n$-gram extraction in the differential privacy setting.
In this problem, given a corpus of private text data, the goal is to release as many $n$-grams as possible while preserving user-level privacy.
We develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state-of-the-art.
arXiv Detail & Related papers (2021-08-05T19:53:16Z) - ADePT: Auto-encoder based Differentially Private Text Transformation [22.068984615657463]
We provide a utility-preserving differentially private text transformation algorithm using auto-encoders.
Our algorithm transforms text to offer robustness against attacks and produces transformations with high semantic quality.
Our results show that the proposed model performs better against membership inference attacks (MIA) while offering little to no degradation in the utility of the underlying transformation process.
arXiv Detail & Related papers (2021-01-29T23:15:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.