Granularity is crucial when applying differential privacy to text: An investigation for neural machine translation
- URL: http://arxiv.org/abs/2407.18789v2
- Date: Thu, 26 Sep 2024 14:48:42 GMT
- Title: Granularity is crucial when applying differential privacy to text: An investigation for neural machine translation
- Authors: Doan Nam Long Vu, Timour Igamberdiev, Ivan Habernal
- Abstract summary: Differential privacy (DP) is becoming increasingly popular in NLP.
The choice of granularity at which DP is applied is often neglected.
Our findings indicate that the document-level NMT system is more resistant to membership inference attacks.
- Score: 13.692397169805806
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Applying differential privacy (DP) by means of the DP-SGD algorithm to protect individual data points during training is becoming increasingly popular in NLP. However, the choice of granularity at which DP is applied is often neglected. For example, neural machine translation (NMT) typically operates on the sentence-level granularity. From the perspective of DP, this setup assumes that each sentence belongs to a single person and any two sentences in the training dataset are independent. This assumption is however violated in many real-world NMT datasets, e.g., those including dialogues. For proper application of DP we thus must shift from sentences to entire documents. In this paper, we investigate NMT at both the sentence and document levels, analyzing the privacy/utility trade-off for both scenarios, and evaluating the risks of not using the appropriate privacy granularity in terms of leaking personally identifiable information (PII). Our findings indicate that the document-level NMT system is more resistant to membership inference attacks, emphasizing the significance of using the appropriate granularity when working with DP.
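To make the granularity point concrete, here is a minimal sketch of a DP-SGD step in which the privacy unit is a whole document rather than a single sentence: per-unit gradients are accumulated over all sentences of a document, clipped once per document, and only then noised. This is not the authors' implementation; the toy linear model, `dp_sgd_step`, `doc_ids`, and all hyperparameter names are illustrative assumptions.
```python
import torch
from collections import defaultdict

# Toy stand-in for an NMT model; in practice this would be an encoder-decoder.
model = torch.nn.Linear(16, 8)
loss_fn = torch.nn.MSELoss(reduction="sum")

def dp_sgd_step(batch, doc_ids, clip=1.0, noise_multiplier=1.0, lr=0.1):
    """One DP-SGD step where the privacy unit is a document, not a sentence.

    batch   : list of (src, tgt) tensor pairs, one pair per sentence
    doc_ids : list of document ids parallel to `batch`; sentences sharing an
              id form a single privacy unit and are clipped together
    """
    # Group sentence indices by document: the DP guarantee should hold for
    # the removal of an entire document, so the document is the clipping unit.
    groups = defaultdict(list)
    for i, d in enumerate(doc_ids):
        groups[d].append(i)

    summed = [torch.zeros_like(p) for p in model.parameters()]
    for idxs in groups.values():
        model.zero_grad()
        # Accumulate the loss over all sentences belonging to one document.
        loss = sum(loss_fn(model(batch[i][0]), batch[i][1]) for i in idxs)
        loss.backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]
        # Clip the per-document gradient to L2 norm `clip`.
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip / (norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    # Gaussian noise calibrated to the per-document sensitivity `clip`,
    # averaged over the number of documents in the batch.
    n_docs = len(groups)
    with torch.no_grad():
        for p, s in zip(model.parameters(), summed):
            p -= lr * (s + noise_multiplier * clip * torch.randn_like(s)) / n_docs

# Illustrative usage: two documents, three sentences in total.
sentences = [(torch.randn(16), torch.randn(8)) for _ in range(3)]
dp_sgd_step(sentences, doc_ids=["doc_a", "doc_a", "doc_b"])
```
With sentence-level granularity, each sentence would get its own entry in `doc_ids` and be clipped independently; grouping all sentences of a dialogue under one id is roughly what the shift to document-level granularity described in the abstract amounts to.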
Related papers
- DP-2Stage: Adapting Language Models as Differentially Private Tabular Data Generators [47.86275136491794]
We propose DP-2Stage, a two-stage fine-tuning framework for differentially private data generation.
Our empirical results show that this approach improves performance across various settings and metrics.
arXiv Detail & Related papers (2024-12-03T14:10:09Z)
- Thinking Outside of the Differential Privacy Box: A Case Study in Text Privatization with Language Model Prompting [3.3916160303055567]
We discuss the restrictions that Differential Privacy (DP) integration imposes, as well as bring to light the challenges that such restrictions entail.
Our results demonstrate the need for more discussion on the usability of DP in NLP and its benefits over non-DP approaches.
arXiv Detail & Related papers (2024-10-01T14:46:15Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- Metric Differential Privacy at the User-Level Via the Earth Mover's Distance [34.63551774740707]
Metric differential privacy (DP) provides heterogeneous privacy guarantees based on a distance between the pair of inputs.
In this paper, we initiate the study of one natural definition of metric DP at the user-level.
We design two novel mechanisms under $d_{\textsf{EM}}$-DP to answer linear queries and item-wise queries (the general metric-DP condition is recalled below).
arXiv Detail & Related papers (2024-05-04T13:29:11Z)
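For readers unfamiliar with metric DP, the standard condition (a textbook definition, not a contribution of the paper above) is that a randomized mechanism $M$ satisfies $d$-privacy for a metric $d$ if, for all inputs $x, x'$ and every set of outputs $S$,
$\Pr[M(x) \in S] \le e^{d(x, x')} \, \Pr[M(x') \in S]$.
Choosing $d(x, x') = \varepsilon \cdot d_{\mathrm{Ham}}(x, x')$ recovers ordinary $\varepsilon$-DP; the entry above instantiates $d$ with an Earth Mover's distance, hence the notation $d_{\textsf{EM}}$-DP.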
- Differentially Private Reinforcement Learning with Self-Play [18.124829682487558]
We study the problem of multi-agent reinforcement learning (multi-agent RL) with differential privacy (DP) constraints.
We first extend the definitions of Joint DP (JDP) and Local DP (LDP) to two-player zero-sum episodic Markov Games.
We design a provably efficient algorithm based on optimistic Nash value and privatization of Bernstein-type bonuses.
arXiv Detail & Related papers (2024-04-11T08:42:51Z)
- How Private are DP-SGD Implementations? [61.19794019914523]
We show that there can be a substantial gap between the privacy analyses obtained under the two types of batch sampling.
arXiv Detail & Related papers (2024-03-26T13:02:43Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS), which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- Subject Granular Differential Privacy in Federated Learning [2.9439848714137447]
We propose two new algorithms that enforce subject-level DP locally at each federation user.
Our first algorithm, called LocalGroupDP, is a straightforward application of group differential privacy in the popular DP-SGD algorithm (the standard group-privacy bound is recalled below).
Our second algorithm is based on a novel idea of hierarchical gradient averaging (HiGradAvgDP) for subjects participating in a training mini-batch.
arXiv Detail & Related papers (2022-06-07T23:54:36Z)
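For context, the group-privacy bound underlying any such group-level use of DP-SGD is a standard DP fact rather than a result of this paper: if a mechanism is $(\varepsilon, \delta)$-DP for datasets differing in a single record, then for datasets differing in a group of $k$ records it satisfies $(k\varepsilon,\ k\, e^{(k-1)\varepsilon}\, \delta)$-DP (simply $k\varepsilon$ when $\delta = 0$). The same bound illustrates why granularity matters in the main paper above: a document of $k$ sentences is exactly such a group, so a nominal sentence-level $\varepsilon$ translates into a much weaker guarantee for the person who contributed the whole document.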
- A Privacy-Preserving Subgraph-Level Federated Graph Neural Network via Differential Privacy [23.05377582226823]
We propose DP-FedRec, a DP-based federated GNN designed to address the non-independent and identically distributed (non-IID) data problem.
DP is applied not only to the weights but also to the edges of the intersection graph obtained from private set intersection (PSI), to fully protect the privacy of clients.
The evaluation demonstrates that DP-FedRec achieves better performance with the graph extension and that DP introduces only a small computational overhead.
arXiv Detail & Related papers (2022-06-07T08:14:45Z)
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- Privacy Amplification via Shuffling for Linear Contextual Bandits [51.94904361874446]
We study the contextual linear bandit problem with differential privacy (DP).
We show that it is possible to achieve a privacy/utility trade-off between joint DP (JDP) and local DP (LDP) by leveraging the shuffle model of privacy while preserving local privacy.
arXiv Detail & Related papers (2021-12-11T15:23:28Z)
- Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.