DP-BART for Privatized Text Rewriting under Local Differential Privacy
- URL: http://arxiv.org/abs/2302.07636v2
- Date: Tue, 6 Jun 2023 14:17:46 GMT
- Title: DP-BART for Privatized Text Rewriting under Local Differential Privacy
- Authors: Timour Igamberdiev and Ivan Habernal
- Abstract summary: We propose a new system 'DP-BART' that largely outperforms existing LDP systems.
Our approach uses a novel clipping method, iterative pruning, and further training of internal representations, which drastically reduces the amount of noise required for DP guarantees.
- Score: 2.45626162429986
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Privatized text rewriting with local differential privacy (LDP) is a recent
approach that enables sharing of sensitive textual documents while formally
guaranteeing privacy protection to individuals. However, existing systems face
several issues, such as formal mathematical flaws, unrealistic privacy
guarantees, privatization of only individual words, as well as a lack of
transparency and reproducibility. In this paper, we propose a new system
'DP-BART' that largely outperforms existing LDP systems. Our approach uses a
novel clipping method, iterative pruning, and further training of internal
representations, which drastically reduces the amount of noise required for DP
guarantees. We run experiments on five textual datasets of varying sizes,
rewriting them at different privacy guarantees and evaluating the rewritten
texts on downstream text classification tasks. Finally, we thoroughly discuss
the privatized text rewriting approach and its limitations, including the
problem of the strict text adjacency constraint in the LDP paradigm that leads
to the high noise requirement.
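To make the clip-then-noise idea concrete, below is a minimal sketch of the generic recipe behind such LDP rewriting systems: clip an encoder representation to bound its norm, then add Laplace noise calibrated to the worst-case LDP sensitivity before decoding. This is not the paper's novel clipping method (nor its pruning or further-training steps); the function names, vector dimension, and constants are illustrative assumptions.

```python
import numpy as np

def clip_l1(z: np.ndarray, c: float) -> np.ndarray:
    """Scale z down so its L1 norm is at most c (a common DP clipping step)."""
    norm = np.abs(z).sum()
    return z if norm <= c else z * (c / norm)

def privatize(z: np.ndarray, epsilon: float, c: float,
              rng: np.random.Generator) -> np.ndarray:
    """Clip, then add Laplace noise for an epsilon-LDP guarantee.

    Under LDP any two documents are adjacent, so the L1 sensitivity of the
    clipped vector is the ball diameter 2c, giving a Laplace scale of
    b = 2c / epsilon.
    """
    z_clipped = clip_l1(z, c)
    b = 2.0 * c / epsilon  # Laplace scale = sensitivity / epsilon
    return z_clipped + rng.laplace(loc=0.0, scale=b, size=z_clipped.shape)

rng = np.random.default_rng(0)
z = rng.normal(size=768)  # stand-in for a BART encoder output vector (hypothetical)
z_priv = privatize(z, epsilon=10.0, c=1.0, rng=rng)
print(z_priv[:5])
```

Note how the scale b grows with the clipping bound and shrinks with epsilon: because LDP treats any two documents as adjacent, the sensitivity spans the full diameter of the clipped space, which is exactly the high-noise problem the abstract highlights.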
Related papers
- Enhancing Feature-Specific Data Protection via Bayesian Coordinate Differential Privacy [55.357715095623554]
Local Differential Privacy (LDP) offers strong privacy guarantees without requiring users to trust external parties.
We propose a Bayesian framework, Bayesian Coordinate Differential Privacy (BCDP), that enables feature-specific privacy quantification.
arXiv Detail & Related papers (2024-10-24T03:39:55Z)
- Thinking Outside of the Differential Privacy Box: A Case Study in Text Privatization with Language Model Prompting [3.3916160303055567]
We discuss the restrictions that Differential Privacy (DP) integration imposes, as well as bring to light the challenges that such restrictions entail.
Our results demonstrate the need for more discussion on the usability of DP in NLP and its benefits over non-DP approaches.
arXiv Detail & Related papers (2024-10-01T14:46:15Z)
- Just Rewrite It Again: A Post-Processing Method for Enhanced Semantic Similarity and Privacy Preservation of Differentially Private Rewritten Text [3.3916160303055567]
We propose a simple post-processing method based on the goal of aligning rewritten texts with their original counterparts.
Our results show that such an approach not only produces outputs more semantically reminiscent of the original inputs, but also texts that score better on average in empirical privacy evaluations.
arXiv Detail & Related papers (2024-05-30T08:41:33Z)
- InferDPT: Privacy-Preserving Inference for Black-box Large Language Model [66.07752875835506]
InferDPT is the first practical framework for privacy-preserving inference of black-box LLMs.
RANTEXT is a novel differential privacy mechanism integrated into the perturbation module of InferDPT.
arXiv Detail & Related papers (2023-10-18T18:00:11Z)
- Disentangling the Linguistic Competence of Privacy-Preserving BERT [0.0]
Differential Privacy (DP) has been tailored to address the unique challenges of text-to-text privatization.
We employ a series of interpretation techniques on the internal representations extracted from BERT trained on perturbed pre-text.
Using probing tasks to unpack this dissimilarity, we find evidence that text-to-text privatization affects the linguistic competence across several formalisms.
arXiv Detail & Related papers (2023-10-17T16:00:26Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- DP-Rewrite: Towards Reproducibility and Transparency in Differentially Private Text Rewriting [2.465904360857451]
We introduce DP-Rewrite, an open-source framework for differentially private text rewriting.
Our system incorporates a variety of downstream datasets, models, pre-training procedures, and evaluation metrics.
We provide a set of experiments as a case study on the ADePT DP text rewriting system, detecting a privacy leak in its pre-training approach.
arXiv Detail & Related papers (2022-08-22T15:38:16Z)
- Semantics-Preserved Distortion for Personal Privacy Protection in Information Management [65.08939490413037]
This paper suggests a linguistically-grounded approach to distort texts while maintaining semantic integrity.
We present two distinct frameworks for semantic-preserving distortion: a generative approach and a substitutive approach.
We also explore privacy protection in a specific medical information management scenario, showing our method effectively limits sensitive data memorization.
arXiv Detail & Related papers (2022-01-04T04:01:05Z)
- Privacy Amplification via Shuffling for Linear Contextual Bandits [51.94904361874446]
We study the contextual linear bandit problem with differential privacy (DP).
We show that a privacy/utility trade-off between joint differential privacy (JDP) and LDP can be achieved by leveraging the shuffle model of privacy while preserving local privacy.
arXiv Detail & Related papers (2021-12-11T15:23:28Z)
- Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role Labeling [12.74252812104216]
This paper formulates a new task of extracting privacy parameters from a privacy policy, through the lens of Contextual Integrity.
We show that traditional NLP tasks, including the recently proposed Question-Answering based solutions, are insufficient to address the privacy parameter extraction problem.
arXiv Detail & Related papers (2020-10-01T20:48:37Z)
- Private Reinforcement Learning with PAC and Regret Guarantees [69.4202374491817]
We design privacy-preserving exploration policies for episodic reinforcement learning (RL).
We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP).
We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee.
arXiv Detail & Related papers (2020-09-18T20:18:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.