DP-Rewrite: Towards Reproducibility and Transparency in Differentially
Private Text Rewriting
- URL: http://arxiv.org/abs/2208.10400v1
- Date: Mon, 22 Aug 2022 15:38:16 GMT
- Title: DP-Rewrite: Towards Reproducibility and Transparency in Differentially
Private Text Rewriting
- Authors: Timour Igamberdiev, Thomas Arnold, Ivan Habernal
- Abstract summary: We introduce DP-Rewrite, an open-source framework for differentially private text rewriting.
Our system incorporates a variety of downstream datasets, models, pre-training procedures, and evaluation metrics.
We provide a set of experiments as a case study on the ADePT DP text rewriting system, detecting a privacy leak in its pre-training approach.
- Score: 2.465904360857451
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Text rewriting with differential privacy (DP) provides concrete theoretical
guarantees for protecting the privacy of individuals in textual documents. In
practice, existing systems may lack the means to validate their
privacy-preserving claims, leading to problems of transparency and
reproducibility. We introduce DP-Rewrite, an open-source framework for
differentially private text rewriting which aims to solve these problems by
being modular, extensible, and highly customizable. Our system incorporates a
variety of downstream datasets, models, pre-training procedures, and evaluation
metrics to provide a flexible way to lead and validate private text rewriting
research. To demonstrate our software in practice, we provide a set of
experiments as a case study on the ADePT DP text rewriting system, detecting a
privacy leak in its pre-training approach. Our system is publicly available,
and we hope that it will help the community to make DP text rewriting research
more accessible and transparent.
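To ground the case study, the sketch below illustrates the kind of mechanism DP-Rewrite is built to reproduce: an ADePT-style rewriting step in which the encoder's latent vector is clipped to a fixed L2 norm and perturbed with Laplace noise before decoding. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical and are not the DP-Rewrite API, and the sensitivity bound shown (2·C·√d for an L2-clipped d-dimensional vector released via the Laplace mechanism) is a conservative choice used purely for illustration.

# Minimal, illustrative sketch of an ADePT-style DP rewriting step:
# clip the latent representation, add calibrated Laplace noise, then decode.
# Hypothetical names; not the actual DP-Rewrite API.
import numpy as np

def privatize_latent(z: np.ndarray, clip_norm: float, epsilon: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Clip a latent vector to L2 norm `clip_norm` and add Laplace noise.

    With the L2 norm bounded by clip_norm, the L1 distance between any two
    clipped vectors is at most 2 * clip_norm * sqrt(d), so Laplace noise with
    scale sensitivity / epsilon gives an epsilon-DP release of the latent.
    """
    norm = np.linalg.norm(z)
    if norm > clip_norm:
        z = z * (clip_norm / norm)                 # project onto the L2 ball
    d = z.shape[-1]
    sensitivity = 2.0 * clip_norm * np.sqrt(d)     # conservative L1 bound
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=z.shape)
    return z + noise

# Usage: perturb the encoder output, then decode the rewritten text from it.
rng = np.random.default_rng(0)
latent = rng.normal(size=128)                      # stand-in for an encoder output
private_latent = privatize_latent(latent, clip_norm=1.0, epsilon=10.0, rng=rng)

A decoder would then generate the rewritten text from private_latent; the pre-training of that encoder-decoder is the stage in which the case study above detects ADePT's privacy leak.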
Related papers
- NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA [49.74911193222192]
The competition introduced a dataset of real invoice documents, along with associated questions and answers.
The base model is a multi-modal generative language model, and sensitive information could be exposed through either the visual or textual input modality.
Participants proposed elegant solutions to reduce communication costs while maintaining a minimum utility threshold.
arXiv Detail & Related papers (2024-11-06T07:51:19Z) - Thinking Outside of the Differential Privacy Box: A Case Study in Text Privatization with Language Model Prompting [3.3916160303055567]
We discuss the restrictions that Differential Privacy (DP) integration imposes, as well as bring to light the challenges that such restrictions entail.
Our results demonstrate the need for more discussion on the usability of DP in NLP and its benefits over non-DP approaches.
arXiv Detail & Related papers (2024-10-01T14:46:15Z) - Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' whether or not any particular privacy unit is included.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users (the underlying guarantee is sketched after this list).
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - Just Rewrite It Again: A Post-Processing Method for Enhanced Semantic Similarity and Privacy Preservation of Differentially Private Rewritten Text [3.3916160303055567]
We propose a simple post-processing method based on the goal of aligning rewritten texts with their original counterparts.
Our results show that such an approach not only produces outputs that are more semantically reminiscent of the original inputs, but also texts which score on average better in empirical privacy evaluations.
arXiv Detail & Related papers (2024-05-30T08:41:33Z) - RELIC: Investigating Large Language Model Responses using Self-Consistency [58.63436505595177]
Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations.
We propose an interactive system that helps users gain insight into the reliability of the generated text.
arXiv Detail & Related papers (2023-11-28T14:55:52Z) - InferDPT: Privacy-Preserving Inference for Black-box Large Language Model [66.07752875835506]
InferDPT is the first practical framework for the privacy-preserving Inference of black-box LLMs.
RANTEXT is a novel differential privacy mechanism integrated into the perturbation module of InferDPT.
arXiv Detail & Related papers (2023-10-18T18:00:11Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - DP-BART for Privatized Text Rewriting under Local Differential Privacy [2.45626162429986]
We propose a new system 'DP-BART' that largely outperforms existing LDP systems.
Our approach uses a novel clipping method, iterative pruning, and further training of internal representations which drastically reduces the amount of noise required for DP guarantees.
arXiv Detail & Related papers (2023-02-15T13:07:34Z) - How reparametrization trick broke differentially-private text
representation learning [2.45626162429986]
Differential privacy is one of the favorite approaches to privacy-preserving methods in NLP.
Despite its simplicity, applying differential privacy correctly to NLP turns out to be non-trivial.
Our main goal is to raise awareness and help the community understand potential pitfalls of applying differential privacy to text representation learning.
arXiv Detail & Related papers (2022-02-24T15:02:42Z) - Differentially Private Representation for NLP: Formal Guarantee and An
Empirical Study on Privacy and Fairness [38.90014773292902]
It has been demonstrated that the hidden representation learned by a deep model can encode private information about the input.
We propose Differentially Private Neural Representation (DPNR) to preserve the privacy of the extracted representation from text.
arXiv Detail & Related papers (2020-10-03T05:58:32Z)
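For reference, the 'almost indistinguishable' guarantee invoked in the user-level DP summary above is the standard (epsilon, delta)-differential-privacy condition, written here in LaTeX notation as a sketch; for user-level DP, the neighbouring datasets D and D' differ in the data of one user rather than one record:

\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta \quad \text{for all measurable output sets } S .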