DP-Rewrite: Towards Reproducibility and Transparency in Differentially
Private Text Rewriting
- URL: http://arxiv.org/abs/2208.10400v1
- Date: Mon, 22 Aug 2022 15:38:16 GMT
- Authors: Timour Igamberdiev, Thomas Arnold, Ivan Habernal
- Abstract summary: We introduce DP-Rewrite, an open-source framework for differentially private text rewriting.
Our system incorporates a variety of downstream datasets, models, pre-training procedures, and evaluation metrics.
We provide a set of experiments as a case study on the ADePT DP text rewriting system, detecting a privacy leak in its pre-training approach.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Text rewriting with differential privacy (DP) provides concrete theoretical
guarantees for protecting the privacy of individuals in textual documents. In
practice, existing systems may lack the means to validate their
privacy-preserving claims, leading to problems of transparency and
reproducibility. We introduce DP-Rewrite, an open-source framework for
differentially private text rewriting which aims to solve these problems by
being modular, extensible, and highly customizable. Our system incorporates a
variety of downstream datasets, models, pre-training procedures, and evaluation
metrics to provide a flexible way to conduct and validate private text rewriting
research. To demonstrate our software in practice, we provide a set of
experiments as a case study on the ADePT DP text rewriting system, detecting a
privacy leak in its pre-training approach. Our system is publicly available,
and we hope that it will help the community to make DP text rewriting research
more accessible and transparent.
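The core idea behind DP text rewriting systems such as ADePT is to bound the sensitivity of an autoencoder's latent representation by clipping it, then add calibrated noise before decoding. The following is a minimal sketch of that idea, not the exact ADePT implementation; the function name, clipping norm, and sensitivity bound are illustrative assumptions. Since the Laplace mechanism is calibrated to L1 sensitivity, the vector here is clipped in the L1 norm; miscalibrating exactly this step is the kind of pitfall that frameworks like DP-Rewrite aim to surface.

```python
import numpy as np

def clip_and_noise(z, clip_c=1.0, epsilon=1.0, rng=None):
    """Sketch of a clip-then-noise DP mechanism on a latent vector.

    The vector is clipped to L1 norm at most clip_c, so replacing one
    input changes the clipped vector by at most 2 * clip_c in L1 norm.
    Laplace noise with per-coordinate scale 2 * clip_c / epsilon then
    gives epsilon-DP for this single release (simplified calibration).
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(z, dtype=float)
    l1 = np.abs(z).sum()
    if l1 > clip_c:
        z = z * (clip_c / l1)          # project onto the L1 ball
    scale = 2.0 * clip_c / epsilon     # L1 sensitivity / epsilon
    return z + rng.laplace(loc=0.0, scale=scale, size=z.shape)
```

In a full rewriting pipeline, the noised vector would be passed to the decoder to generate the privatized text; end-to-end guarantees additionally depend on how the encoder is pre-trained, which is where the case study above detects a leak.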
Related papers
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - Just Rewrite It Again: A Post-Processing Method for Enhanced Semantic Similarity and Privacy Preservation of Differentially Private Rewritten Text [3.3916160303055567]
We propose a simple post-processing method based on the goal of aligning rewritten texts with their original counterparts.
Our results show that such an approach not only produces outputs that are more semantically reminiscent of the original inputs, but also texts which score on average better in empirical privacy evaluations.
arXiv Detail & Related papers (2024-05-30T08:41:33Z) - Private Online Community Detection for Censored Block Models [60.039026645807326]
We study the private online change detection problem for dynamic communities, using a censored block model (CBM)
We propose an algorithm capable of identifying changes in the community structure, while maintaining user privacy.
arXiv Detail & Related papers (2024-05-09T12:35:57Z) - RELIC: Investigating Large Language Model Responses using Self-Consistency [58.63436505595177]
Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations.
We propose an interactive system that helps users gain insight into the reliability of the generated text.
arXiv Detail & Related papers (2023-11-28T14:55:52Z) - InferDPT: Privacy-Preserving Inference for Black-box Large Language Model [66.07752875835506]
InferDPT is the first practical framework for privacy-preserving inference of black-box LLMs.
RANTEXT is a novel differential privacy mechanism integrated into the perturbation module of InferDPT.
arXiv Detail & Related papers (2023-10-18T18:00:11Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - DP-BART for Privatized Text Rewriting under Local Differential Privacy [2.45626162429986]
We propose a new system 'DP-BART' that largely outperforms existing LDP systems.
Our approach uses a novel clipping method, iterative pruning, and further training of internal representations which drastically reduces the amount of noise required for DP guarantees.
arXiv Detail & Related papers (2023-02-15T13:07:34Z) - How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z) - How reparametrization trick broke differentially-private text
representation learning [2.45626162429986]
Differential privacy is one of the favored approaches to privacy preservation in NLP.
Despite its simplicity, it is non-trivial to get it right when applying it to NLP.
Our main goal is to raise awareness and help the community understand potential pitfalls of applying differential privacy to text representation learning.
arXiv Detail & Related papers (2022-02-24T15:02:42Z) - CAPE: Context-Aware Private Embeddings for Private Language Learning [0.5156484100374058]
Context-Aware Private Embeddings (CAPE) is a novel approach which preserves privacy during training of embeddings.
CAPE applies calibrated noise through differential privacy, preserving the encoded semantic links while obscuring sensitive information.
Experimental results demonstrate that the proposed approach reduces private information leakage more effectively than either intervention alone.
arXiv Detail & Related papers (2021-08-27T14:50:12Z) - Differentially Private Representation for NLP: Formal Guarantee and An
Empirical Study on Privacy and Fairness [38.90014773292902]
It has been demonstrated that hidden representations learned by a deep model can encode private information about the input.
We propose Differentially Private Neural Representation (DPNR) to preserve the privacy of the extracted representation from text.
arXiv Detail & Related papers (2020-10-03T05:58:32Z)