DP-Rewrite: Towards Reproducibility and Transparency in Differentially
Private Text Rewriting
- URL: http://arxiv.org/abs/2208.10400v1
- Date: Mon, 22 Aug 2022 15:38:16 GMT
- Title: DP-Rewrite: Towards Reproducibility and Transparency in Differentially
Private Text Rewriting
- Authors: Timour Igamberdiev, Thomas Arnold, Ivan Habernal
- Abstract summary: We introduce DP-Rewrite, an open-source framework for differentially private text rewriting.
Our system incorporates a variety of downstream datasets, models, pre-training procedures, and evaluation metrics.
We provide a set of experiments as a case study on the ADePT DP text rewriting system, detecting a privacy leak in its pre-training approach.
- Score: 2.465904360857451
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Text rewriting with differential privacy (DP) provides concrete theoretical
guarantees for protecting the privacy of individuals in textual documents. In
practice, existing systems may lack the means to validate their
privacy-preserving claims, leading to problems of transparency and
reproducibility. We introduce DP-Rewrite, an open-source framework for
differentially private text rewriting which aims to solve these problems by
being modular, extensible, and highly customizable. Our system incorporates a
variety of downstream datasets, models, pre-training procedures, and evaluation
metrics to provide a flexible way to lead and validate private text rewriting
research. To demonstrate our software in practice, we provide a set of
experiments as a case study on the ADePT DP text rewriting system, detecting a
privacy leak in its pre-training approach. Our system is publicly available,
and we hope that it will help the community to make DP text rewriting research
more accessible and transparent.
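To ground the case study, the sketch below illustrates the kind of mechanism DP-Rewrite is built to reproduce: an ADePT-style rewriting step in which the encoder's latent vector is clipped to a fixed L2 norm and perturbed with Laplace noise before decoding. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical and are not the DP-Rewrite API, and the sensitivity bound shown (2·C·√d for an L2-clipped d-dimensional vector released via the Laplace mechanism) is a conservative choice used purely for illustration.

# Minimal, illustrative sketch of an ADePT-style DP rewriting step:
# clip the latent representation, add calibrated Laplace noise, then decode.
# Hypothetical names; not the actual DP-Rewrite API.
import numpy as np

def privatize_latent(z: np.ndarray, clip_norm: float, epsilon: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Clip a latent vector to L2 norm `clip_norm` and add Laplace noise.

    With the L2 norm bounded by clip_norm, the L1 distance between any two
    clipped vectors is at most 2 * clip_norm * sqrt(d), so Laplace noise with
    scale sensitivity / epsilon gives an epsilon-DP release of the latent.
    """
    norm = np.linalg.norm(z)
    if norm > clip_norm:
        z = z * (clip_norm / norm)                 # project onto the L2 ball
    d = z.shape[-1]
    sensitivity = 2.0 * clip_norm * np.sqrt(d)     # conservative L1 bound
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=z.shape)
    return z + noise

# Usage: perturb the encoder output, then decode the rewritten text from it.
rng = np.random.default_rng(0)
latent = rng.normal(size=128)                      # stand-in for an encoder output
private_latent = privatize_latent(latent, clip_norm=1.0, epsilon=10.0, rng=rng)

A decoder would then generate the rewritten text from private_latent; the pre-training of that encoder-decoder is the stage in which the case study above detects ADePT's privacy leak.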
Related papers
- NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA [49.74911193222192]
The competition introduced a dataset of real invoice documents, along with associated questions and answers.
The base model is a multi-modal generative language model, and sensitive information could be exposed through either the visual or textual input modality.
Participants proposed elegant solutions to reduce communication costs while maintaining a minimum utility threshold.
arXiv Detail & Related papers (2024-11-06T07:51:19Z) - Thinking Outside of the Differential Privacy Box: A Case Study in Text Privatization with Language Model Prompting [3.3916160303055567]
We discuss the restrictions that Differential Privacy (DP) integration imposes, as well as bring to light the challenges that such restrictions entail.
Our results demonstrate the need for more discussion on the usability of DP in NLP and its benefits over non-DP approaches.
arXiv Detail & Related papers (2024-10-01T14:46:15Z) - Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' whether or not any particular privacy unit is included.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users (the underlying guarantee is sketched after this list).
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - Just Rewrite It Again: A Post-Processing Method for Enhanced Semantic Similarity and Privacy Preservation of Differentially Private Rewritten Text [3.3916160303055567]
We propose a simple post-processing method based on the goal of aligning rewritten texts with their original counterparts.
Our results show that such an approach not only produces outputs that are more semantically reminiscent of the original inputs, but also texts which score on average better in empirical privacy evaluations.
arXiv Detail & Related papers (2024-05-30T08:41:33Z) - RELIC: Investigating Large Language Model Responses using Self-Consistency [58.63436505595177]
Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations.
We propose an interactive system that helps users gain insight into the reliability of the generated text.
arXiv Detail & Related papers (2023-11-28T14:55:52Z) - InferDPT: Privacy-Preserving Inference for Black-box Large Language Model [66.07752875835506]
InferDPT is the first practical framework for the privacy-preserving Inference of black-box LLMs.
RANTEXT is a novel differential privacy mechanism integrated into the perturbation module of InferDPT.
arXiv Detail & Related papers (2023-10-18T18:00:11Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - DP-BART for Privatized Text Rewriting under Local Differential Privacy [2.45626162429986]
We propose a new system 'DP-BART' that largely outperforms existing LDP systems.
Our approach uses a novel clipping method, iterative pruning, and further training of internal representations which drastically reduces the amount of noise required for DP guarantees.
arXiv Detail & Related papers (2023-02-15T13:07:34Z) - How reparametrization trick broke differentially-private text
representation learning [2.45626162429986]
Differential privacy is one of the favorite approaches to privacy-preserving methods in NLP.
Despite its simplicity, applying differential privacy correctly to NLP turns out to be non-trivial.
Our main goal is to raise awareness and help the community understand potential pitfalls of applying differential privacy to text representation learning.
arXiv Detail & Related papers (2022-02-24T15:02:42Z) - Differentially Private Representation for NLP: Formal Guarantee and An
Empirical Study on Privacy and Fairness [38.90014773292902]
It has been demonstrated that the hidden representation learned by a deep model can encode private information about the input.
We propose Differentially Private Neural Representation (DPNR) to preserve the privacy of the extracted representation from text.
arXiv Detail & Related papers (2020-10-03T05:58:32Z)
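For reference, the 'almost indistinguishable' guarantee invoked in the user-level DP summary above is the standard (epsilon, delta)-differential-privacy condition, written here in LaTeX notation as a sketch; for user-level DP, the neighbouring datasets D and D' differ in the data of one user rather than one record:

\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta \quad \text{for all measurable output sets } S .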