Differential Privacy in Natural Language Processing: The Story So Far
- URL: http://arxiv.org/abs/2208.08140v1
- Date: Wed, 17 Aug 2022 08:15:44 GMT
- Title: Differential Privacy in Natural Language Processing: The Story So Far
- Authors: Oleksandra Klymenko, Stephen Meisenbacher, Florian Matthes
- Abstract summary: This paper aims to summarize the vulnerabilities addressed by Differential Privacy.
This topic has sparked novel research, which is unified in one basic goal: how can one adapt Differential Privacy to NLP methods?
- Score: 21.844047604993687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the tide of Big Data continues to influence the landscape of Natural
Language Processing (NLP), the utilization of modern NLP methods has grounded
itself in this data, in order to tackle a variety of text-based tasks. This
data, without a doubt, can include private or otherwise personally
identifiable information. As such, the question of privacy in NLP has gained
fervor in recent years, coinciding with the development of new
Privacy-Enhancing Technologies (PETs). Among these PETs, Differential Privacy
boasts several desirable qualities in the conversation surrounding data
privacy. Naturally, the question becomes whether Differential Privacy is
applicable in the largely unstructured realm of NLP. This topic has sparked
novel research, which is unified in one basic goal: how can one adapt
Differential Privacy to NLP methods? This paper aims to summarize the
vulnerabilities addressed by Differential Privacy, the current thinking, and
above all, the crucial next steps that must be considered.
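To ground the abstract's discussion of what Differential Privacy offers, the sketch below shows the classic Laplace mechanism, the textbook DP building block. This is generic illustration, not code from the survey; the function name and example values are hypothetical.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a statistic under epsilon-differential privacy.

    The Laplace mechanism adds noise with scale sensitivity / epsilon, where
    sensitivity bounds how much one person's data can change the statistic.
    Smaller epsilon means stronger privacy and more noise.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_value + noise

# Hypothetical example: privately release how many documents in a corpus
# mention a keyword. Adding or removing one document changes the count by
# at most 1, so the sensitivity is 1.
noisy_count = laplace_mechanism(true_value=1342.0, sensitivity=1.0, epsilon=0.5)
```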
Related papers
- Differential Privacy Overview and Fundamental Techniques [63.0409690498569]
This chapter is meant to be part of the book "Differential Privacy in Artificial Intelligence: From Theory to Practice"
It starts by illustrating various attempts to protect data privacy, emphasizing where and why they failed.
It then defines the key actors, tasks, and scopes that make up the domain of privacy-preserving data analysis.
arXiv Detail & Related papers (2024-11-07T13:52:11Z)
- Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy (DP), which allows for controlling sensitive regions where differential privacy is applied.
Our method operates selectively on data and allows for defining non-sensitive spatio-temporal regions without DP application, or for combining differential privacy with other privacy techniques within data samples.
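The summary gives no implementation details, but the core idea of region-selective noising can be sketched as follows; the function and its noise calibration are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def selectively_noise(sample: np.ndarray, sensitive_mask: np.ndarray,
                      sensitivity: float, epsilon: float) -> np.ndarray:
    """Add Laplace noise only to the regions flagged as sensitive.

    Illustrative sketch: non-sensitive regions pass through unchanged,
    capturing the intuition of applying DP selectively within a sample.
    """
    noise = np.random.laplace(0.0, sensitivity / epsilon, size=sample.shape)
    return np.where(sensitive_mask, sample + noise, sample)

# Example with a hypothetical sensitivity map over a small sample.
x = np.array([0.2, 0.9, 0.4, 0.7])
mask = np.array([False, True, True, False])
x_private = selectively_noise(x, mask, sensitivity=1.0, epsilon=1.0)
```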
arXiv Detail & Related papers (2024-10-22T15:22:53Z)
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models (GPT-4 and ChatGPT) reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Differentially Private Natural Language Models: Recent Advances and Future Directions [25.6006170131504]
Differential Privacy (DP) is becoming a de facto technique for private data analysis.
This paper provides the first systematic review of recent advances in DP deep learning models in NLP.
arXiv Detail & Related papers (2023-01-22T12:29:03Z)
- PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
arXiv Detail & Related papers (2022-12-20T05:58:32Z)
- Algorithms with More Granular Differential Privacy Guarantees [65.3684804101664]
We consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis.
In this work, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller than the best possible privacy parameter for the entire record of a person.
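As an illustration of per-attribute calibration (a hypothetical sketch, not one of the paper's algorithms), each attribute of a record can receive noise scaled to its own privacy parameter:

```python
import numpy as np

def per_attribute_laplace(record: np.ndarray, sensitivities: np.ndarray,
                          epsilons: np.ndarray) -> np.ndarray:
    """Perturb each attribute with its own Laplace scale.

    Attributes with larger epsilon (weaker per-attribute protection) receive
    less noise, mirroring the idea of quantifying the privacy guarantee on a
    per-attribute basis rather than per record.
    """
    return record + np.random.laplace(0.0, sensitivities / epsilons)

# Hypothetical record [age, income]; income gets a tighter (smaller) epsilon.
record = np.array([34.0, 52000.0])
noisy = per_attribute_laplace(record,
                              sensitivities=np.array([1.0, 1000.0]),
                              epsilons=np.array([1.0, 0.1]))
```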
arXiv Detail & Related papers (2022-09-08T22:43:50Z)
- How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing [0.38073142980732994]
This article systematically reviews over sixty methods for privacy-preserving NLP published between 2016 and 2020.
We introduce a novel taxonomy for classifying the existing methods into three categories: data safeguarding methods, trusted methods, and verification methods.
We discuss open challenges in privacy-preserving NLP regarding data traceability, computational overhead, dataset size, and the prevalence of human biases in embeddings.
arXiv Detail & Related papers (2022-05-20T11:29:44Z)
- Privacy-Adaptive BERT for Natural Language Understanding [20.821155542969947]
We study how to improve the effectiveness of NLU models under a Local Privacy setting using BERT.
We propose privacy-adaptive LM pretraining methods and demonstrate that they can significantly improve model performance on privatized text input.
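The summary does not detail the pretraining methods, but the local privacy setting it assumes can be illustrated with textbook randomized response over a vocabulary; this is a generic sketch of what "privatized text input" could look like, not the paper's mechanism, and the example token ids are hypothetical.

```python
import math
import random

def randomized_response_token(token_id: int, vocab_size: int, epsilon: float) -> int:
    """Privatize one token id under epsilon-local differential privacy.

    Keep the true token with probability e^eps / (e^eps + V - 1); otherwise
    emit a uniformly random *different* token. The ratio between any two
    output probabilities is bounded by e^eps, giving the LDP guarantee.
    """
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + vocab_size - 1)
    if random.random() < keep_prob:
        return token_id
    other = random.randrange(vocab_size - 1)  # uniform over the other V-1 ids
    return other if other < token_id else other + 1

# Example: privatize each token of an encoded sentence locally, before it
# ever leaves the user's device for an untrusted server.
private_ids = [randomized_response_token(t, vocab_size=30522, epsilon=6.0)
               for t in [101, 7592, 2088, 102]]
```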
arXiv Detail & Related papers (2021-04-15T15:01:28Z)
- ADePT: Auto-encoder based Differentially Private Text Transformation [22.068984615657463]
We provide a utility-preserving differentially private text transformation algorithm using auto-encoders.
Our algorithm transforms text to offer robustness against attacks and produces transformations with high semantic quality.
Our results show that the proposed model performs better against membership inference attacks (MIA) while offering lower to no degradation in the utility of the underlying transformation process.
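Based only on the abstract, the pipeline is encode, perturb the latent representation, then decode. A sketch of the perturbation step follows; the clipping bound and noise scale are assumptions for illustration, not ADePT's published calibration.

```python
import numpy as np

def privatize_latent(z: np.ndarray, clip_norm: float, epsilon: float) -> np.ndarray:
    """Clip an auto-encoder's latent vector and add Laplace noise.

    Clipping bounds the sensitivity of the latent representation; the noise
    scale used here (2 * clip_norm / epsilon) is an assumed calibration for
    illustration only.
    """
    z_clipped = z * min(1.0, clip_norm / (np.linalg.norm(z) + 1e-12))
    noise = np.random.laplace(0.0, 2.0 * clip_norm / epsilon, size=z.shape)
    return z_clipped + noise

# Usage in an encode-perturb-decode pipeline (encoder/decoder are stand-ins):
# z = encoder(text)
# private_text = decoder(privatize_latent(z, clip_norm=1.0, epsilon=5.0))
```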
arXiv Detail & Related papers (2021-01-29T23:15:24Z)
- Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role Labeling [12.74252812104216]
This paper formulates a new task of extracting privacy parameters from a privacy policy, through the lens of Contextual Integrity.
We show that traditional NLP tasks, including the recently proposed Question-Answering based solutions, are insufficient to address the privacy parameter extraction problem.
arXiv Detail & Related papers (2020-10-01T20:48:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.