Differential Privacy in Natural Language Processing: The Story So Far
- URL: http://arxiv.org/abs/2208.08140v1
- Date: Wed, 17 Aug 2022 08:15:44 GMT
- Title: Differential Privacy in Natural Language Processing: The Story So Far
- Authors: Oleksandra Klymenko, Stephen Meisenbacher, Florian Matthes
- Abstract summary: This paper aims to summarize the vulnerabilities addressed by Differential Privacy.
This topic has sparked novel research, which is unified in one basic goal: how can one adapt Differential Privacy to NLP methods?
- Score: 21.844047604993687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the tide of Big Data continues to influence the landscape of Natural
Language Processing (NLP), the utilization of modern NLP methods has grounded
itself in this data, in order to tackle a variety of text-based tasks. This
data, without a doubt, can include private or otherwise personally
identifiable information. As such, the question of privacy in NLP has gained
fervor in recent years, coinciding with the development of new
Privacy-Enhancing Technologies (PETs). Among these PETs, Differential Privacy
boasts several desirable qualities in the conversation surrounding data
privacy. Naturally, the question becomes whether Differential Privacy is
applicable in the largely unstructured realm of NLP. This topic has sparked
novel research, which is unified in one basic goal: how can one adapt
Differential Privacy to NLP methods? This paper aims to summarize the
vulnerabilities addressed by Differential Privacy, the current thinking, and
above all, the crucial next steps that must be considered.
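To ground the abstract's discussion of what Differential Privacy offers, the sketch below shows the classic Laplace mechanism, the textbook DP building block. This is generic illustration, not code from the survey; the function name and example values are hypothetical.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a statistic under epsilon-differential privacy.

    The Laplace mechanism adds noise with scale sensitivity / epsilon, where
    sensitivity bounds how much one person's data can change the statistic.
    Smaller epsilon means stronger privacy and more noise.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_value + noise

# Hypothetical example: privately release how many documents in a corpus
# mention a keyword. Adding or removing one document changes the count by
# at most 1, so the sensitivity is 1.
noisy_count = laplace_mechanism(true_value=1342.0, sensitivity=1.0, epsilon=0.5)
```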
Related papers
- Differential Privacy Overview and Fundamental Techniques [63.0409690498569]
This chapter is meant to be part of the book "Differential Privacy in Artificial Intelligence: From Theory to Practice"
It starts by illustrating various attempts to protect data privacy, emphasizing where and why they failed.
It then defines the key actors, tasks, and scopes that make up the domain of privacy-preserving data analysis.
arXiv Detail & Related papers (2024-11-07T13:52:11Z)
- Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy (DP), which allows for controlling sensitive regions where differential privacy is applied.
Our method operates selectively on data and allows for defining non-sensitive spatio-temporal regions without DP application, or for combining differential privacy with other privacy techniques within data samples.
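The summary gives no implementation details, but the core idea of region-selective noising can be sketched as follows; the function and its noise calibration are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def selectively_noise(sample: np.ndarray, sensitive_mask: np.ndarray,
                      sensitivity: float, epsilon: float) -> np.ndarray:
    """Add Laplace noise only to the regions flagged as sensitive.

    Illustrative sketch: non-sensitive regions pass through unchanged,
    capturing the intuition of applying DP selectively within a sample.
    """
    noise = np.random.laplace(0.0, sensitivity / epsilon, size=sample.shape)
    return np.where(sensitive_mask, sample + noise, sample)

# Example with a hypothetical sensitivity map over a small sample.
x = np.array([0.2, 0.9, 0.4, 0.7])
mask = np.array([False, True, True, False])
x_private = selectively_noise(x, mask, sensitivity=1.0, epsilon=1.0)
```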
arXiv Detail & Related papers (2024-10-22T15:22:53Z)
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models (GPT-4 and ChatGPT) reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Differentially Private Natural Language Models: Recent Advances and Future Directions [25.6006170131504]
Differential Privacy (DP) is becoming a de facto technique for private data analysis.
This paper provides the first systematic review of recent advances in DP deep learning models in NLP.
arXiv Detail & Related papers (2023-01-22T12:29:03Z)
- PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
arXiv Detail & Related papers (2022-12-20T05:58:32Z)
- Algorithms with More Granular Differential Privacy Guarantees [65.3684804101664]
We consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis.
In this work, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller than the best possible privacy parameter for the entire record of a person.
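As an illustration of per-attribute calibration (a hypothetical sketch, not one of the paper's algorithms), each attribute of a record can receive noise scaled to its own privacy parameter:

```python
import numpy as np

def per_attribute_laplace(record: np.ndarray, sensitivities: np.ndarray,
                          epsilons: np.ndarray) -> np.ndarray:
    """Perturb each attribute with its own Laplace scale.

    Attributes with larger epsilon (weaker per-attribute protection) receive
    less noise, mirroring the idea of quantifying the privacy guarantee on a
    per-attribute basis rather than per record.
    """
    return record + np.random.laplace(0.0, sensitivities / epsilons)

# Hypothetical record [age, income]; income gets a tighter (smaller) epsilon.
record = np.array([34.0, 52000.0])
noisy = per_attribute_laplace(record,
                              sensitivities=np.array([1.0, 1000.0]),
                              epsilons=np.array([1.0, 0.1]))
```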
arXiv Detail & Related papers (2022-09-08T22:43:50Z)
- How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing [0.38073142980732994]
This article systematically reviews over sixty methods for privacy-preserving NLP published between 2016 and 2020.
We introduce a novel taxonomy for classifying the existing methods into three categories: data safeguarding methods, trusted methods, and verification methods.
We discuss open challenges in privacy-preserving NLP regarding data traceability, computational overhead, dataset size, and the prevalence of human biases in embeddings.
arXiv Detail & Related papers (2022-05-20T11:29:44Z)
- Privacy-Adaptive BERT for Natural Language Understanding [20.821155542969947]
We study how to improve the effectiveness of NLU models under a Local Privacy setting using BERT.
We propose privacy-adaptive LM pretraining methods and demonstrate that they can significantly improve model performance on privatized text input.
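The summary does not detail the pretraining methods, but the local privacy setting it assumes can be illustrated with textbook randomized response over a vocabulary; this is a generic sketch of what "privatized text input" could look like, not the paper's mechanism, and the example token ids are hypothetical.

```python
import math
import random

def randomized_response_token(token_id: int, vocab_size: int, epsilon: float) -> int:
    """Privatize one token id under epsilon-local differential privacy.

    Keep the true token with probability e^eps / (e^eps + V - 1); otherwise
    emit a uniformly random *different* token. The ratio between any two
    output probabilities is bounded by e^eps, giving the LDP guarantee.
    """
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + vocab_size - 1)
    if random.random() < keep_prob:
        return token_id
    other = random.randrange(vocab_size - 1)  # uniform over the other V-1 ids
    return other if other < token_id else other + 1

# Example: privatize each token of an encoded sentence locally, before it
# ever leaves the user's device for an untrusted server.
private_ids = [randomized_response_token(t, vocab_size=30522, epsilon=6.0)
               for t in [101, 7592, 2088, 102]]
```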
arXiv Detail & Related papers (2021-04-15T15:01:28Z)
- ADePT: Auto-encoder based Differentially Private Text Transformation [22.068984615657463]
We provide a utility-preserving differentially private text transformation algorithm using auto-encoders.
Our algorithm transforms text to offer robustness against attacks and produces transformations with high semantic quality.
Our results show that the proposed model performs better against membership inference attacks (MIA) while offering lower to no degradation in the utility of the underlying transformation process.
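Based only on the abstract, the pipeline is encode, perturb the latent representation, then decode. A sketch of the perturbation step follows; the clipping bound and noise scale are assumptions for illustration, not ADePT's published calibration.

```python
import numpy as np

def privatize_latent(z: np.ndarray, clip_norm: float, epsilon: float) -> np.ndarray:
    """Clip an auto-encoder's latent vector and add Laplace noise.

    Clipping bounds the sensitivity of the latent representation; the noise
    scale used here (2 * clip_norm / epsilon) is an assumed calibration for
    illustration only.
    """
    z_clipped = z * min(1.0, clip_norm / (np.linalg.norm(z) + 1e-12))
    noise = np.random.laplace(0.0, 2.0 * clip_norm / epsilon, size=z.shape)
    return z_clipped + noise

# Usage in an encode-perturb-decode pipeline (encoder/decoder are stand-ins):
# z = encoder(text)
# private_text = decoder(privatize_latent(z, clip_norm=1.0, epsilon=5.0))
```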
arXiv Detail & Related papers (2021-01-29T23:15:24Z)
- Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role Labeling [12.74252812104216]
This paper formulates a new task of extracting privacy parameters from a privacy policy, through the lens of Contextual Integrity.
We show that traditional NLP tasks, including the recently proposed Question-Answering based solutions, are insufficient to address the privacy parameter extraction problem.
arXiv Detail & Related papers (2020-10-01T20:48:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.