Related papers: How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing

How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing

URL: http://arxiv.org/abs/2205.10095v1
Date: Fri, 20 May 2022 11:29:44 GMT
Title: How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing
Authors: Samuel Sousa and Roman Kern
Abstract summary: Article systematically reviews over sixty methods for privacy-preserving NLP published between 2016 and 2020. We introduce a novel taxonomy for classifying the existing methods into three categories: methods trusted methods verification methods. We discuss open challenges in privacy-preserving NLP regarding data traceability, overhead dataset size and the prevalence of human biases in embeddings.
Score: 0.38073142980732994
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep learning (DL) models for natural language processing (NLP) tasks often handle private data, demanding protection against breaches and disclosures. Data protection laws, such as the European Union's General Data Protection Regulation (GDPR), thereby enforce the need for privacy. Although many privacy-preserving NLP methods have been proposed in recent years, no categories to organize them have been introduced yet, making it hard to follow the progress of the literature. To close this gap, this article systematically reviews over sixty DL methods for privacy-preserving NLP published between 2016 and 2020, covering theoretical foundations, privacy-enhancing technologies, and analysis of their suitability for real-world scenarios. First, we introduce a novel taxonomy for classifying the existing methods into three categories: data safeguarding methods, trusted methods, and verification methods. Second, we present an extensive summary of privacy threats, datasets for applications, and metrics for privacy evaluation. Third, throughout the review, we describe privacy issues in the NLP pipeline in a holistic view. Further, we discuss open challenges in privacy-preserving NLP regarding data traceability, computation overhead, dataset size, the prevalence of human biases in embeddings, and the privacy-utility tradeoff. Finally, this review presents future research directions to guide successive research and development of privacy-preserving NLP models.

Related papers

Natural Language Processing of Privacy Policies: A Survey [2.4058538793689497]
We conduct a literature review by analyzing 109 papers at the intersection of NLP and privacy policies. We provide a brief introduction to privacy policies and discuss various facets of associated problems. We identify the methodologies that can be further enhanced to provide robust privacy policies.
arXiv Detail & Related papers (2025-01-17T17:47:15Z)
Privacy-Preserving Large Language Models: Mechanisms, Applications, and Future Directions [0.0]
This survey explores the landscape of privacy-preserving mechanisms tailored for large language models. We examine their efficacy in addressing key privacy challenges, such as membership inference and model inversion attacks. By synthesizing state-of-the-art approaches and future trends, this paper provides a foundation for developing robust, privacy-preserving large language models.
arXiv Detail & Related papers (2024-12-09T00:24:09Z)
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy. Within our study, we conducted expert interviews to gain insights into practices in the field. We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
Privacy-Preserving Language Model Inference with Instance Obfuscation [33.86459812694288]
Language Models as a Service (LM) offers convenient access for developers and researchers to perform inference using pre-trained language models. The input data and the inference results containing private information are exposed as plaintext during the service call, leading to privacy issues. We propose Instance-Obfuscated Inference (IOI) method, which focuses on addressing the decision privacy issue of natural language understanding tasks.
arXiv Detail & Related papers (2024-02-13T05:36:54Z)
PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind) Our work offers a theoretical analysis for model design and benchmarks various techniques. In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing. Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data. Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
A Human-in-the-Loop Approach for Information Extraction from Privacy Policies under Data Scarcity [0.0]
We present a prototype system for a Human-in-the-Loop' approach to privacy policy annotation. We propose an ML-based suggestion system specifically tailored to the constraint of data scarcity prevalent in the domain of privacy policy annotation.
arXiv Detail & Related papers (2023-05-24T10:45:26Z)
PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation benchmark, a multi-task benchmark for evaluating the privacy policy language understanding. We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training. We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
arXiv Detail & Related papers (2022-12-20T05:58:32Z)
No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny. Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a privacy-preserving'' method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z)
Differential Privacy in Natural Language Processing: The Story So Far [21.844047604993687]
This paper aims to summarize the vulnerabilities addressed by Differential Privacy. This topic has sparked novel research, which is unified in one basic goal: how can one adapt Differential Privacy to NLP methods?
arXiv Detail & Related papers (2022-08-17T08:15:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.