Differentially Private Natural Language Models: Recent Advances and
Future Directions
- URL: http://arxiv.org/abs/2301.09112v2
- Date: Mon, 23 Oct 2023 13:28:35 GMT
- Title: Differentially Private Natural Language Models: Recent Advances and
Future Directions
- Authors: Lijie Hu, Ivan Habernal, Lei Shen and Di Wang
- Abstract summary: Differential Privacy (DP) is becoming a de facto technique for private data analysis.
This paper provides the first systematic review of recent advances in DP deep learning models in NLP.
- Score: 25.6006170131504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent developments in deep learning have led to great success in various
natural language processing (NLP) tasks. However, these applications may
involve data that contain sensitive information. Therefore, how to achieve good
performance while also protecting the privacy of sensitive data is a crucial
challenge in NLP. To preserve privacy, Differential Privacy (DP), which can
prevent reconstruction attacks and protect against potential side knowledge, is
becoming a de facto technique for private data analysis. In recent years, NLP
in DP models (DP-NLP) has been studied from different perspectives, which
deserves a comprehensive review. In this paper, we provide the first systematic
review of recent advances in DP deep learning models in NLP. In particular, we
first discuss some differences and additional challenges of DP-NLP compared
with the standard DP deep learning. Then, we investigate some existing work on
DP-NLP and present its recent developments from three aspects: gradient
perturbation based methods, embedding vector perturbation based methods, and
ensemble model based methods. We also discuss some challenges and future
directions.
Related papers
- Thinking Outside of the Differential Privacy Box: A Case Study in Text Privatization with Language Model Prompting [3.3916160303055567]
We discuss the restrictions that Differential Privacy (DP) integration imposes, as well as bring to light the challenges that such restrictions entail.
Our results demonstrate the need for more discussion on the usability of DP in NLP and its benefits over non-DP approaches.
arXiv Detail & Related papers (2024-10-01T14:46:15Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind)
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - SoK: Privacy-Preserving Data Synthesis [72.92263073534899]
This paper focuses on privacy-preserving data synthesis (PPDS) by providing a comprehensive overview, analysis, and discussion of the field.
We put forth a master recipe that unifies two prominent strands of research in PPDS: statistical methods and deep learning (DL)-based methods.
arXiv Detail & Related papers (2023-07-05T08:29:31Z) - Uncertainty in Natural Language Processing: Sources, Quantification, and
Applications [56.130945359053776]
We provide a comprehensive review of uncertainty-relevant works in the NLP field.
We first categorize the sources of uncertainty in natural language into three types, including input, system, and output.
We discuss the challenges of uncertainty estimation in NLP and discuss potential future directions.
arXiv Detail & Related papers (2023-06-05T06:46:53Z) - On the Explainability of Natural Language Processing Deep Models [3.0052400859458586]
Methods have been developed to address the challenges and present satisfactory explanations on Natural Language Processing (NLP) models.
Motivated to democratize ExAI methods in the NLP field, we present in this work a survey that studies model-agnostic as well as model-specific explainability methods on NLP models.
arXiv Detail & Related papers (2022-10-13T11:59:39Z) - Differential Privacy in Natural Language Processing: The Story So Far [21.844047604993687]
This paper aims to summarize the vulnerabilities addressed by Differential Privacy.
This topic has sparked novel research, which is unified in one basic goal: how can one adapt Differential Privacy to NLP methods?
arXiv Detail & Related papers (2022-08-17T08:15:44Z) - How to keep text private? A systematic review of deep learning methods
for privacy-preserving natural language processing [0.38073142980732994]
Article systematically reviews over sixty methods for privacy-preserving NLP published between 2016 and 2020.
We introduce a novel taxonomy for classifying the existing methods into three categories: methods trusted methods verification methods.
We discuss open challenges in privacy-preserving NLP regarding data traceability, overhead dataset size and the prevalence of human biases in embeddings.
arXiv Detail & Related papers (2022-05-20T11:29:44Z) - Just Fine-tune Twice: Selective Differential Privacy for Large Language
Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z) - An Empirical Survey of Data Augmentation for Limited Data Learning in
NLP [88.65488361532158]
dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks.
Data augmentation methods have been explored as a means of improving data efficiency in NLP.
We provide an empirical survey of recent progress on data augmentation for NLP in the limited labeled data setting.
arXiv Detail & Related papers (2021-06-14T15:27:22Z) - Gradient Masking and the Underestimated Robustness Threats of
Differential Privacy in Deep Learning [0.0]
This paper experimentally evaluates the impact of training with Differential Privacy (DP) on model vulnerability against a broad range of adversarial attacks.
The results suggest that private models are less robust than their non-private counterparts, and that adversarial examples transfer better among DP models than between non-private and private ones.
arXiv Detail & Related papers (2021-05-17T16:10:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.