User-Entity Differential Privacy in Learning Natural Language Models
- URL: http://arxiv.org/abs/2211.01141v1
- Date: Tue, 1 Nov 2022 16:54:23 GMT
- Title: User-Entity Differential Privacy in Learning Natural Language Models
- Authors: Phung Lai, NhatHai Phan, Tong Sun, Rajiv Jain, Franck Dernoncourt,
Jiuxiang Gu, Nikolaos Barmpalios
- Abstract summary: We introduce a novel concept of user-entity differential privacy (UeDP) to provide formal privacy protection simultaneously to both sensitive entities in textual data and data owners in learning natural language models (NLMs)
To preserve UeDP, we developed a novel algorithm, called UeDP-Alg, optimizing the trade-off between privacy loss and model utility with a tight bound sensitivity derived from seamlessly combining user and sensitive entity sampling processes.
An extensive theoretical analysis and evaluation show that our UeDP-Alg outperforms baseline approaches in model utility under the same privacy budget consumption on several NLM tasks, using benchmark datasets
- Score: 46.177052564590646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a novel concept of user-entity differential
privacy (UeDP) to provide formal privacy protection simultaneously to both
sensitive entities in textual data and data owners in learning natural language
models (NLMs). To preserve UeDP, we developed a novel algorithm, called
UeDP-Alg, optimizing the trade-off between privacy loss and model utility with
a tight sensitivity bound derived from seamlessly combining user and sensitive
entity sampling processes. An extensive theoretical analysis and evaluation
show that our UeDP-Alg outperforms baseline approaches in model utility under
the same privacy budget consumption on several NLM tasks, using benchmark
datasets.
Related papers
- Advancing Personalized Federated Learning: Integrative Approaches with AI for Enhanced Privacy and Customization [0.0]
This paper proposes a novel approach that enhances PFL with cutting-edge AI techniques.
We present a model that boosts the performance of individual client models and ensures robust privacy-preserving mechanisms.
This work paves the way for a new era of truly personalized and privacy-conscious AI systems.
arXiv Detail & Related papers (2025-01-30T07:03:29Z) - SafeSynthDP: Leveraging Large Language Models for Privacy-Preserving Synthetic Data Generation Using Differential Privacy [0.0]
We investigate capability of Large Language Models (Ms) to generate synthetic datasets with Differential Privacy (DP) mechanisms.
Our approach incorporates DP-based noise injection methods, including Laplace and Gaussian distributions, into the data generation process.
We then evaluate the utility of these DP-enhanced synthetic datasets by comparing the performance of ML models trained on them against models trained on the original data.
arXiv Detail & Related papers (2024-12-30T01:10:10Z) - Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy (DP), which allows for controlling sensitive regions where differential privacy is applied.
Our method operates selectively on data and allows for defining non-sensitive-temporal regions without DP application or combining differential privacy with other privacy techniques within data samples.
arXiv Detail & Related papers (2024-10-22T15:22:53Z) - Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP motivated by applications where it necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - DPGOMI: Differentially Private Data Publishing with Gaussian Optimized
Model Inversion [8.204115285718437]
We propose Differentially Private Data Publishing with Gaussian Optimized Model Inversion (DPGOMI) to address this issue.
Our approach involves mapping private data to the latent space using a public generator, followed by a lower-dimensional DP-GAN with better convergence properties.
Our results show that DPGOMI outperforms the standard DP-GAN method in terms of Inception Score, Freche't Inception Distance, and classification performance.
arXiv Detail & Related papers (2023-10-06T18:46:22Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind)
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Differentially Private Language Models for Secure Data Sharing [19.918137395199224]
In this paper, we show how to train a generative language model in a differentially private manner and consequently sampling data from it.
Using natural language prompts and a new prompt-mismatch loss, we are able to create highly accurate and fluent textual datasets.
We perform thorough experiments indicating that our synthetic datasets do not leak information from our original data and are of high language quality.
arXiv Detail & Related papers (2022-10-25T11:12:56Z) - On the utility and protection of optimization with differential privacy
and classic regularization techniques [9.413131350284083]
We study the effectiveness of the differentially-private descent (DP-SGD) algorithm against standard optimization practices with regularization techniques.
We discuss differential privacy's flaws and limits and empirically demonstrate the often superior privacy-preserving properties of dropout and l2-regularization.
arXiv Detail & Related papers (2022-09-07T14:10:21Z) - Just Fine-tune Twice: Selective Differential Privacy for Large Language
Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z) - Sensitivity analysis in differentially private machine learning using
hybrid automatic differentiation [54.88777449903538]
We introduce a novel textithybrid automatic differentiation (AD) system for sensitivity analysis.
This enables modelling the sensitivity of arbitrary differentiable function compositions, such as the training of neural networks on private data.
Our approach can enable the principled reasoning about privacy loss in the setting of data processing.
arXiv Detail & Related papers (2021-07-09T07:19:23Z) - PEARL: Data Synthesis via Private Embeddings and Adversarial
Reconstruction Learning [1.8692254863855962]
We propose a new framework of data using deep generative models in a differentially private manner.
Within our framework, sensitive data are sanitized with rigorous privacy guarantees in a one-shot fashion.
Our proposal has theoretical guarantees of performance, and empirical evaluations on multiple datasets show that our approach outperforms other methods at reasonable levels of privacy.
arXiv Detail & Related papers (2021-06-08T18:00:01Z) - User-Level Privacy-Preserving Federated Learning: Analysis and
Performance Optimization [77.43075255745389]
Federated learning (FL) is capable of preserving private data from mobile terminals (MTs) while training the data into useful models.
From a viewpoint of information theory, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm by adding artificial noise to the shared models before uploading them to servers.
arXiv Detail & Related papers (2020-02-29T10:13:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.