Related papers: An Efficient DP-SGD Mechanism for Large Scale NLP Models

An Efficient DP-SGD Mechanism for Large Scale NLP Models

URL: http://arxiv.org/abs/2107.14586v1
Date: Wed, 14 Jul 2021 15:23:27 GMT
Title: An Efficient DP-SGD Mechanism for Large Scale NLP Models
Authors: Christophe Dupuy, Radhika Arava, Rahul Gupta, Anna Rumshisky
Abstract summary: Data used to train Natural Language Understanding (NLU) models may contain private information such as addresses or phone numbers. It is desirable that underlying models do not expose private information contained in the training data. Differentially Private Gradient Descent (DP-SGD) has been proposed as a mechanism to build privacy-preserving models.
Score: 28.180412581994485
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in deep learning have drastically improved performance on many Natural Language Understanding (NLU) tasks. However, the data used to train NLU models may contain private information such as addresses or phone numbers, particularly when drawn from human subjects. It is desirable that underlying models do not expose private information contained in the training data. Differentially Private Stochastic Gradient Descent (DP-SGD) has been proposed as a mechanism to build privacy-preserving models. However, DP-SGD can be prohibitively slow to train. In this work, we propose a more efficient DP-SGD for training using a GPU infrastructure and apply it to fine-tuning models based on LSTM and transformer architectures. We report faster training times, alongside accuracy, theoretical privacy guarantees and success of Membership inference attacks for our models and observe that fine-tuning with proposed variant of DP-SGD can yield competitive models without significant degradation in training time and improvement in privacy protection. We also make observations such as looser theoretical $\epsilon, \delta$ can translate into significant practical privacy gains.

Related papers

What is the Cost of Differential Privacy for Deep Learning-Based Trajectory Generation? [20.540761983235868]
We show how DP-SGD affects the utility of state-of-the-art generative models.<n>We propose a novel DP mechanism for conditional generation that provides formal guarantees and assess its impact on utility.<n>Our results show that DP-SGD significantly impacts performance, although some utility remains if the datasets is sufficiently large.
arXiv Detail & Related papers (2025-06-11T00:59:52Z)
Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
differential privacy (DP) is a prominent method to gauge the degree of security provided to the models. DP is yet not capable of protecting a substantial portion of the data used during the initial pre-training stage. We develop a novel DP continual pre-training strategy using only 10% of public data. Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and and 60.0% on downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z)
Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner. We introduce DP-ZO, a private fine-tuning framework for large language models by privatizing zeroth order optimization methods.
arXiv Detail & Related papers (2024-01-09T03:53:59Z)
Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with gradient descent. Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency. We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z)
Equivariant Differentially Private Deep Learning: Why DP-SGD Needs Sparser Models [7.49320945341034]
We show that small and efficient architecture design can outperform current state-of-the-art models with substantially lower computational requirements. Our results are a step towards efficient model architectures that make optimal use of their parameters.
arXiv Detail & Related papers (2023-01-30T17:43:47Z)
DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling [23.8561225168394]
differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNN) have been immensely successful in machine learning. A classic mechanism for this purpose is DP-SGD, which is a differentially private version of the gradient descent (SGD) commonly used for training. We propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core of DP-SGD.
arXiv Detail & Related papers (2022-10-18T07:03:14Z)
Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy. Private training using DP-SGD protects against leakage by injecting noise into individual example gradients. While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy. Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text. We show that this performance drop can be mitigated with the use of large pretrained models. We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
DPlis: Boosting Utility of Differentially Private Deep Learning via Randomized Smoothing [0.0]
We propose DPlis--Differentially Private Learning wIth Smoothing. We show that DPlis can effectively boost model quality and training stability under a given privacy budget.
arXiv Detail & Related papers (2021-03-02T06:33:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.