DP-SGD for non-decomposable objective functions
- URL: http://arxiv.org/abs/2310.03104v1
- Date: Wed, 4 Oct 2023 18:48:16 GMT
- Title: DP-SGD for non-decomposable objective functions
- Authors: William Kong, Andrés Muñoz Medina and Mónica Ribero
- Abstract summary: We develop a new DP-SGD variant for similarity-based loss functions that manipulates gradients of the objective function in a novel way to obtain a sensitivity of the summed gradient that is $O(1)$ for batch size $n$.
Our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Unsupervised pre-training is a common step in developing computer vision
models and large language models. In this setting, the absence of labels
requires the use of similarity-based loss functions, such as contrastive loss,
that favor minimizing the distance between similar inputs and maximizing the
distance between distinct inputs. As privacy concerns mount, training these
models using differential privacy has become more important. However, due to
how inputs are generated for these losses, one of their undesirable properties
is that their $L_2$ sensitivity can grow with increasing batch size. This
property is particularly disadvantageous for differentially private training
methods, such as DP-SGD. To overcome this issue, we develop a new DP-SGD
variant for similarity-based loss functions -- in particular, the commonly used
contrastive loss -- that manipulates gradients of the objective function in a
novel way to obtain a sensitivity of the summed gradient that is $O(1)$ for
batch size $n$. We test our DP-SGD variant on some preliminary CIFAR-10
pre-training and CIFAR-100 finetuning tasks and show that, in both tasks, our
method's performance comes close to that of a non-private model and generally
outperforms DP-SGD applied directly to the contrastive loss.
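To make the sensitivity issue concrete, here is a minimal sketch (PyTorch assumed; this illustrates standard per-example clipping for a decomposable loss, not the authors' new variant):

```python
import torch

def dp_sgd_step(param, per_example_grads, clip_norm=1.0, noise_mult=1.0, lr=0.1):
    # per_example_grads: shape (n, d), one gradient row per example.
    norms = per_example_grads.norm(dim=1, keepdim=True)
    scale = (clip_norm / (norms + 1e-12)).clamp(max=1.0)   # per-example clip factor
    summed = (per_example_grads * scale).sum(dim=0)        # L2 sensitivity <= clip_norm
    noisy = summed + noise_mult * clip_norm * torch.randn_like(summed)  # Gaussian mechanism
    return param - lr * noisy / per_example_grads.shape[0]

# Toy usage with stand-in per-example gradients.
torch.manual_seed(0)
theta = torch.zeros(4)
grads = torch.randn(8, 4)  # in practice: one gradient per training example
theta = dp_sgd_step(theta, grads)
```

For a decomposable loss, clipping each row to norm $C$ bounds the $L_2$ sensitivity of the summed gradient by $C$ independently of the batch size $n$. For a contrastive loss there is no such per-example decomposition: removing one input changes up to $n$ of the pairwise loss terms, so the clipping bound above no longer yields $O(1)$ sensitivity, which is the gap the paper's gradient manipulation closes.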
Related papers
- Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods (a minimal sketch of the zeroth-order idea appears after this list).
arXiv Detail & Related papers (2024-01-09T03:53:59Z)
- Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with stochastic gradient descent.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z) - Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in
Private SGD [56.01810892677744]
We show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD.
We propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias.
arXiv Detail & Related papers (2023-08-23T09:20:41Z)
- DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling [23.8561225168394]
Differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNNs) have been immensely successful in machine learning.
A classic mechanism for this purpose is DP-SGD, a differentially private version of stochastic gradient descent (SGD) commonly used for training.
We propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement for the core of DP-SGD.
arXiv Detail & Related papers (2022-10-18T07:03:14Z)
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- DP-FP: Differentially Private Forward Propagation for Large Models [2.062295244789704]
We show how to mitigate the performance drop by replacing differentially private gradient descent with a novel DP forward-propagation scheme (DP-FP).
Our DP-FP achieves an average accuracy of 91.34% with privacy budgets less than 3, representing a 3.81% performance improvement over the state-of-the-art DP-SGD.
arXiv Detail & Related papers (2021-12-29T07:32:29Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
- Differentially Private Variational Autoencoders with Term-wise Gradient Aggregation [12.880889651679094]
We study how to learn variational autoencoders with a variety of divergences under differential privacy constraints.
We propose term-wise DP-SGD that crafts randomized gradients in two different ways tailored to the compositions of the loss terms.
arXiv Detail & Related papers (2020-06-19T16:12:28Z)
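For the zeroth-order approach referenced in the DP-ZO entry above, the following is a hedged sketch of the general idea only (function names, the clipping rule, and the update are illustrative assumptions, not DP-ZO's exact algorithm): estimate each example's directional derivative with two forward passes along a shared random direction, clip the resulting per-example scalar, and add noise to that single one-dimensional quantity.

```python
import torch

def dp_zo_step(param, loss_fn, batch, eps=1e-3, clip=1.0, noise_mult=1.0, lr=0.1):
    # Shared random direction for the two-point finite-difference estimate.
    u = torch.randn_like(param)
    u = u / u.norm()
    # Per-example scalar estimates of the directional derivative along u.
    scalars = torch.stack([
        (loss_fn(param + eps * u, x) - loss_fn(param - eps * u, x)) / (2 * eps)
        for x in batch
    ])
    # Clipping each scalar bounds its influence; only this 1-D sum is noised.
    clipped = scalars.clamp(-clip, clip)
    noisy_sum = clipped.sum() + noise_mult * clip * torch.randn(())
    return param - lr * (noisy_sum / len(batch)) * u

# Toy usage: a least-squares loss on random inputs.
torch.manual_seed(0)
loss = lambda p, x: ((p * x).sum() - 1.0) ** 2
theta = torch.zeros(3)
data = [torch.randn(3) for _ in range(8)]
theta = dp_zo_step(theta, loss, data)
```

Because only a scalar per example is privatized, no per-example gradient tensors need to be materialized, which is part of what makes zeroth-order methods attractive for private fine-tuning of large models.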