Private Fine-tuning of Large Language Models with Zeroth-order
Optimization
- URL: http://arxiv.org/abs/2401.04343v1
- Date: Tue, 9 Jan 2024 03:53:59 GMT
- Title: Private Fine-tuning of Large Language Models with Zeroth-order
Optimization
- Authors: Xinyu Tang, Ashwinee Panda, Milad Nasr, Saeed Mahloujifar, Prateek
Mittal
- Abstract summary: We introduce DP-ZO, a new method for fine-tuning large language models that preserves the privacy of training data by privatizing zeroth-order optimization.
We show that DP-ZO exhibits just $1.86\%$ performance degradation due to privacy at $(1, 10^{-5})$-DP when fine-tuning OPT-66B on 1000 training samples from SQuAD.
- Score: 54.24600476755372
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Fine-tuning large pretrained models on private datasets may run the risk of
violating privacy. Differential privacy is a framework for mitigating privacy
risks by enforcing algorithmic stability. DP-SGD enables training models with
private data in a privacy-preserving manner, but raises new obstacles in the
form of performance loss and significant engineering challenges. We introduce
DP-ZO, a new method for fine-tuning large language models that preserves the
privacy of training data by privatizing zeroth-order optimization. A key
insight into the design of our method is that the direction of the gradient in
SPSA, the zeroth-order algorithm we use, is always random and the only
information that depends on private data is the step size, i.e., a scalar.
Therefore, we only need to privatize the scalar step size, which is
memory-efficient. DP-ZO, which can be instantiated with either Laplace or
Gaussian noise, provides a strong privacy-utility trade-off across different
tasks and model sizes under conservative privacy budgets. One noteworthy
result is that DP-ZO exhibits just $1.86\%$ performance degradation due to
privacy at $(1,10^{-5})$-DP when fine-tuning OPT-66B on 1000 training samples
from SQuAD.
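To make the key insight concrete, the following is a minimal NumPy sketch of one DP-ZO-style update. This is not the paper's exact procedure: the function signature, clipping convention, and noise calibration are illustrative assumptions, and the Gaussian noise could be swapped for Laplace, as the abstract notes.

```python
import numpy as np

def dp_zo_step(params, loss_fn, batch, lr=1e-6, mu=1e-3,
               clip=1.0, sigma=1.0, rng=None):
    """One zeroth-order update with a privatized scalar step size.

    loss_fn(params, example) -> scalar loss on a single example.
    The SPSA direction z is sampled independently of the data, so
    only the per-example scalar loss differences touch private data.
    """
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(params.shape)   # public random direction

    # Per-example SPSA scalars: finite-difference directional derivatives.
    scalars = np.array([
        (loss_fn(params + mu * z, ex) - loss_fn(params - mu * z, ex))
        / (2.0 * mu)
        for ex in batch
    ])

    # Privatize the scalars only: clip each to bound sensitivity,
    # then add Gaussian noise (a Laplace variant is analogous).
    clipped = np.clip(scalars, -clip, clip)
    noisy = clipped.sum() + rng.normal(0.0, sigma * clip)
    step = noisy / len(batch)

    # The update moves along the public direction z.
    return params - lr * step * z
```

Because the direction z is data-independent and only a scalar is privatized, there is no need to materialize per-example gradients, which is the source of the memory efficiency claimed above.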
Related papers
- Sparsity-Preserving Differentially Private Training of Large Embedding
Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with gradient descent.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
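As a rough illustration of the sparsity problem (the actual DP-FEST and DP-AdaFEST mechanisms are not detailed in this summary): an embedding-table gradient is nonzero only for the rows seen in a batch, but naive isotropic noising touches every row.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table: 10,000 rows, 8 dims; a batch touches 3 rows.
grad = np.zeros((10_000, 8))
grad[[2, 17, 9_431]] = rng.standard_normal((3, 8))
print(np.count_nonzero(grad.any(axis=1)))   # 3 nonzero rows

# Naive DP-SGD noising perturbs every coordinate, destroying sparsity.
noisy = grad + rng.normal(0.0, 1.0, size=grad.shape)
print(np.count_nonzero(noisy.any(axis=1)))  # ~10,000 nonzero rows
```

A sparsity-preserving method must keep the noised update confined to a small set of rows; how DP-FEST and DP-AdaFEST select that set is described in the paper.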
arXiv Detail & Related papers (2023-11-14T17:59:51Z)
- TAN Without a Burn: Scaling Laws of DP-SGD [70.7364032297978]
Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently.
We decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements.
We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state of the art on ImageNet with a +9 point gain in top-1 accuracy.
arXiv Detail & Related papers (2022-10-07T08:44:35Z)
- Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search [38.83524780461911]
We show how carefully selecting the layers being fine-tuned in the pretrained neural network allows us to establish new state-of-the-art tradeoffs between privacy and accuracy.
We achieve 77.9% accuracy for $(\varepsilon, \delta) = (2, 10^{-5})$ on CIFAR-100 for a model pretrained on ImageNet.
arXiv Detail & Related papers (2022-10-05T11:32:49Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
Because these per-example guarantees correlate with how well the model serves each example, groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Pre-trained Perceptual Features Improve Differentially Private Image Generation [8.659595986100738]
Training even moderately sized generative models with differentially private stochastic gradient descent (DP-SGD) is difficult.
We advocate building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation.
Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models.
arXiv Detail & Related papers (2022-05-25T16:46:01Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While these guarantees are appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training.
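The noise-injection step described above is the standard DP-SGD recipe: clip each per-example gradient to an L2 bound, sum, and add Gaussian noise. A minimal sketch, with illustrative parameter names:

```python
import numpy as np

def dp_sgd_update(params, per_example_grads, lr=0.1,
                  clip=1.0, sigma=1.0, rng=None):
    """One DP-SGD step on a stack of per-example gradients (B, d)."""
    rng = rng or np.random.default_rng()

    # Clip each example's gradient to L2 norm at most `clip`.
    flat = per_example_grads.reshape(len(per_example_grads), -1)
    norms = np.linalg.norm(flat, axis=1)
    factors = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors[:, None]

    # Sum, add noise calibrated to the clipping bound, and average.
    noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, sigma * clip,
                                                 size=params.shape)
    return params - lr * noisy_sum / len(per_example_grads)
```

Materializing and clipping per-example gradients, rather than a single batch gradient, is the main source of the extra computational cost noted above.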
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning [74.73901662374921]
Differentially private training degrades utility drastically when the model comprises a large number of trainable parameters.
We propose an algorithm, Gradient Embedding Perturbation (GEP), for training differentially private deep models with decent accuracy.
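The summary does not spell out the mechanism, but gradient embedding perturbation is commonly understood as projecting per-example gradients onto a low-dimensional anchor subspace and privatizing the low-dimensional embedding, so the noise scales with the embedding dimension rather than the parameter count. A rough sketch under that assumption, with `basis` and all parameter names hypothetical:

```python
import numpy as np

def gep_style_update(params, per_example_grads, basis, lr=0.1,
                     clip=1.0, sigma=1.0, rng=None):
    """Perturb gradients in a low-dimensional embedding (sketch).

    basis: (k, d) orthonormal rows spanning an anchor subspace,
    e.g. top principal directions of auxiliary gradients.
    """
    rng = rng or np.random.default_rng()

    # Embed each per-example gradient: (B, d) @ (d, k) -> (B, k).
    emb = per_example_grads @ basis.T

    # Clip and noise the k-dim embeddings, not the d-dim gradients.
    norms = np.linalg.norm(emb, axis=1)
    emb = emb * np.minimum(1.0, clip / np.maximum(norms, 1e-12))[:, None]
    noisy = emb.sum(axis=0) + rng.normal(0.0, sigma * clip, size=emb.shape[1])

    # Map the privatized embedding back to parameter space and step.
    grad_estimate = (noisy / len(per_example_grads)) @ basis
    return params - lr * grad_estimate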
arXiv Detail & Related papers (2021-02-25T04:29:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.