Differentially Private Bias-Term Fine-tuning of Foundation Models
- URL: http://arxiv.org/abs/2210.00036v3
- Date: Wed, 19 Jun 2024 02:40:18 GMT
- Title: Differentially Private Bias-Term Fine-tuning of Foundation Models
- Authors: Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis
- Abstract summary: We study the problem of differentially private (DP) fine-tuning of large pre-trained models.
We propose DP-BiTFiT, which matches the state-of-the-art accuracy for DP algorithms and the efficiency of the standard BiTFiT.
On a wide range of tasks, DP-BiTFiT is 2~30X faster and uses 2~8X less memory than DP full fine-tuning.
- Score: 36.55810474925956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of differentially private (DP) fine-tuning of large pre-trained models -- a recent privacy-preserving approach suitable for solving downstream tasks with sensitive data. Existing work has demonstrated that high accuracy is possible under strong privacy constraint, yet requires significant computational overhead or modifications to the network architecture. We propose differentially private bias-term fine-tuning (DP-BiTFiT), which matches the state-of-the-art accuracy for DP algorithms and the efficiency of the standard BiTFiT. DP-BiTFiT is model agnostic (not modifying the network architecture), parameter efficient (only training about 0.1% of the parameters), and computation efficient (almost removing the overhead caused by DP, in both the time and space complexity). On a wide range of tasks, DP-BiTFiT is 2~30X faster and uses 2~8X less memory than DP full fine-tuning, even faster than the standard full fine-tuning. This amazing efficiency enables us to conduct DP fine-tuning on language and vision tasks with long-sequence texts and high-resolution images, which were computationally difficult using existing methods. We open-source our code at FastDP (https://github.com/awslabs/fast-differential-privacy).
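The idea described in the abstract, updating only the bias terms under per-example clipping and Gaussian noise, can be illustrated with a minimal sketch. This is not the FastDP implementation: the toy linear layer, the squared loss, and the names `per_sample_bias_grads`, `clip`, and `dp_bias_step` are all assumptions made for illustration, and privacy accounting is omitted.

```python
import math
import random


def per_sample_bias_grads(W, b, xs, ys):
    """Per-example gradient of 0.5 * ||W x + b - y||^2 with respect to b only.

    For a linear layer the bias gradient equals the output gradient,
    so the input activations x are not needed beyond the forward pass.
    """
    grads = []
    for x, y in zip(xs, ys):
        out = [sum(w * v for w, v in zip(row, x)) + b_j
               for row, b_j in zip(W, b)]
        grads.append([o - t for o, t in zip(out, y)])
    return grads


def clip(g, R):
    """Rescale g so that its L2 norm is at most R (hypothetical helper)."""
    norm = math.sqrt(sum(v * v for v in g))
    scale = min(1.0, R / (norm + 1e-12))
    return [v * scale for v in g]


def dp_bias_step(W, b, xs, ys, lr=0.1, R=1.0, sigma=1.0, rng=None):
    """One noisy gradient step on the bias; the weights W stay frozen."""
    rng = rng or random.Random(0)
    clipped = [clip(g, R) for g in per_sample_bias_grads(W, b, xs, ys)]
    summed = [sum(col) for col in zip(*clipped)]
    # Gaussian noise with standard deviation sigma * R on the clipped sum.
    noisy = [s + rng.gauss(0.0, sigma * R) for s in summed]
    n = len(xs)
    return [b_j - lr * g_j / n for b_j, g_j in zip(b, noisy)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # frozen "pre-trained" weights
    b = [0.0, 0.0]                # the only trainable parameters
    xs = [[3.0, 4.0], [0.1, -0.2]]
    ys = [[0.0, 0.0], [0.0, 0.0]]
    print(dp_bias_step(W, b, xs, ys, lr=0.1, R=1.0, sigma=0.5))
```

Because only `b` is returned and updated, the number of trained parameters is the layer's output dimension rather than the size of `W`, which is the source of the parameter efficiency the abstract reports.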
Related papers
- DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction [57.83978915843095]
This paper introduces DiSK, a novel framework designed to significantly enhance the performance of differentially private gradients.
To ensure practicality for large-scale training, we simplify the Kalman filtering process, minimizing its memory and computational demands.
arXiv Detail & Related papers (2024-10-04T19:30:39Z) - Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to the models.
DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z) - Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models by privatizing zeroth order optimization methods.
arXiv Detail & Related papers (2024-01-09T03:53:59Z) - Zero redundancy distributed learning with differential privacy [26.89679585840689]
We develop a new systematic solution, DP-ZeRO, to scale up the trainable DP model size.
Our DP-ZeRO has the potential to train models with arbitrary size and is evaluated on the world's largest DP models.
arXiv Detail & Related papers (2023-11-20T14:58:56Z) - Differentially Private Optimization on Large Model at Small Cost [39.93710312222771]
Differentially private (DP) optimization is the standard paradigm to learn large neural networks that are accurate and privacy-preserving.
Existing DP implementations are 2-1000X more costly in time and space complexity than the standard (non-private) training.
We develop a novel Book-Keeping (BK) technique that implements existing DP optimizers (thus achieving the same accuracy) with a substantial improvement in computational cost.
arXiv Detail & Related papers (2022-09-30T18:38:53Z) - Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger [39.93710312222771]
Per-example clipping is a key algorithmic step that enables practical differentially private (DP) training for deep learning models.
We propose an easy-to-use replacement, called automatic clipping, that eliminates the need to tune the clipping threshold R for any DP optimizer.
arXiv Detail & Related papers (2022-06-14T19:49:44Z) - Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z) - Fast and Memory Efficient Differentially Private-SGD via JL Projections [29.37156662314245]
DP-SGD is the only known algorithm for private training of large scale neural networks.
We present a new framework to design differentially private optimizers called DP-SGD-JL and DP-Adam-JL.
arXiv Detail & Related papers (2021-02-05T06:02:10Z)