Pre-training Differentially Private Models with Limited Public Data
- URL: http://arxiv.org/abs/2402.18752v1
- Date: Wed, 28 Feb 2024 23:26:27 GMT
- Title: Pre-training Differentially Private Models with Limited Public Data
- Authors: Zhiqi Bu, Xinwei Zhang, Mingyi Hong, Sheng Zha, George Karypis
- Abstract summary: Differential privacy (DP) is a prominent method to gauge the degree of security provided to the models.
DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We propose a novel DP continual pre-training strategy using only 10% of public data.
- Score: 58.945400707033016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The superior performance of large foundation models relies on the use of
massive amounts of high-quality data, which often contain sensitive, private
and copyrighted material that requires formal protection. While differential
privacy (DP) is a prominent method to gauge the degree of security provided to
the models, its application is commonly limited to the model fine-tuning stage,
due to the performance degradation when applying DP during the pre-training
stage. Consequently, DP is not yet capable of protecting a substantial portion
of the data used during the initial pre-training process.
In this work, we first provide a theoretical understanding of the efficacy of
DP training by analyzing the per-iteration loss improvement. We make a key
observation that DP optimizers' performance degradation can be significantly
mitigated by the use of limited public data, which leads to a novel DP
continual pre-training strategy. Empirically, using only 10\% of public data,
our strategy can achieve DP accuracy of 41.5\% on ImageNet-21k (with
$\epsilon=8$), as well as non-DP accuracy of 55.7\% and 60.0\% on
downstream tasks Places365 and iNaturalist-2021, respectively, on par with
state-of-the-art standard pre-training and substantially outperforming existing
DP pre-trained models.
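Below is a minimal, runnable sketch of the two-phase recipe the abstract describes: ordinary SGD on a small public split as a warm start, then DP-SGD (per-example clipping plus Gaussian noise) on the private split. The toy linear model, data split, and hyperparameters are illustrative assumptions, not the paper's implementation; in practice the noise multiplier would be set by a privacy accountant to meet a target budget such as $\epsilon=8$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data; the 10%/90% public/private split mirrors the abstract.
X = rng.normal(size=(1000, 20))
w_true = rng.normal(size=20)
y = X @ w_true + 0.1 * rng.normal(size=1000)
X_pub, y_pub, X_priv, y_priv = X[:100], y[:100], X[100:], y[100:]

def per_example_grads(w, xb, yb):
    # Gradient of the squared error (x.w - y)^2 for each example: 2*(x.w - y)*x.
    return 2.0 * (xb @ w - yb)[:, None] * xb

def sgd_step(w, xb, yb, lr=0.05):
    # Standard (non-private) mini-batch step used in the public warm-up phase.
    return w - lr * per_example_grads(w, xb, yb).mean(axis=0)

def dp_sgd_step(w, xb, yb, lr=0.05, clip=1.0, sigma=1.0):
    # DP-SGD: clip each per-example gradient to L2 norm `clip`, sum,
    # add Gaussian noise calibrated to the clipping bound, then average.
    g = per_example_grads(w, xb, yb)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g = g / np.maximum(1.0, norms / clip)
    noise = sigma * clip * rng.normal(size=w.shape)
    return w - lr * (g.sum(axis=0) + noise) / len(xb)

w = np.zeros(20)
for _ in range(300):  # Phase 1: non-private pre-training on public data.
    i = rng.choice(len(X_pub), size=32)
    w = sgd_step(w, X_pub[i], y_pub[i])
for _ in range(300):  # Phase 2: DP continual pre-training on private data.
    i = rng.choice(len(X_priv), size=64)
    w = dp_sgd_step(w, X_priv[i], y_priv[i])
print("final mean squared error:", np.mean((X @ w - y) ** 2))
```

The warm start reflects the paper's key observation: the performance degradation of DP optimizers can be significantly mitigated by a limited amount of public data.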
Related papers
- Differentially Private Fine-Tuning of Diffusion Models [22.454127503937883]
The integration of Differential Privacy with diffusion models (DMs) presents a promising yet challenging frontier.
Recent developments in this field have highlighted the potential for generating high-quality synthetic data by pre-training on public data.
We propose a strategy optimized for private diffusion models that minimizes the number of trainable parameters to enhance the privacy-utility trade-off (see the sketch after this entry).
arXiv Detail & Related papers (2024-06-03T14:18:04Z)
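A back-of-the-envelope view of why shrinking the trainable-parameter count helps the privacy-utility trade-off: the Gaussian noise DP-SGD adds has expected L2 norm growing like the square root of the number of noised coordinates, while the clipped signal stays bounded. The parameter counts below are hypothetical, chosen only to make the scaling visible.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, clip = 1.0, 1.0  # illustrative noise multiplier and clipping bound

# Noise norm grows like sqrt(#trainable params); the clipped per-example
# signal norm is at most `clip`, so fewer trainable params -> better SNR.
for n_params in (10_000_000, 100_000, 1_000):  # full model vs. small adapters
    noise = sigma * clip * rng.normal(size=n_params)
    print(f"{n_params:>10,} params: noise norm ~ {np.linalg.norm(noise):,.0f},"
          f" signal norm <= {clip}")
```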
- Differentially Private Representation Learning via Image Captioning [53.67133628775955]
We show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets.
Our work challenges the prevailing sentiment that high-utility DP representation learning cannot be achieved by training from scratch.
arXiv Detail & Related papers (2024-03-04T21:52:25Z)
- Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with gradient descent.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency (illustrated in the sketch after this entry).
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z)
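The densification problem in miniature: an embedding-table gradient is nonzero only on the rows a batch actually touches, but naive DP-SGD adds Gaussian noise to every entry, turning a sparse update dense. The table size and batch below are made-up illustrations; DP-FEST and DP-AdaFEST avoid this by restricting the noisy update to a small set of rows (see the paper for the actual mechanisms).

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 10_000, 16

# A batch touches only a few embedding rows -> the gradient is row-sparse.
grad = np.zeros((vocab, dim))
rows_in_batch = rng.choice(vocab, size=8, replace=False)
grad[rows_in_batch] = rng.normal(size=(8, dim))
print("nonzero rows before noise:", int(grad.any(axis=1).sum()))   # 8

# Naive DP-SGD noises every entry of the table -> the update is dense.
noisy = grad + 1.0 * rng.normal(size=grad.shape)
print("nonzero rows after noise: ", int(noisy.any(axis=1).sum()))  # 10000
```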
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by clipping individual example gradients and injecting noise into their aggregate.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Unlocking High-Accuracy Differentially Private Image Classification through Scale [45.93988209606857]
Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points.
Previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks.
We demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought.
arXiv Detail & Related papers (2022-04-28T17:10:56Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory-saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients (see the sketch after this entry).
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
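One standard way such memory-saving clipping works for a linear layer, sketched here under the assumption that it is the "ghost clipping" idea (the paper's exact mechanism may differ): the per-example weight gradient is the outer product $b_i a_i^\top$ of the output gradient and the input, so its Frobenius norm factors as $\|a_i\| \cdot \|b_i\|$ and can be computed without materializing the (batch, out, in) gradient tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
B, d_in, d_out = 64, 512, 256
a = rng.normal(size=(B, d_in))   # per-example inputs to the linear layer
b = rng.normal(size=(B, d_out))  # per-example gradients w.r.t. its outputs

# Naive route: build every per-example weight gradient b_i a_i^T
# (a B x d_out x d_in tensor) just to measure its norm.
per_example = np.einsum("bo,bi->boi", b, a)
naive_norms = np.linalg.norm(per_example.reshape(B, -1), axis=1)

# Ghost route: ||b_i a_i^T||_F = ||a_i|| * ||b_i||, computed from
# quantities the backward pass already has, with only O(B) extra memory.
ghost_norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
print("norms agree:", np.allclose(naive_norms, ghost_norms))
```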
- An Efficient DP-SGD Mechanism for Large Scale NLP Models [28.180412581994485]
Data used to train Natural Language Understanding (NLU) models may contain private information such as addresses or phone numbers.
It is desirable that underlying models do not expose private information contained in the training data.
Differentially Private Gradient Descent (DP-SGD) has been proposed as a mechanism to build privacy-preserving models.
arXiv Detail & Related papers (2021-07-14T15:23:27Z)