Differentially Private Representation Learning via Image Captioning
- URL: http://arxiv.org/abs/2403.02506v2
- Date: Wed, 30 Oct 2024 14:55:58 GMT
- Title: Differentially Private Representation Learning via Image Captioning
- Authors: Tom Sander, Yaodong Yu, Maziar Sanjabi, Alain Durmus, Yi Ma, Kamalika Chaudhuri, Chuan Guo
- Abstract summary: We show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets.
We successfully train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch using a reasonable amount of computation.
- Score: 51.45515227171524
- License:
- Abstract: Differentially private (DP) machine learning is considered the gold-standard solution for training a model from sensitive data while still preserving privacy. However, a major barrier to achieving this ideal is its sub-optimal privacy-accuracy trade-off, which is particularly visible in DP representation learning. Specifically, it has been shown that under modest privacy budgets, most models learn representations that are not significantly better than hand-crafted features. In this work, we show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets. Through a series of engineering tricks, we successfully train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch using a reasonable amount of computation, obtaining image features of unprecedented quality that can be used in a variety of downstream vision and vision-language tasks. For example, under a privacy budget of $\varepsilon=8$ for the LAION dataset, a linear classifier trained on top of learned DP-Cap features attains $65.8\%$ accuracy on ImageNet-1K, considerably improving on the previous SOTA of $56.5\%$.
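The ImageNet-1K figure above comes from linear probing: the DP-Cap image encoder stays frozen and only a linear classifier is fit on its output features. Below is a minimal pure-Python sketch of that evaluation protocol, assuming the features have already been extracted. The function names and hyperparameters are hypothetical, and the paper's actual probe setup (optimizer, regularization, augmentation) is not described in this summary.

```python
import math
import random


def train_linear_probe(features, labels, num_classes, lr=0.1, epochs=50, seed=0):
    """Fit a softmax (multinomial logistic) classifier on frozen feature vectors.

    Only the linear weights W and bias b are learned; the encoder that produced
    `features` is never updated, which is what linear probing measures.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            logits = [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(num_classes)]
            top = max(logits)  # subtract the max for a numerically stable softmax
            exps = [math.exp(z - top) for z in logits]
            total = sum(exps)
            for c in range(num_classes):
                # d(cross-entropy)/d(logit_c) = p_c - 1[c == y]
                g = exps[c] / total - (1.0 if c == y else 0.0)
                for k in range(dim):
                    W[c][k] -= lr * g * x[k]
                b[c] -= lr * g
    return W, b


def predict(W, b, x):
    """Return the index of the highest-scoring class for feature vector x."""
    scores = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(W, b, features, labels):
    """Fraction of examples whose predicted class matches the label."""
    return sum(predict(W, b, x) == y for x, y in zip(features, labels)) / len(labels)
```

For example, `W, b = train_linear_probe(train_feats, train_labels, num_classes=1000)` followed by `accuracy(W, b, val_feats, val_labels)` mirrors the shape of an ImageNet-1K probe, although at that scale a vectorized library would replace the pure-Python loops.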
Related papers
- Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to the models.
However, DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z)
- Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods.
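What makes zeroth-order methods easy to privatize is that a two-point (SPSA-style) estimator reduces each example's contribution to a single scalar, the finite-difference loss change along a random direction, so only that scalar needs clipping and noise. Below is a minimal sketch of one such step, assuming a hypothetical per-example `loss_fn(params, example)`. It illustrates the general idea and is not the DP-ZO authors' implementation; choosing `sigma` for a target privacy budget requires an accountant that is not shown.

```python
import random


def dp_zo_step(theta, batch, loss_fn, lr, mu, clip, sigma, rng):
    """One private zeroth-order update (SPSA-style sketch; names are hypothetical).

    loss_fn(params, example) -> float is an assumed per-example loss.
    Only the per-example scalar finite difference touches private data,
    so only that scalar is clipped and noised.
    """
    # Random perturbation direction shared by the whole batch, drawn independently of the data.
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = [t + mu * zi for t, zi in zip(theta, z)]
    minus = [t - mu * zi for t, zi in zip(theta, z)]
    total = 0.0
    for example in batch:
        # Two forward passes estimate the directional derivative along z.
        s = (loss_fn(plus, example) - loss_fn(minus, example)) / (2.0 * mu)
        # Clip the scalar to [-clip, clip] so each example's influence is bounded.
        total += max(-clip, min(clip, s))
    # Scalar Gaussian noise scaled to the clipping bound, then average over the batch.
    noisy = (total + rng.gauss(0.0, sigma * clip)) / len(batch)
    # Move along z, scaled by the privatized scalar.
    return [t - lr * noisy * zi for t, zi in zip(theta, z)]
```

Because `z` does not depend on the training data, it costs no privacy to use it; the noise only has to cover a one-dimensional quantity, regardless of model size.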
arXiv Detail & Related papers (2024-01-09T03:53:59Z)
- ViP: A Differentially Private Foundation Model for Computer Vision [40.104959284968096]
We propose a recipe to train foundation vision models with differential privacy (DP) guarantee.
ViP achieves a (non-private) linear probing accuracy of $55.7\%$ on ImageNet.
Our result suggests that scaling to internet-scale data can be practical for private learning.
arXiv Detail & Related papers (2023-06-15T04:06:24Z)
- Differentially Private Image Classification by Learning Priors from Random Processes [48.0766422536737]
In privacy-preserving machine learning, differentially private stochastic gradient descent (DP-SGD) performs worse than SGD due to per-sample gradient clipping and noise addition.
A recent focus in private learning research is improving the performance of DP-SGD on private data by incorporating priors that are learned on real-world public data.
In this work, we explore how we can improve the privacy-utility tradeoff of DP-SGD by learning priors from images generated by random processes and transferring these priors to private data.
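The two operations that separate DP-SGD from SGD, per-sample gradient clipping and Gaussian noise addition, fit in a few lines. Below is a minimal pure-Python sketch of one update, assuming per-example gradients are already available as lists (computing them efficiently is a separate problem) and that the noise multiplier `sigma` was chosen by a privacy accountant that is not shown. The function names are hypothetical.

```python
import math
import random


def clip_by_norm(grad, max_norm):
    """Scale grad down so its L2 norm is at most max_norm (unchanged if already smaller)."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip, sigma, rng):
    """One DP-SGD update from a list of per-example gradients."""
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        # Per-sample clipping bounds each example's L2 contribution by `clip`.
        for k, g in enumerate(clip_by_norm(grad, clip)):
            summed[k] += g
    batch_size = len(per_example_grads)
    # Isotropic Gaussian noise with std sigma * clip is added to the sum, then averaged.
    noisy_mean = [(s + rng.gauss(0.0, sigma * clip)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]
```

Since each clipped gradient has norm at most `clip`, adding or removing one example changes `summed` by at most `clip` in L2 norm, which is why the noise standard deviation is `sigma * clip`; after averaging, each coordinate carries noise with standard deviation `sigma * clip / batch_size`.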
arXiv Detail & Related papers (2023-06-08T04:14:32Z)
- Learning to Generate Image Embeddings with User-level Differential Privacy [31.797461992234457]
DP-FedEmb is a variant of federated learning algorithms with per-user sensitivity control and noise addition.
We show it is possible to achieve strong user-level DP guarantees of $\varepsilon < 4$ while keeping the utility drop within 5%, when millions of users can participate in training.
arXiv Detail & Related papers (2022-11-20T01:59:37Z)
- TAN Without a Burn: Scaling Laws of DP-SGD [70.7364032297978]
Differentially private methods for training deep neural networks (DNNs) have progressed recently.
We decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements.
We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state of the art on ImageNet, with a +9-point gain in top-1 accuracy.
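One way to read "decoupling privacy analysis and experimental behavior": in DP-SGD the Gaussian noise on the averaged gradient has per-coordinate standard deviation $\sigma C / B$ (noise multiplier $\sigma$, clipping norm $C$, batch size $B$), so dividing both $B$ and $\sigma$ by the same factor at a fixed number of steps keeps that per-step noise level unchanged at a fraction of the compute. The sketch below only does this arithmetic, under that assumption. It performs no privacy accounting, the function names are hypothetical, and the numbers are illustrative rather than taken from the paper.

```python
def noise_std_on_mean(sigma, clip, batch_size):
    """Per-coordinate std of the Gaussian noise after averaging `batch_size` clipped gradients."""
    return sigma * clip / batch_size


def scaled_down_config(sigma, batch_size, factor):
    """Divide both the noise multiplier and the batch size by `factor`.

    sigma / batch_size is preserved, so the averaged gradient carries the same
    noise level per step. The step count is assumed unchanged. The resulting run
    is only a cheap proxy for training dynamics: its own (epsilon, delta) must
    not be assumed to match the target run and has to be accounted for separately.
    """
    if batch_size % factor != 0:
        raise ValueError("factor must divide batch_size")
    return sigma / factor, batch_size // factor


# Illustrative values (hypothetical): a large target run and a 16x cheaper proxy.
target_sigma, target_batch = 4.0, 4096
proxy_sigma, proxy_batch = scaled_down_config(target_sigma, target_batch, factor=16)
```

Here the proxy runs with `proxy_sigma = 0.25` and `proxy_batch = 256`, and both configurations add the same noise to each averaged gradient.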
arXiv Detail & Related papers (2022-10-07T08:44:35Z)
- Pre-trained Perceptual Features Improve Differentially Private Image Generation [8.659595986100738]
Training even moderately sized generative models with differentially private stochastic gradient descent (DP-SGD) is difficult.
We advocate building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation.
Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models.
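A simple way to "model the private data with that representation" under DP is to release a noisy summary of the private data in the public feature space once, then fit a generator to that summary without touching the private data again. Below is a minimal sketch of the release step, assuming the features come from a hypothetical public pre-trained encoder and that the summary is a feature mean privatized with the Gaussian mechanism. It illustrates the idea and is not necessarily the paper's exact statistics or calibration; choosing `sigma` for a target $(\varepsilon, \delta)$ is not shown.

```python
import math
import random


def dp_mean_embedding(features, sigma, rng):
    """Release the mean of L2-normalized feature vectors with Gaussian noise (hypothetical helper).

    Each row is scaled to unit norm, so replacing one of the n examples moves
    the mean by at most 2 / n in L2 norm; the noise is scaled to that sensitivity.
    """
    n = len(features)
    dim = len(features[0])
    mean = [0.0] * dim
    for f in features:
        norm = math.sqrt(sum(v * v for v in f)) or 1.0  # an all-zero row stays zero
        for k, v in enumerate(f):
            mean[k] += v / norm / n
    sensitivity = 2.0 / n  # replace-one neighboring datasets
    return [m + rng.gauss(0.0, sigma * sensitivity) for m in mean]
```

By the post-processing property of DP, a generator trained only against the released vector inherits the same privacy guarantee as the release itself.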
arXiv Detail & Related papers (2022-05-25T16:46:01Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this protection is appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Toward Training at ImageNet Scale with Differential Privacy [19.139956067438995]
Differential privacy (DP) is the de facto standard for training machine learning (ML) models while protecting the privacy of individual training examples.
ImageNet image classification is a prime example of an ML task that is very challenging to solve accurately with DP.
This paper shares initial lessons from our effort, in the hope that it will inspire and inform other researchers to explore DP training at scale.
arXiv Detail & Related papers (2022-01-28T18:48:18Z)