Differentially Private Representation Learning via Image Captioning
- URL: http://arxiv.org/abs/2403.02506v1
- Date: Mon, 4 Mar 2024 21:52:25 GMT
- Title: Differentially Private Representation Learning via Image Captioning
- Authors: Tom Sander, Yaodong Yu, Maziar Sanjabi, Alain Durmus, Yi Ma, Kamalika
Chaudhuri, Chuan Guo
- Abstract summary: We show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets.
Our work challenges the prevailing sentiment that high-utility DP representation learning cannot be achieved by training from scratch.
- Score: 53.67133628775955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentially private (DP) machine learning is considered the gold-standard
solution for training a model from sensitive data while still preserving
privacy. However, a major barrier to achieving this ideal is its sub-optimal
privacy-accuracy trade-off, which is particularly visible in DP representation
learning. Specifically, it has been shown that under modest privacy budgets,
most models learn representations that are not significantly better than
hand-crafted features. In this work, we show that effective DP representation
learning can be done via image captioning and scaling up to internet-scale
multimodal datasets. Through a series of engineering tricks, we successfully
train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch
using a reasonable amount of computation, and obtain image features of
unprecedented quality that can be used in a variety of downstream vision
and vision-language tasks. For example, under a privacy budget of
$\varepsilon=8$, a linear classifier trained on top of learned DP-Cap features
attains 65.8% accuracy on ImageNet-1K, considerably improving the previous SOTA
of 56.5%. Our work challenges the prevailing sentiment that high-utility DP
representation learning cannot be achieved by training from scratch.
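The ImageNet-1K number above comes from linear probing: the DP-trained encoder is frozen, and only a linear classifier is trained on its features. A minimal toy sketch of the idea (a perceptron-style probe over precomputed feature vectors; the function names are illustrative and this is not the paper's actual training code, which uses a standard linear classifier on DP-Cap features):

```python
# Toy linear probe: train only a linear classifier on frozen feature vectors.
# features: list of feature vectors (e.g. from a frozen DP-trained encoder)
# labels:   integer class labels
def train_linear_probe(features, labels, num_classes, lr=0.1, epochs=20):
    dim = len(features[0])
    # One weight vector per class; the last entry is a bias term.
    W = [[0.0] * (dim + 1) for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            xb = x + [1.0]  # append bias input
            scores = [sum(wi * xi for wi, xi in zip(w, xb)) for w in W]
            pred = max(range(num_classes), key=lambda c: scores[c])
            if pred != y:  # perceptron-style update only on mistakes
                for i in range(dim + 1):
                    W[y][i] += lr * xb[i]
                    W[pred][i] -= lr * xb[i]
    return W

def predict(W, x):
    xb = x + [1.0]
    return max(range(len(W)),
               key=lambda c: sum(wi * xi for wi, xi in zip(W[c], xb)))
```

Because the encoder stays frozen, the probe's accuracy directly measures the quality of the DP-learned representation.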
Related papers
- Pre-training Differentially Private Models with Limited Public Data [58.945400707033016]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to models.
However, DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We propose a novel DP continual pre-training strategy using only 10% of public data.
arXiv Detail & Related papers (2024-02-28T23:26:27Z)
- UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks [60.46473247205654]
Using large-scale unsupervised unimodal models as pre-training can enhance the zero-shot performance of image-text pair models.
Our experiments show that unimodal pre-training outperforms state-of-the-art CLIP-based models.
arXiv Detail & Related papers (2023-06-07T18:26:22Z)
- Learning Differentially Private Probabilistic Models for Privacy-Preserving Image Generation [67.47979276739144]
We propose learning differentially private probabilistic models to generate high-resolution images with differential privacy guarantee.
Our approach can generate images up to 256x256 with remarkable visual quality and data utility.
arXiv Detail & Related papers (2023-05-18T02:51:17Z)
- On the Efficacy of Differentially Private Few-shot Image Classification [40.49270725252068]
In many applications including personalization and federated learning, it is crucial to perform well in the few-shot setting.
We show how the accuracy and vulnerability to attack of few-shot DP image classification models are affected as the number of shots per class, privacy level, model architecture, downstream dataset, and subset of learnable parameters in the model vary.
arXiv Detail & Related papers (2023-02-02T16:16:25Z)
- Learning to Generate Image Embeddings with User-level Differential Privacy [31.797461992234457]
DP-FedEmb is a variant of federated learning algorithms with per-user sensitivity control and noise addition.
We show it is possible to achieve strong user-level DP guarantees of $\varepsilon=4$ while controlling the utility drop within 5%, when millions of users can participate in training.
arXiv Detail & Related papers (2022-11-20T01:59:37Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
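The clip-and-noise mechanism described above can be sketched as follows. This is a toy pure-Python version of one DP-SGD step, not any particular library's implementation; `clip_norm` and `noise_multiplier` follow the usual DP-SGD conventions:

```python
import math
import random

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def clip(grad, clip_norm):
    """Scale grad down so its L2 norm is at most clip_norm."""
    scale = min(1.0, clip_norm / (l2_norm(grad) + 1e-12))
    return [x * scale for x in grad]

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD step: clip each per-example gradient, sum,
    add Gaussian noise of std noise_multiplier * clip_norm, and average."""
    rng = rng or random.Random(0)
    dim = len(per_example_grads[0])
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * clip_norm) for s in summed]
    n = len(per_example_grads)
    return [x / n for x in noisy]
```

The computational overhead mentioned above comes precisely from the per-example step: instead of one batch gradient, every example's gradient must be materialized and clipped individually.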
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Toward Training at ImageNet Scale with Differential Privacy [19.139956067438995]
Differential privacy (DP) is the de facto standard for training machine learning (ML) models.
ImageNet image classification is a poster example of an ML task that is very challenging to resolve accurately with DP.
This paper shares initial lessons from our effort, in the hope that it will inspire and inform other researchers to explore DP training at scale.
arXiv Detail & Related papers (2022-01-28T18:48:18Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
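One way such memory savings can work, sketched here under the simplifying assumption of a single linear layer (this identity underlies so-called ghost clipping, and is not that paper's code): the per-example weight gradient is the outer product g_i a_i^T of the output gradient and the input activation, so its Frobenius norm equals ||g_i|| * ||a_i|| and can be computed without ever materializing the per-example gradient matrix.

```python
import math

def frob_norm(mat):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in mat for x in row))

def outer(g, a):
    """Materialized per-example gradient g a^T (what we want to avoid)."""
    return [[gi * aj for aj in a] for gi in g]

def per_example_grad_norm(g, a):
    """Norm of the per-example gradient g a^T, computed from the two
    vectors alone: ||g a^T||_F = ||g|| * ||a||."""
    ng = math.sqrt(sum(x * x for x in g))
    na = math.sqrt(sum(x * x for x in a))
    return ng * na
```

With these norms in hand, clipping factors can be folded into a second backward pass, so memory stays close to non-private training.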
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
- DPlis: Boosting Utility of Differentially Private Deep Learning via Randomized Smoothing [0.0]
We propose DPlis--Differentially Private Learning wIth Smoothing.
We show that DPlis can effectively boost model quality and training stability under a given privacy budget.
arXiv Detail & Related papers (2021-03-02T06:33:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.