ViP: A Differentially Private Foundation Model for Computer Vision
- URL: http://arxiv.org/abs/2306.08842v2
- Date: Wed, 28 Jun 2023 22:24:33 GMT
- Title: ViP: A Differentially Private Foundation Model for Computer Vision
- Authors: Yaodong Yu and Maziar Sanjabi and Yi Ma and Kamalika Chaudhuri and
Chuan Guo
- Abstract summary: We propose a recipe to train foundation vision models with differential privacy (DP) guarantee.
ViP achieves a (non-private) linear probing accuracy of $55.7\%$ on ImageNet.
Our result suggests that scaling to internet-scale data can be practical for private learning.
- Score: 40.104959284968096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial intelligence (AI) has seen a tremendous surge in capabilities
thanks to the use of foundation models trained on internet-scale data. On the
flip side, the uncurated nature of internet-scale data also poses significant
privacy and legal risks, as they often contain personal information or
copyrighted material that should not be trained on without permission. In this
work, we propose as a mitigation measure a recipe to train foundation vision
models with differential privacy (DP) guarantee. We identify masked
autoencoders as a suitable learning algorithm that aligns well with DP-SGD, and
train ViP -- a Vision transformer with differential Privacy -- under a strict
privacy budget of $\epsilon=8$ on the LAION400M dataset. We evaluate the
quality of representation learned by ViP using standard downstream vision
tasks; in particular, ViP achieves a (non-private) linear probing accuracy of
$55.7\%$ on ImageNet, comparable to that of end-to-end trained AlexNet (trained
and evaluated on ImageNet). Our result suggests that scaling to internet-scale
data can be practical for private learning. Code is available at
\url{https://github.com/facebookresearch/ViP-MAE}.
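To make the recipe concrete, here is a minimal sketch of the kind of training loop the abstract describes: a masked-autoencoder objective optimized with DP-SGD. It uses Opacus for the DP-SGD mechanics; the tiny MLP autoencoder, pixel-level masking, target_delta, batch size, and epoch count are illustrative assumptions, not the paper's actual ViT/MAE configuration (see the linked repository for that).
```python
# Hedged sketch: masked-autoencoder (MAE) objective trained with DP-SGD
# via Opacus. The tiny MLP autoencoder, pixel-level masking, and all
# hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

class TinyMaskedAutoencoder(nn.Module):
    """Stand-in for a ViT-based MAE: reconstructs masked-out pixels."""
    def __init__(self, dim=3 * 32 * 32, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(dim, hidden), nn.GELU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x, mask):
        # Zero out masked pixels before encoding, then reconstruct all pixels.
        recon = self.decoder(self.encoder(x * mask))
        return recon.view_as(x)

def mae_loss(recon, target, mask):
    # MAE-style loss: reconstruction error measured only on the masked region.
    return (((recon - target) ** 2) * (1 - mask)).mean()

model = TinyMaskedAutoencoder()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
images = TensorDataset(torch.randn(512, 3, 32, 32))  # placeholder data
loader = DataLoader(images, batch_size=64)

# target_epsilon=8 mirrors the paper's stated budget; target_delta and
# epochs are assumptions made so the sketch runs end to end.
model, optimizer, loader = PrivacyEngine().make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    target_epsilon=8.0,
    target_delta=1e-6,
    epochs=1,
    max_grad_norm=1.0,  # per-example gradient clipping bound
)

for (x,) in loader:
    mask = (torch.rand_like(x) > 0.75).float()  # keep ~25% of pixels visible
    optimizer.zero_grad()
    mae_loss(model(x, mask), x, mask).backward()  # Opacus tracks per-sample grads
    optimizer.step()  # clips each example's gradient, adds noise, then updates
```
The property DP-SGD adds is that each example's influence on every update is bounded by per-example clipping and then masked by Gaussian noise, which is what yields the $(\epsilon, \delta)$ guarantee; a reconstruction-style objective like MAE pairs well with this because it needs no labels from the uncurated data.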
Related papers
- Calibrating Practical Privacy Risks for Differentially Private Machine Learning [5.363664265121231]
We study approaches that can lower the attack success rate (ASR) to allow for more flexible privacy budget settings in model training.
We find that by selectively suppressing privacy-sensitive features, we can achieve lower ASR values without compromising application-specific data utility.
arXiv Detail & Related papers (2024-10-30T03:52:01Z)
- Differentially Private Representation Learning via Image Captioning [51.45515227171524]
We show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets.
We successfully train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch using a reasonable amount of computation.
arXiv Detail & Related papers (2024-03-04T21:52:25Z)
- Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano [83.5933307263932]
We study data reconstruction attacks for discrete data and analyze them under the framework of hypothesis testing.
We show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $\epsilon$ can be $O(\log M)$ before the adversary gains significant inferential power; a sketch of this bound follows the related-papers list.
arXiv Detail & Related papers (2022-10-24T23:50:12Z)
- Privacy-Preserved Neural Graph Similarity Learning [99.78599103903777]
We propose a novel Privacy-Preserving neural Graph Matching network model, named PPGM, for graph similarity learning.
To prevent reconstruction attacks, the proposed model does not communicate node-level representations between devices.
To mitigate attacks on graph properties, the devices exchange only obfuscated features that mix information from both graphs' representation vectors.
arXiv Detail & Related papers (2022-10-21T04:38:25Z)
- TAN Without a Burn: Scaling Laws of DP-SGD [70.7364032297978]
Differentially private methods for training deep neural networks (DNNs) have recently made rapid progress.
We decouple the privacy analysis from the experimental behavior of noisy training to explore the privacy-utility trade-off with minimal computational requirements.
We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state of the art on ImageNet with a +9 point gain in top-1 accuracy.
arXiv Detail & Related papers (2022-10-07T08:44:35Z)
- Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search [38.83524780461911]
We show how carefully selecting the layers being fine-tuned in the pretrained neural network allows us to establish new state-of-the-art tradeoffs between privacy and accuracy.
We achieve 77.9% accuracy for $(\varepsilon, \delta) = (2, 10^{-5})$ on CIFAR-100 for a model pretrained on ImageNet.
arXiv Detail & Related papers (2022-10-05T11:32:49Z)
- DP$^2$-VAE: Differentially Private Pre-trained Variational Autoencoders [26.658723213776632]
We propose DP$^2$-VAE, a training mechanism for variational autoencoders (VAEs) with provable DP guarantees and improved utility via pre-training on private data.
We conduct extensive experiments on image datasets to demonstrate its improvements over baselines under various privacy budgets and evaluation metrics.
arXiv Detail & Related papers (2022-08-05T23:57:34Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients; a minimal hand-rolled sketch of this mechanism follows the related-papers list.
While this protection is appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Toward Training at ImageNet Scale with Differential Privacy [19.139956067438995]
Differential privacy (DP) is the de facto standard for training machine learning (ML) models with formal guarantees of privacy for individual training examples.
ImageNet image classification is a poster example of an ML task that is very challenging to solve accurately with DP.
This paper shares initial lessons from our effort, in the hope that it will inspire and inform other researchers to explore DP training at scale.
arXiv Detail & Related papers (2022-01-28T18:48:18Z)
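As referenced in the Fano entry above, here is a hedged sketch of the hypothesis-testing bound behind the $O(\log M)$ claim. It is a simpler, standard argument in the same spirit, not the paper's exact Fano-based derivation.
```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Assume the private record takes one of $M$ values uniformly at random,
% denote the true value by $m^\star$, and suppose the mechanism is
% $\epsilon$-DP, so for every output $y$ and candidates $m, m'$:
%   $P(y \mid m) \le e^{\epsilon}\, P(y \mid m')$.
% Fixing any reference value $m_0$, every reconstruction attack
% $\hat{m}$ then satisfies
\begin{align*}
  \Pr[\hat{m}(Y) = m^\star]
    &= \frac{1}{M} \sum_{m} \Pr[\hat{m}(Y) = m \mid \mathrm{data} = m] \\
    &\le \frac{e^{\epsilon}}{M} \sum_{m} \Pr[\hat{m}(Y) = m \mid \mathrm{data} = m_0]
     = \frac{e^{\epsilon}}{M}.
\end{align*}
% The attack beats the trivial $1/M$ guess only once $e^{\epsilon}$ is
% comparable to $M$, i.e. the adversary needs $\epsilon = \Omega(\log M)$;
% equivalently, $\epsilon = O(\log M)$ keeps its inferential power limited.
\end{document}
```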
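And as referenced in the transfer-learning entry, here is a minimal hand-rolled sketch of the DP-SGD mechanism itself: per-example gradient clipping followed by calibrated Gaussian noise. The toy model, data, and hyperparameters are illustrative assumptions; real implementations (e.g., Opacus) vectorize the per-example gradient computation.
```python
# Hedged sketch of one DP-SGD step, written out by hand (no DP library).
# Each example's gradient is clipped to norm C, the clipped gradients are
# summed, and Gaussian noise of scale sigma * C is added before the update.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)              # toy classifier
loss_fn = nn.CrossEntropyLoss()
C, sigma, lr = 1.0, 1.1, 0.1          # clip norm, noise multiplier, learning rate

x = torch.randn(32, 10)               # one placeholder minibatch
y = torch.randint(0, 2, (32,))

# 1) Compute and clip each example's gradient, accumulating the sum.
summed = [torch.zeros_like(p) for p in model.parameters()]
for i in range(x.shape[0]):
    model.zero_grad()
    loss_fn(model(x[i:i + 1]), y[i:i + 1]).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = (C / (total_norm + 1e-12)).clamp(max=1.0)  # shrink only if norm > C
    for s, g in zip(summed, grads):
        s.add_(g * scale)

# 2) Add Gaussian noise calibrated to the clip bound, then take an
#    averaged SGD step. The (sigma, batch size, steps) triple determines
#    the resulting (epsilon, delta) via a privacy accountant.
with torch.no_grad():
    for p, s in zip(model.parameters(), summed):
        noisy = s + sigma * C * torch.randn_like(s)
        p.add_(-lr * noisy / x.shape[0])
```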