Learning to Generate Image Embeddings with User-level Differential
Privacy
- URL: http://arxiv.org/abs/2211.10844v2
- Date: Fri, 31 Mar 2023 22:55:16 GMT
- Title: Learning to Generate Image Embeddings with User-level Differential
Privacy
- Authors: Zheng Xu, Maxwell Collins, Yuxiao Wang, Liviu Panait, Sewoong Oh, Sean
Augenstein, Ting Liu, Florian Schroff, H. Brendan McMahan
- Abstract summary: DP-FedEmb is a variant of federated learning algorithms with per-user sensitivity control and noise addition.
We show it is possible to achieve strong user-level DP guarantees of $\epsilon<4$ while controlling the utility drop within 5%, when millions of users can participate in training.
- Score: 31.797461992234457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Small on-device models have been successfully trained with user-level
differential privacy (DP) for next word prediction and image classification
tasks in the past. However, existing methods can fail when directly applied to
learn embedding models using supervised training data with a large class space.
To achieve user-level DP for large image-to-embedding feature extractors, we
propose DP-FedEmb, a variant of federated learning algorithms with per-user
sensitivity control and noise addition, to train from user-partitioned data
centralized in the datacenter. DP-FedEmb combines virtual clients, partial
aggregation, private local fine-tuning, and public pretraining to achieve
strong privacy utility trade-offs. We apply DP-FedEmb to train image embedding
models for faces, landmarks and natural species, and demonstrate its superior
utility under the same privacy budget on the benchmark datasets DigiFace, EMNIST, GLD
and iNaturalist. We further illustrate that it is possible to achieve strong
user-level DP guarantees of $\epsilon<4$ while controlling the utility drop
within 5%, when millions of users can participate in training.
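The per-user sensitivity control and noise addition described in the abstract can be illustrated with a short sketch. The code below is a simplified illustration, not the DP-FedEmb implementation: each (virtual) client's flattened model update is clipped to a fixed L2 norm, the clipped updates are summed, and Gaussian noise calibrated to that norm is added before averaging. Names such as `dp_aggregate`, `clip_norm`, and `noise_multiplier` are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of user-level DP aggregation:
# each user's update is clipped to a fixed L2 norm, summed, and Gaussian
# noise scaled to that norm is added before averaging.
import numpy as np

def dp_aggregate(user_updates, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    """Aggregate per-user updates with clipping and Gaussian noise.

    user_updates: list of 1-D np.ndarray, one flattened update per user.
    clip_norm: per-user L2 sensitivity bound.
    noise_multiplier: noise stddev = noise_multiplier * clip_norm.
    """
    rng = np.random.default_rng(seed)
    total = np.zeros_like(user_updates[0])
    for update in user_updates:
        norm = np.linalg.norm(update)
        # Scale the update down so its L2 norm is at most clip_norm.
        total += update * min(1.0, clip_norm / max(norm, 1e-12))
    # Gaussian noise whose scale is tied to the per-user sensitivity bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(user_updates)

# Toy usage: three "virtual clients", each contributing a flattened update.
updates = [np.ones(4), 2 * np.ones(4), -np.ones(4)]
print(dp_aggregate(updates, clip_norm=1.0, noise_multiplier=0.5))
```

In the full algorithm the noised aggregate would update a shared backbone, with per-class embedding parameters handled by partial aggregation and private local fine-tuning on top of a publicly pretrained model; this sketch omits those details.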
Related papers
- Beyond the Mean: Differentially Private Prototypes for Private Transfer Learning [16.028575596905554]
We propose Differentially Private Prototype Learning (DPPL) as a new paradigm for private transfer learning.
DPPL generates prototypes that represent each private class in the embedding space and can be publicly released for inference.
We show that privacy-utility trade-offs can be further improved when leveraging the public data beyond pre-training of the encoder.
arXiv Detail & Related papers (2024-06-12T09:41:12Z) - Differentially Private Representation Learning via Image Captioning [51.45515227171524]
We show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets.
We successfully train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch using a reasonable amount of computation.
arXiv Detail & Related papers (2024-03-04T21:52:25Z) - Sparsity-Preserving Differentially Private Training of Large Embedding
Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with gradient descent; a minimal sketch of its per-example clipping and noising step appears after this list.
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z) - DP$^2$-VAE: Differentially Private Pre-trained Variational Autoencoders [26.658723213776632]
We propose DP$^2$-VAE, a training mechanism for variational autoencoders (VAE) with provable DP guarantees and improved utility via pre-training on private data.
We conduct extensive experiments on image datasets to illustrate our superiority over baselines under various privacy budgets and evaluation metrics.
arXiv Detail & Related papers (2022-08-05T23:57:34Z) - Pre-trained Perceptual Features Improve Differentially Private Image
Generation [8.659595986100738]
Training even moderately-sized generative models with differentially private stochastic gradient descent (DP-SGD) is difficult.
We advocate building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation.
Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models.
arXiv Detail & Related papers (2022-05-25T16:46:01Z) - Large Scale Transfer Learning for Differentially Private Image
Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Don't Generate Me: Training Differentially Private Generative Models
with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z) - Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z) - User-Level Privacy-Preserving Federated Learning: Analysis and
Performance Optimization [77.43075255745389]
Federated learning (FL) is capable of preserving the private data of mobile terminals (MTs) while training useful models from that data.
From an information-theoretic viewpoint, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm by adding artificial noise to the shared models before uploading them to servers.
arXiv Detail & Related papers (2020-02-29T10:13:39Z)
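Several entries above, notably the sparsity-preserving and large-scale transfer learning papers, describe DP-SGD as clipping individual example gradients and injecting noise. The sketch below is a generic illustration of that step, not code from any of the cited papers; `dp_sgd_step` and its parameters are hypothetical names.

```python
# Minimal NumPy sketch of a DP-SGD step: per-example gradients are clipped
# to a fixed L2 norm, summed, and Gaussian noise is added before the
# averaged update is applied. Names are illustrative only.
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    summed = np.zeros_like(params)
    for g in per_example_grads:
        # Clip each example's gradient to bound its contribution.
        summed += g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (summed + noise) / len(per_example_grads)
    return params - lr * noisy_mean

# Toy usage on a 3-parameter model with a batch of 4 example gradients.
params = np.zeros(3)
grads = [np.array([1.0, 0.5, -0.2]) for _ in range(4)]
print(dp_sgd_step(params, grads))
```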