Differentially Private Diffusion Models
- URL: http://arxiv.org/abs/2210.09929v3
- Date: Sun, 31 Dec 2023 01:24:45 GMT
- Title: Differentially Private Diffusion Models
- Authors: Tim Dockhorn, Tianshi Cao, Arash Vahdat, Karsten Kreis
- Abstract summary: We build on the recent success of diffusion models (DMs) and introduce Differentially Private Diffusion Models (DPDMs).
We propose noise multiplicity, a powerful modification of DP-SGD tailored to the training of DMs.
We validate our novel DPDMs on image generation benchmarks and achieve state-of-the-art performance in all experiments.
- Score: 46.46256537222917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While modern machine learning models rely on increasingly large training
datasets, data is often limited in privacy-sensitive domains. Generative models
trained with differential privacy (DP) on sensitive data can sidestep this
challenge, providing access to synthetic data instead. We build on the recent
success of diffusion models (DMs) and introduce Differentially Private
Diffusion Models (DPDMs), which enforce privacy using differentially private
stochastic gradient descent (DP-SGD). We investigate the DM parameterization
and the sampling algorithm, which turn out to be crucial ingredients in DPDMs,
and propose noise multiplicity, a powerful modification of DP-SGD tailored to
the training of DMs. We validate our novel DPDMs on image generation benchmarks
and achieve state-of-the-art performance in all experiments. Moreover, on
standard benchmarks, classifiers trained on DPDM-generated synthetic data
perform on par with task-specific DP-SGD-trained classifiers, which has not
been demonstrated before for DP generative models. Project page and code:
https://nv-tlabs.github.io/DPDM.
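The noise multiplicity idea described in the abstract can be illustrated with a toy sketch: for each training example, the diffusion-loss gradient is averaged over K independent noise draws *before* per-example clipping, so the DP sensitivity (and hence the Gaussian noise added per step) is unchanged while the variance of the gradient estimate shrinks. The scalar "denoiser" and all names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def per_example_grad(theta, x_i, K, sigma_diff, rng):
    """Average the toy diffusion-loss gradient over K noise draws.

    Toy denoiser: predicts the noise as theta * x_noisy; the loss is
    (theta * x_noisy - eps)^2, whose gradient w.r.t. theta is
    2 * (theta * x_noisy - eps) * x_noisy.
    """
    grads = []
    for _ in range(K):
        eps = rng.standard_normal()
        x_noisy = x_i + sigma_diff * eps
        grads.append(2.0 * (theta * x_noisy - eps) * x_noisy)
    # averaging happens *inside* the per-example gradient, before clipping
    return float(np.mean(grads))

def dpdm_style_update(theta, batch, K, clip_C, sigma_dp, sigma_diff, rng):
    """One DP-SGD step with noise multiplicity (toy sketch)."""
    clipped = []
    for x_i in batch:
        g = per_example_grad(theta, x_i, K, sigma_diff, rng)
        # clip per-example gradient to norm clip_C; sensitivity is clip_C
        # regardless of K, so the added DP noise does not grow with K
        g = g * min(1.0, clip_C / (abs(g) + 1e-12))
        clipped.append(g)
    noise = sigma_dp * clip_C * rng.standard_normal()
    return (np.sum(clipped) + noise) / len(batch)
```

Because the K-sample average replaces a single noisy gradient inside the clipping step, larger K reduces gradient variance at a pure compute cost, with no extra privacy cost per update.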
Related papers
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
- Pruning then Reweighting: Towards Data-Efficient Training of Diffusion Models [33.09663675904689]
We investigate efficient diffusion training from the perspective of dataset pruning.
Inspired by the principles of data-efficient training for generative models such as generative adversarial networks (GANs), we first extend the data selection scheme used in GANs to DM training.
To further improve the generation performance, we employ a class-wise reweighting approach.
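The entry mentions class-wise reweighting after pruning but not its exact form; one common recipe (an assumption here, not necessarily this paper's) weights each sample inversely to its class's post-pruning frequency so that under-represented classes are not drowned out:

```python
import numpy as np
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-sample weights inversely proportional to class frequency,
    normalized so the weights average to 1 over the dataset."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return np.array([n / (k * counts[y]) for y in labels])
```

Multiplying each sample's loss by its weight makes every class contribute equally in expectation, regardless of how unevenly pruning thinned the classes.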
arXiv Detail & Related papers (2024-09-27T20:21:19Z)
- Differentially Private Fine-Tuning of Diffusion Models [22.454127503937883]
The integration of Differential Privacy with diffusion models (DMs) presents a promising yet challenging frontier.
Recent developments in this field have highlighted the potential for generating high-quality synthetic data by pre-training on public data.
We propose a strategy optimized for private diffusion models, which minimizes the number of trainable parameters to enhance the privacy-utility trade-off.
arXiv Detail & Related papers (2024-06-03T14:18:04Z)
- LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls samples' generation with a low privacy cost.
We theoretically analyze our model's privacy protection and empirically verify our model.
arXiv Detail & Related papers (2024-02-26T11:52:55Z)
- On the Efficacy of Differentially Private Few-shot Image Classification [40.49270725252068]
In many applications including personalization and federated learning, it is crucial to perform well in the few-shot setting.
We show how the accuracy and vulnerability to attack of few-shot DP image classification models are affected as the number of shots per class, privacy level, model architecture, downstream dataset, and subset of learnable parameters in the model vary.
arXiv Detail & Related papers (2023-02-02T16:16:25Z)
- Private Ad Modeling with DP-SGD [58.670969449674395]
A well-known algorithm in privacy-preserving ML is differentially private stochastic gradient descent (DP-SGD).
In this work we apply DP-SGD to several ad modeling tasks including predicting click-through rates, conversion rates, and number of conversion events.
Our work is the first to empirically demonstrate that DP-SGD can provide both privacy and utility for ad modeling tasks.
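The DP-SGD mechanism referenced throughout these entries follows a standard recipe (a generic sketch of the well-known algorithm, not any specific paper's code): clip each per-example gradient to norm C, sum, add Gaussian noise scaled to C, and average:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_C, noise_multiplier, rng):
    """One DP-SGD gradient estimate.

    per_example_grads: array of shape (B, d), one gradient row per example.
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_C / (norms + 1e-12))
    clipped = per_example_grads * scale  # each row now has norm <= clip_C
    # Gaussian noise calibrated to the clipping bound (the sensitivity)
    noise = noise_multiplier * clip_C * rng.standard_normal(
        per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)
```

Clipping bounds each example's influence on the update (the sensitivity), which is what lets the Gaussian noise deliver a formal (epsilon, delta)-DP guarantee via the moments accountant.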
arXiv Detail & Related papers (2022-11-21T22:51:16Z)
- DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling [23.8561225168394]
Differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNNs) have been immensely successful in machine learning.
A classic mechanism for this purpose is DP-SGD, a differentially private version of stochastic gradient descent (SGD) commonly used for training.
We propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core of DP-SGD.
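DPIS's exact sampling rule is specified in the paper; what such mechanisms build on is the generic importance-sampled estimator, in which indices are drawn non-uniformly and each draw is reweighted to keep the estimate unbiased. A minimal sketch of that building block only (not DPIS itself):

```python
import numpy as np

def importance_sampled_mean(values, probs, n_draws, rng):
    """Unbiased estimate of values.mean() under non-uniform sampling.

    Each index i is drawn with probability probs[i]; reweighting by
    1 / (N * probs[i]) makes the estimator's expectation equal the
    plain uniform mean.
    """
    n = len(values)
    idx = rng.choice(n, size=n_draws, p=probs)
    return float(np.mean(values[idx] / (n * probs[idx])))
```

Choosing `probs` to favor informative samples (e.g. large-gradient examples) reduces estimator variance, which is the lever an importance-sampling DP-SGD variant pulls.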
arXiv Detail & Related papers (2022-10-18T07:03:14Z)
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
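The Sinkhorn divergence at the heart of DP-Sinkhorn builds on entropic optimal transport, which is computed with the classic Sinkhorn fixed-point iteration. A textbook sketch of that iteration (not the paper's training code):

```python
import numpy as np

def sinkhorn_plan(a, b, cost, eps=0.1, n_iters=500):
    """Entropic optimal transport via Sinkhorn scaling.

    Finds a plan P with row sums a and column sums b that minimizes
    <P, cost> plus an eps-weighted entropic regularizer, by alternately
    rescaling rows and columns of the Gibbs kernel K = exp(-cost / eps).
    """
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]
```

Unlike adversarial objectives, this iteration is a deterministic, differentiable computation on minibatch cost matrices, which is part of what makes OT-based generative training stable.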
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
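The memory-saving clipping trick (often called "ghost clipping") exploits the fact that for a linear layer the per-example weight gradient is a rank-one outer product, so its Frobenius norm factorizes and the per-example gradient tensor never needs to be materialized. A minimal sketch for a single linear layer, with illustrative names:

```python
import numpy as np

def per_example_grad_norms_linear(acts, back_grads):
    """Per-example weight-gradient norms for a linear layer y = x @ W.

    The gradient of W for example i is the outer product x_i^T g_i
    (g_i being the backpropagated gradient at y_i), whose Frobenius
    norm is ||x_i|| * ||g_i||.  This computes all B norms from the
    (B, d_in) activations and (B, d_out) backprop gradients without
    ever forming a (B, d_in, d_out) tensor.
    """
    return np.linalg.norm(acts, axis=1) * np.linalg.norm(back_grads, axis=1)
```

With these norms in hand, the per-example clipping factors can be folded into a second backward pass, so DP-SGD runs in roughly the memory footprint of ordinary training.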
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
- An Efficient DP-SGD Mechanism for Large Scale NLP Models [28.180412581994485]
Data used to train Natural Language Understanding (NLU) models may contain private information such as addresses or phone numbers.
It is desirable that underlying models do not expose private information contained in the training data.
Differentially Private Gradient Descent (DP-SGD) has been proposed as a mechanism to build privacy-preserving models.
arXiv Detail & Related papers (2021-07-14T15:23:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.