DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning
- URL: http://arxiv.org/abs/2403.14421v3
- Date: Mon, 13 May 2024 14:57:34 GMT
- Title: DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning
- Authors: Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo,
- Abstract summary: We develop the first differentially private (DP) retrieval-augmented generation algorithm.
It is capable of generating high-quality image samples while providing provable privacy guarantees.
- Score: 38.697798191642136
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replica of images that they are trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guarantees. Specifically, we assume access to a text-to-image diffusion model trained on a small amount of public data, and design a DP retrieval mechanism to augment the text prompt with samples retrieved from a private retrieval dataset. Our \emph{differentially private retrieval-augmented diffusion model} (DP-RDM) requires no fine-tuning on the retrieval dataset to adapt to another domain, and can use state-of-the-art generative models to generate high-quality image samples while satisfying rigorous DP guarantees. For instance, when evaluated on MS-COCO, our DP-RDM can generate samples with a privacy budget of $\epsilon=10$, while providing a $3.5$ point improvement in FID compared to public-only retrieval for up to $10,000$ queries.
Related papers
- Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training [51.87027943520492]
We present a novel paradigm Diffusion-ReID to efficiently augment and generate diverse images based on known identities.
Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset Diff-Person, which consists of over 777K images from 5,183 identities.
arXiv Detail & Related papers (2024-06-10T06:26:03Z) - Gradient Inversion of Federated Diffusion Models [4.1355611383748005]
Diffusion models are becoming defector generative models, which generate exceptionally high-resolution image data.
In this paper, we study the privacy risk of gradient inversion attacks.
We propose a triple-optimization GIDM+ that coordinates the optimization of the unknown data.
arXiv Detail & Related papers (2024-05-30T18:00:03Z) - CPR: Retrieval Augmented Generation for Copyright Protection [101.15323302062562]
We introduce CopyProtected generation with Retrieval (CPR), a new method for RAG with strong copyright protection guarantees.
CPR allows to condition the output of diffusion models on a set of retrieved images.
We prove that CPR satisfies Near Access Freeness (NAF) which bounds the amount of information an attacker may be able to extract from the generated images.
arXiv Detail & Related papers (2024-03-27T18:09:55Z) - LLM-based Privacy Data Augmentation Guided by Knowledge Distillation
with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls samples' generation with a low privacy cost.
We theoretically analyze our model's privacy protection and empirically verify our model.
arXiv Detail & Related papers (2024-02-26T11:52:55Z) - PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining [13.823621924706348]
Differential Privacy (DP) image data synthesis allows organizations to share and utilize synthetic images without privacy concerns.
Previous methods incorporate the advanced techniques of generative models and pre-training on a public dataset to produce exceptional DP image data.
This paper proposes a novel DP image synthesis method, termed PRIVIMAGE, which meticulously selects pre-training data.
arXiv Detail & Related papers (2023-10-19T14:04:53Z) - Preconditioned Score-based Generative Models [49.88840603798831]
An intuitive acceleration method is to reduce the sampling iterations which however causes severe performance degradation.
We propose a model-agnostic bfem preconditioned diffusion sampling (PDS) method that leverages matrix preconditioning to alleviate the aforementioned problem.
PDS alters the sampling process of a vanilla SGM at marginal extra computation cost, and without model retraining.
arXiv Detail & Related papers (2023-02-13T16:30:53Z) - Unsupervised Representation Learning from Pre-trained Diffusion
Probabilistic Models [83.75414370493289]
Diffusion Probabilistic Models (DPMs) have shown a powerful capacity of generating high-quality image samples.
Diff-AE have been proposed to explore DPMs for representation learning via autoencoding.
We propose textbfPre-trained textbfAutotextbfEncoding (textbfPDAE) to adapt existing pre-trained DPMs to the decoders for image reconstruction.
arXiv Detail & Related papers (2022-12-26T02:37:38Z) - Pre-trained Perceptual Features Improve Differentially Private Image
Generation [8.659595986100738]
Training even moderately-sized generative models with differentially-private descent gradient (DP-SGD) is difficult.
We advocate building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation.
Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models.
arXiv Detail & Related papers (2022-05-25T16:46:01Z) - Differentially Private Representation for NLP: Formal Guarantee and An
Empirical Study on Privacy and Fairness [38.90014773292902]
It has been demonstrated that hidden representation learned by a deep model can encode private information of the input.
We propose Differentially Private Neural Representation (DPNR) to preserve the privacy of the extracted representation from text.
arXiv Detail & Related papers (2020-10-03T05:58:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.