Related papers: RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models

RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models

URL: http://arxiv.org/abs/2502.12794v1
Date: Tue, 18 Feb 2025 11:56:51 GMT
Title: RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models
Authors: Tanqiu Jiang, Changjiang Li, Fenglong Ma, Ting Wang,
Abstract summary: We present RAPID: Retrieval Augmented PrIvate Diffusion model.<n>It is a novel approach that integrates retrieval augmented generation into DPDM training.<n>It significantly outperforms state-of-the-art approaches by large margins in generative quality, memory footprint, and inference cost.
Score: 26.66607257183987
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Differentially private diffusion models (DPDMs) harness the remarkable generative capabilities of diffusion models while enforcing differential privacy (DP) for sensitive data. However, existing DPDM training approaches often suffer from significant utility loss, large memory footprint, and expensive inference cost, impeding their practical uses. To overcome such limitations, we present RAPID: Retrieval Augmented PrIvate Diffusion model, a novel approach that integrates retrieval augmented generation (RAG) into DPDM training. Specifically, RAPID leverages available public data to build a knowledge base of sample trajectories; when training the diffusion model on private data, RAPID computes the early sampling steps as queries, retrieves similar trajectories from the knowledge base as surrogates, and focuses on training the later sampling steps in a differentially private manner. Extensive evaluation using benchmark datasets and models demonstrates that, with the same privacy guarantee, RAPID significantly outperforms state-of-the-art approaches by large margins in generative quality, memory footprint, and inference cost, suggesting that retrieval-augmented DP training represents a promising direction for developing future privacy-preserving generative models. The code is available at: https://github.com/TanqiuJiang/RAPID

Related papers

Pruning then Reweighting: Towards Data-Efficient Training of Diffusion Models [33.09663675904689]
We investigate efficient diffusion training from the perspective of dataset pruning. Inspired by the principles of data-efficient training for generative models such as generative adversarial networks (GANs), we first extend the data selection scheme used in GANs to DM training. To further improve the generation performance, we employ a class-wise reweighting approach.
arXiv Detail & Related papers (2024-09-27T20:21:19Z)
Revisiting Privacy-Utility Trade-off for DP Training with Pre-existing Knowledge [40.44144653519249]
We propose a generic differential privacy framework with heterogeneous noise (DP-Hero)<n>Atop DP-Hero, we instantiate a heterogeneous version of DP-SGD, and further extend it to federated training.<n>We conduct comprehensive experiments to verify and explain the effectiveness of the proposed DP-Hero, showing improved training accuracy compared with state-of-the-art works.
arXiv Detail & Related papers (2024-09-05T08:40:54Z)
Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD. It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation. We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
Differentially Private Fine-Tuning of Diffusion Models [22.454127503937883]
The integration of Differential Privacy with diffusion models (DMs) presents a promising yet challenging frontier. Recent developments in this field have highlighted the potential for generating high-quality synthetic data by pre-training on public data. We propose a strategy optimized for private diffusion models, which minimizes the number of trainable parameters to enhance the privacy-utility trade-off.
arXiv Detail & Related papers (2024-06-03T14:18:04Z)
Gradient Inversion of Federated Diffusion Models [4.1355611383748005]
Diffusion models are becoming defector generative models, which generate exceptionally high-resolution image data. In this paper, we study the privacy risk of gradient inversion attacks. We propose a triple-optimization GIDM+ that coordinates the optimization of the unknown data.
arXiv Detail & Related papers (2024-05-30T18:00:03Z)
Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
differential privacy (DP) is a prominent method to gauge the degree of security provided to the models. DP is yet not capable of protecting a substantial portion of the data used during the initial pre-training stage. We develop a novel DP continual pre-training strategy using only 10% of public data. Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and and 60.0% on downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z)
LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls samples' generation with a low privacy cost. We theoretically analyze our model's privacy protection and empirically verify our model.
arXiv Detail & Related papers (2024-02-26T11:52:55Z)
Arbitrary Decisions are a Hidden Cost of Differentially Private Training [7.560688419767116]
Mechanisms used in machine learning often aim to guarantee differential privacy (DP) during model training. Practical DP-ensuring training methods use randomization when fitting model parameters to privacy-sensitive data. For a given input example, the output predicted by equally-private models depends on the randomness used in training.
arXiv Detail & Related papers (2023-02-28T12:13:43Z)
Differentially Private Diffusion Models [46.46256537222917]
We build on the recent success of diffusion models (DMs) and introduce Differentially Private Diffusion Models (DPDMs) We propose noise multiplicity, a powerful modification of DP-SGD tailored to the training of DMs. We validate our novel DPDMs on image generation benchmarks and achieve state-of-the-art performance in all experiments.
arXiv Detail & Related papers (2022-10-18T15:20:47Z)
Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy. Private training using DP-SGD protects against leakage by injecting noise into individual example gradients. While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy. Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.