Differentially Private Fine-Tuning of Diffusion Models
- URL: http://arxiv.org/abs/2406.01355v1
- Date: Mon, 3 Jun 2024 14:18:04 GMT
- Title: Differentially Private Fine-Tuning of Diffusion Models
- Authors: Yu-Lin Tsai, Yizhe Li, Zekai Chen, Po-Yu Chen, Chia-Mu Yu, Xuebin Ren, Francois Buet-Golfouse,
- Abstract summary: The integration of Differential Privacy with diffusion models (DMs) presents a promising yet challenging frontier.
Recent developments in this field have highlighted the potential for generating high-quality synthetic data by pre-training on public data.
We propose a strategy optimized for private diffusion models, which minimizes the number of trainable parameters to enhance the privacy-utility trade-off.
- Score: 22.454127503937883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of Differential Privacy (DP) with diffusion models (DMs) presents a promising yet challenging frontier, particularly due to the substantial memorization capabilities of DMs that pose significant privacy risks. Differential privacy offers a rigorous framework for safeguarding individual data points during model training, with Differential Privacy Stochastic Gradient Descent (DP-SGD) being a prominent implementation. Diffusion method decomposes image generation into iterative steps, theoretically aligning well with DP's incremental noise addition. Despite the natural fit, the unique architecture of DMs necessitates tailored approaches to effectively balance privacy-utility trade-off. Recent developments in this field have highlighted the potential for generating high-quality synthetic data by pre-training on public data (i.e., ImageNet) and fine-tuning on private data, however, there is a pronounced gap in research on optimizing the trade-offs involved in DP settings, particularly concerning parameter efficiency and model scalability. Our work addresses this by proposing a parameter-efficient fine-tuning strategy optimized for private diffusion models, which minimizes the number of trainable parameters to enhance the privacy-utility trade-off. We empirically demonstrate that our method achieves state-of-the-art performance in DP synthesis, significantly surpassing previous benchmarks on widely studied datasets (e.g., with only 0.47M trainable parameters, achieving a more than 35% improvement over the previous state-of-the-art with a small privacy budget on the CelebA-64 dataset). Anonymous codes available at https://anonymous.4open.science/r/DP-LORA-F02F.
Related papers
- Quantifying and Mitigating Privacy Risks for Tabular Generative Models [13.153278585144355]
Synthetic data from generative models emerges as the privacy-preserving data-sharing solution.
We propose DP-TLDM, Differentially Private Tabular Latent Diffusion Model.
We show that DP-TLDM improves the synthetic quality by an average of 35% in data resemblance, 15% in the utility for downstream tasks, and 50% in data discriminability.
arXiv Detail & Related papers (2024-03-12T17:27:49Z) - Pre-training Differentially Private Models with Limited Public Data [58.945400707033016]
differential privacy (DP) is a prominent method to gauge the degree of security provided to the models.
DP is yet not capable of protecting a substantial portion of the data used during the initial pre-training stage.
We propose a novel DP continual pre-training strategy using only 10% of public data.
arXiv Detail & Related papers (2024-02-28T23:26:27Z) - Private Fine-tuning of Large Language Models with Zeroth-order
Optimization [54.24600476755372]
We introduce DP-ZO, a new method for fine-tuning large language models that preserves the privacy of training data by privatizing zeroth-order optimization.
We show that DP-ZO exhibits just $1.86%$ performance degradation due to privacy at $ (1,10-5)$-DP when fine-tuning OPT-66B on 1000 training samples from SQuAD.
arXiv Detail & Related papers (2024-01-09T03:53:59Z) - DPGOMI: Differentially Private Data Publishing with Gaussian Optimized
Model Inversion [8.204115285718437]
We propose Differentially Private Data Publishing with Gaussian Optimized Model Inversion (DPGOMI) to address this issue.
Our approach involves mapping private data to the latent space using a public generator, followed by a lower-dimensional DP-GAN with better convergence properties.
Our results show that DPGOMI outperforms the standard DP-GAN method in terms of Inception Score, Freche't Inception Distance, and classification performance.
arXiv Detail & Related papers (2023-10-06T18:46:22Z) - Exploring Machine Learning Privacy/Utility trade-off from a
hyperparameters Lens [10.727571921061024]
Differentially Private Descent Gradient (DPSGD) is the state-of-the-art method to train privacy-preserving models.
With a drop-in replacement of the activation function, we achieve new state-of-the-art accuracy.
arXiv Detail & Related papers (2023-03-03T09:59:42Z) - Large Scale Transfer Learning for Differentially Private Image
Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Just Fine-tune Twice: Selective Differential Privacy for Large Language
Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z) - Don't Generate Me: Training Differentially Private Generative Models
with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z) - PEARL: Data Synthesis via Private Embeddings and Adversarial
Reconstruction Learning [1.8692254863855962]
We propose a new framework of data using deep generative models in a differentially private manner.
Within our framework, sensitive data are sanitized with rigorous privacy guarantees in a one-shot fashion.
Our proposal has theoretical guarantees of performance, and empirical evaluations on multiple datasets show that our approach outperforms other methods at reasonable levels of privacy.
arXiv Detail & Related papers (2021-06-08T18:00:01Z) - DataLens: Scalable Privacy Preserving Training via Gradient Compression
and Aggregation [15.63770709526671]
We propose a scalable privacy-preserving generative model DATALENS.
We show that, DATALENS significantly outperforms other baseline DP generative models.
We adapt the proposed TOPAGG approach, which is one of the key building blocks in DATALENS, to DP SGD training.
arXiv Detail & Related papers (2021-03-20T06:14:19Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.