Differentially Private Synthetic Data via Foundation Model APIs 1: Images
- URL: http://arxiv.org/abs/2305.15560v3
- Date: Thu, 05 Dec 2024 07:43:25 GMT
- Title: Differentially Private Synthetic Data via Foundation Model APIs 1: Images
- Authors: Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Harsha Nori, Sergey Yekhanin
- Abstract summary: We present a new framework called Private Evolution (PE) to generate DP synthetic data using only the inference APIs of foundation models.
PE can match or even outperform state-of-the-art (SOTA) methods without any model training.
For example, on CIFAR10 we achieve FID ≤ 7.9 with privacy cost ε = 0.67, significantly improving the previous SOTA from ε = 32.
- Score: 29.27468374365625
- Abstract: Generating differentially private (DP) synthetic data that closely resembles the original private data is a scalable way to mitigate privacy concerns in the current data-driven world. In contrast to current practices that train customized models for this task, we aim to generate DP Synthetic Data via APIs (DPSDA), where we treat foundation models as black boxes and only utilize their inference APIs. Such API-based, training-free approaches are easier to deploy, as exemplified by the recent surge in the number of API-based apps. These approaches can also leverage the power of large foundation models which are only accessible via their inference APIs. However, this comes with greater challenges due to strictly more restrictive model access and the need to protect privacy from the API provider. In this paper, we present a new framework called Private Evolution (PE) to solve this problem and show its initial promise on synthetic images. Surprisingly, PE can match or even outperform state-of-the-art (SOTA) methods without any model training. For example, on CIFAR10 (with ImageNet as the public data), we achieve FID ≤ 7.9 with privacy cost ε = 0.67, significantly improving the previous SOTA from ε = 32. We further demonstrate the promise of applying PE on large foundation models such as Stable Diffusion to tackle challenging private datasets with a small number of high-resolution images. The code and data are released at https://github.com/microsoft/DPSDA.
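The core PE loop is simple to express. Below is a minimal sketch of it in Python; `random_api` and `variation_api` are placeholders for the foundation model's inference endpoints, `embed` stands in for a public feature extractor, and the privacy accounting that calibrates the noise scale to a target (ε, δ) is omitted. This is an illustration of the idea, not the released DPSDA implementation.

```python
import numpy as np

def private_evolution(private_embs, random_api, variation_api, embed,
                      n_synthetic=100, iterations=10, sigma=10.0):
    """Illustrative sketch of Private Evolution (PE).

    private_embs: embeddings of the private samples, shape (n_private, d).
    random_api / variation_api: black-box generation endpoints (placeholders).
    embed: public feature extractor mapping a sample to R^d.
    sigma: Gaussian noise scale for the DP histogram (accounting omitted).
    """
    population = random_api(n_synthetic)  # initial candidates from the API
    for _ in range(iterations):
        emb = np.stack([embed(x) for x in population])
        # Each private sample votes for its nearest synthetic candidate.
        dists = np.linalg.norm(private_embs[:, None, :] - emb[None, :, :], axis=-1)
        votes = np.bincount(dists.argmin(axis=1), minlength=n_synthetic).astype(float)
        # DP nearest-neighbor histogram: Gaussian noise, then clip at zero.
        hist = np.maximum(votes + np.random.normal(0.0, sigma, n_synthetic), 0.0)
        if hist.sum() == 0:
            continue  # noise wiped out all votes; keep the current population
        # Resample candidates in proportion to their noisy popularity ...
        idx = np.random.choice(n_synthetic, size=n_synthetic, p=hist / hist.sum())
        # ... and mutate the survivors through the API to explore nearby samples.
        population = [variation_api(population[i]) for i in idx]
    return population
```

Note that the private data only ever touches the noisy histogram; everything downstream is post-processing, which is why no model training or gradient access is needed.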
Related papers
- Is API Access to LLMs Useful for Generating Private Synthetic Tabular Data? [19.72500788849435]
Differentially private (DP) synthetic data is a versatile tool for enabling the analysis of private data.
Recent advancements in large language models (LLMs) have inspired a number of algorithmic techniques for improving DP synthetic data generation.
One family of approaches uses DP finetuning on the foundation model weights; however, the model weights for state-of-the-art models may not be public.
arXiv Detail & Related papers (2025-02-10T15:23:52Z)
- Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Model [13.28430346661924]
Differentially private (DP) synthetic data has become a key tool for unlocking the value of private data without compromising privacy.
Private Evolution (PE) has emerged as a promising method for generating DP synthetic data.
We show that simulators -- computer graphics-based image synthesis tools -- can also serve as effective APIs within the PE framework.
arXiv Detail & Related papers (2025-02-08T09:50:30Z)
- Label Privacy in Split Learning for Large Models with Parameter-Efficient Training [51.28799334394279]
We search for a way to fine-tune models over an API while keeping the labels private.
We propose P³EFT, a multi-party split learning algorithm that takes advantage of existing PEFT properties to maintain privacy at a lower performance overhead.
arXiv Detail & Related papers (2024-12-21T15:32:03Z)
- Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
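For background, the standard consistency objective that this training paradigm refers to enforces that adjacent points on the same probability-flow (PF) ODE trajectory map to the same endpoint. A rough PyTorch-style sketch follows (flattened data assumed for simplicity; the paper's truncated two-stage procedure is not reproduced here):

```python
import torch

def consistency_loss(f_theta, f_ema, x0, t, t_next, noise):
    """Standard consistency-training loss (background, not the paper's variant).

    f_theta: the consistency model; f_ema: a slowly updated EMA copy used
    as the target. t_next < t are adjacent times on the discretized schedule.
    """
    x_t = x0 + t.view(-1, 1) * noise            # trajectory point at time t
    x_t_next = x0 + t_next.view(-1, 1) * noise  # earlier point, same trajectory
    with torch.no_grad():
        target = f_ema(x_t_next, t_next)        # self-consistency target
    return torch.mean((f_theta(x_t, t) - target) ** 2)
```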
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
- Differentially Private Synthetic Data via Foundation Model APIs 2: Text [56.13240830670327]
A lot of high-quality text data generated in the real world is private and cannot be shared or used freely due to privacy concerns.
We propose an augmented PE algorithm, named Aug-PE, that applies to the complex setting of text.
Our results demonstrate that Aug-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines.
arXiv Detail & Related papers (2024-03-04T05:57:50Z)
- Pre-trained Perceptual Features Improve Differentially Private Image Generation [8.659595986100738]
Training even moderately-sized generative models with differentially private stochastic gradient descent (DP-SGD) is difficult.
We advocate building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation.
Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models.
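One concrete instantiation of this idea (a hedged sketch in the spirit of the paper, not its exact recipe) is to release a single noisy mean of public perceptual features and then train a generator against that fixed statistic; names like `encoder` are placeholders:

```python
import torch

def dp_feature_mean(private_data, encoder, clip=1.0, sigma=0.5):
    """Release a DP mean embedding of the private data (Gaussian mechanism).

    Per-example features are clipped to L2 norm `clip`, bounding each
    example's contribution; noise std sigma * clip / n then privatizes the mean.
    """
    with torch.no_grad():
        feats = encoder(private_data)  # (n, d) features from a public encoder
        feats = feats * torch.clamp(clip / feats.norm(dim=1, keepdim=True), max=1.0)
        mean = feats.mean(dim=0)
    return mean + torch.randn_like(mean) * sigma * clip / len(private_data)

def generator_loss(generator, encoder, z, dp_mean):
    """Post-processing: match the privatized feature mean, no extra privacy cost."""
    fake_feats = encoder(generator(z))
    return ((fake_feats.mean(dim=0) - dp_mean) ** 2).sum()
```

Because the private statistic is released once, the generator can be trained for arbitrarily many steps without consuming additional privacy budget.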
arXiv Detail & Related papers (2022-05-25T16:46:01Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
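The mechanism is easy to state: clip each per-example gradient, then add Gaussian noise to the aggregate before the update. A minimal NumPy sketch (real implementations such as Opacus vectorize the per-example pass; the privacy accountant that tracks the total budget across steps is omitted):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip=1.0, sigma=1.0):
    """One DP-SGD update: per-example clipping + Gaussian noise.

    per_example_grads: (batch, dim) array of individual example gradients.
    Clipping bounds each example's influence to L2 norm `clip`; noise with
    std sigma * clip on the summed gradient gives the per-step guarantee.
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    noisy_sum = clipped.sum(axis=0) + np.random.normal(0.0, sigma * clip, params.shape)
    return params - lr * noisy_sum / len(per_example_grads)
```

The per-example gradient computation is exactly what makes DP-SGD expensive at scale, which is the cost the paper is concerned with.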
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
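The non-adversarial objective here is an entropic optimal-transport cost computed with Sinkhorn iterations; since it is differentiable, it can drive a generator directly, with no discriminator in the loop. A minimal sketch (log-domain stabilization and the debiasing that turns this cost into the Sinkhorn divergence are omitted, as is the DP gradient sanitization):

```python
import torch

def sinkhorn_cost(x, y, eps=0.1, n_iters=50):
    """Entropic OT cost between sample batches x (n, d) and y (m, d)."""
    cost = torch.cdist(x, y) ** 2            # pairwise squared distances
    K = torch.exp(-cost / eps)               # Gibbs kernel
    a = torch.full((x.shape[0],), 1.0 / x.shape[0])  # uniform weights on x
    b = torch.full((y.shape[0],), 1.0 / y.shape[0])  # uniform weights on y
    u, v = torch.ones_like(a), torch.ones_like(b)
    for _ in range(n_iters):                 # alternating Sinkhorn projections
        u = a / (K @ v)
        v = b / (K.t() @ u)
    plan = u[:, None] * K * v[None, :]       # (approximate) optimal plan
    return (plan * cost).sum()
```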
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.