Privacy Distillation: Reducing Re-identification Risk of Multimodal
Diffusion Models
- URL: http://arxiv.org/abs/2306.01322v1
- Date: Fri, 2 Jun 2023 07:44:00 GMT
- Title: Privacy Distillation: Reducing Re-identification Risk of Multimodal
Diffusion Models
- Authors: Virginia Fernandez, Pedro Sanchez, Walter Hugo Lopez Pinaya, Grzegorz
Jacenków, Sotirios A. Tsaftaris, Jorge Cardoso
- Abstract summary: We introduce Privacy Distillation, a framework that allows a text-to-image generative model to teach another model without exposing it to identifiable data.
Our solution consists of (1) training a first diffusion model on real data, (2) generating a synthetic dataset using this model and filtering it to exclude images with a re-identifiability risk, and (3) training a second diffusion model on the filtered synthetic data only.
- Score: 11.659461421660613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation in neural networks refers to compressing a large model
or dataset into a smaller version of itself. We introduce Privacy Distillation,
a framework that allows a text-to-image generative model to teach another model
without exposing it to identifiable data. Here, we are interested in the
privacy issue faced by a data provider who wishes to share their data via a
multimodal generative model. A question that immediately arises is "How can a
data provider ensure that the generative model is not leaking identifiable
information about a patient?" Our solution consists of (1) training a first
diffusion model on real data (2) generating a synthetic dataset using this
model and filtering it to exclude images with a re-identifiability risk (3)
training a second diffusion model on the filtered synthetic data only. We
showcase that datasets sampled from models trained with privacy distillation
can effectively reduce re-identification risk whilst maintaining downstream
performance.
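The three steps in the abstract can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how re-identifiability risk is measured, so the sketch assumes each image is represented by an embedding vector and scores risk as the highest cosine similarity to any real sample. The names `reid_risk`, `filter_synthetic`, `privacy_distillation`, the `train` and `sample` callables, and the threshold of 0.95 are all hypothetical.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reid_risk(candidate: Vector, real_data: Sequence[Vector]) -> float:
    """Assumed risk score: highest similarity to any real sample."""
    return max((cosine_similarity(candidate, r) for r in real_data), default=0.0)


def filter_synthetic(
    synthetic: Sequence[Vector],
    real_data: Sequence[Vector],
    threshold: float = 0.95,  # hypothetical cut-off, not taken from the paper
) -> List[Vector]:
    """Keep only synthetic samples whose risk score is below the threshold."""
    return [s for s in synthetic if reid_risk(s, real_data) < threshold]


def privacy_distillation(
    real_data: Sequence[Vector],
    train: Callable[[Sequence[Vector]], Any],      # hypothetical training routine
    sample: Callable[[Any, int], List[Vector]],    # hypothetical sampling routine
    n_samples: int,
    threshold: float = 0.95,
) -> Tuple[Any, List[Vector]]:
    """Run the three steps and return the second model plus its training set."""
    first_model = train(real_data)                             # (1) train on real data
    synthetic = sample(first_model, n_samples)                 # (2) generate synthetic data
    safe = filter_synthetic(synthetic, real_data, threshold)   #     and filter risky samples
    second_model = train(safe)                                 # (3) train on filtered data only
    return second_model, safe
```

In this sketch the second model never receives `real_data` directly; real samples are used only to score and discard risky synthetic ones. A practical system would replace the cosine score with whatever re-identification measure the data provider trusts.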
Related papers
- Gradient Inversion of Federated Diffusion Models [4.1355611383748005]
Diffusion models are becoming the de facto generative models, producing exceptionally high-resolution image data.
In this paper, we study the privacy risk of gradient inversion attacks.
We propose a triple-optimization GIDM+ that coordinates the optimization of the unknown data.
arXiv Detail & Related papers (2024-05-30T18:00:03Z)
- Heat Death of Generative Models in Closed-Loop Learning [63.83608300361159]
We study the learning dynamics of generative models that are fed back their own produced content in addition to their original training dataset.
We show that, unless a sufficient amount of external data is introduced at each iteration, any non-trivial temperature leads the model to degenerate.
arXiv Detail & Related papers (2024-04-02T21:51:39Z)
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- Shake to Leak: Fine-tuning Diffusion Models Can Amplify the Generative Privacy Risk [60.36852134501251]
We reveal a new privacy risk, Shake-to-Leak (S2L), that fine-tuning the pre-trained models with manipulated data can amplify the existing privacy risks.
In the worst case, S2L can amplify the state-of-the-art membership inference attack (MIA) on diffusion models by 5.4% AUC.
This discovery underscores that the privacy risk with diffusion models is even more severe than previously recognized.
arXiv Detail & Related papers (2024-03-14T14:48:37Z)
- On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z)
- Towards Few-Call Model Stealing via Active Self-Paced Knowledge Distillation and Diffusion-Based Image Generation [33.60710287553274]
We propose to copy black-box classification models without having access to the original training data, the architecture, and the weights of the model.
We employ a novel active self-paced learning framework to make the most of the proxy data during distillation.
Our empirical results on two data sets confirm the superiority of our framework over two state-of-the-art methods in the few-call model extraction scenario.
arXiv Detail & Related papers (2023-09-29T19:09:27Z)
- Phoenix: A Federated Generative Diffusion Model [6.09170287691728]
Training generative models on large centralized datasets can pose challenges in terms of data privacy, security, and accessibility.
This paper proposes a novel method for training a Denoising Diffusion Probabilistic Model (DDPM) across multiple data sources using Federated Learning (FL) techniques.
arXiv Detail & Related papers (2023-06-07T01:43:09Z)
- Black-box Source-free Domain Adaptation via Two-stage Knowledge Distillation [8.224874938178633]
Source-free domain adaptation aims to adapt deep neural networks using only pre-trained source models and target data.
Accessing the source model still raises a potential concern about leaking the source data, which would compromise patients' privacy.
We study the challenging but practical problem: black-box source-free domain adaptation where only the outputs of the source model and target data are available.
arXiv Detail & Related papers (2023-05-13T10:00:24Z)
- Extracting Training Data from Diffusion Models [77.11719063152027]
We show that diffusion models memorize individual images from their training data and emit them at generation time.
With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models.
We train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy.
arXiv Detail & Related papers (2023-01-30T18:53:09Z)
- Privacy-preserving Generative Framework Against Membership Inference Attacks [10.791983671720882]
We design a privacy-preserving generative framework against membership inference attacks.
We first map the source data to the latent space with a VAE to obtain the latent code, then apply a noise process satisfying metric privacy to the latent code, and finally use the VAE to reconstruct the synthetic data.
Our experimental evaluation demonstrates that the machine learning model trained with newly generated synthetic data can effectively resist membership inference attacks and still maintain high utility.
arXiv Detail & Related papers (2022-02-11T06:13:30Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.