Investigating Data Memorization in 3D Latent Diffusion Models for
Medical Image Synthesis
- URL: http://arxiv.org/abs/2307.01148v2
- Date: Thu, 6 Jul 2023 09:09:55 GMT
- Title: Investigating Data Memorization in 3D Latent Diffusion Models for
Medical Image Synthesis
- Authors: Salman Ul Hassan Dar, Arman Ghanaat, Jannik Kahmann, Isabelle Ayx,
Theano Papavassiliu, Stefan O. Schoenberg, Sandy Engelhardt
- Abstract summary: We assess the memorization capacity of 3D latent diffusion models on photon-counting coronary computed tomography angiography and knee magnetic resonance imaging datasets.
Our results suggest that such latent diffusion models indeed memorize training data, and there is a dire need for devising strategies to mitigate memorization.
- Score: 0.6382686594288781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative latent diffusion models have been established as state-of-the-art
in data generation. One promising application is generation of realistic
synthetic medical imaging data for open data sharing without compromising
patient privacy. Despite the promise, the capacity of such models to memorize
sensitive patient training data and synthesize samples showing high resemblance
to training data samples is relatively unexplored. Here, we assess the
memorization capacity of 3D latent diffusion models on photon-counting coronary
computed tomography angiography and knee magnetic resonance imaging datasets.
To detect potential memorization of training samples, we utilize
self-supervised models based on contrastive learning. Our results suggest that
such latent diffusion models indeed memorize training data, and there is a dire
need for devising strategies to mitigate memorization.
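The abstract describes detecting memorization by comparing samples in the embedding space of a self-supervised contrastive model. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes embeddings have already been computed by some encoder, and the function names, the use of cosine similarity, and the `threshold` value of 0.95 are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_possible_copies(
    synthetic: Sequence[Sequence[float]],
    training: Sequence[Sequence[float]],
    threshold: float = 0.95,  # hypothetical cutoff, would need tuning per dataset
) -> List[Tuple[int, int, float]]:
    """Flag synthetic embeddings whose nearest training embedding is very similar.

    Returns (synthetic_index, nearest_training_index, similarity) tuples for
    every synthetic embedding whose best match reaches the threshold.
    """
    if not training:
        raise ValueError("training embeddings must not be empty")
    flagged = []
    for i, s in enumerate(synthetic):
        # Brute-force nearest neighbour over all training embeddings.
        best_j, best_sim = max(
            ((j, cosine_similarity(s, t)) for j, t in enumerate(training)),
            key=lambda pair: pair[1],
        )
        if best_sim >= threshold:
            flagged.append((i, best_j, best_sim))
    return flagged


if __name__ == "__main__":
    # Toy 2D embeddings standing in for encoder outputs.
    synthetic = [[1.0, 0.0], [0.0, 1.0]]
    training = [[2.0, 0.0], [1.0, 1.0]]
    print(flag_possible_copies(synthetic, training))
```

In practice the flagged pairs would only be candidates; deciding whether a synthetic sample is a patient data copy would still require inspecting the corresponding images.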
Related papers
- Extracting Training Data from Unconditional Diffusion Models [76.85077961718875]
Diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI).
We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization.
Based on the theoretical analysis, we propose a novel data extraction method called Surrogate condItional Data Extraction (SIDE) that leverages a classifier trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv Detail & Related papers (2024-06-18T16:20:12Z)
- Unconditional Latent Diffusion Models Memorize Patient Imaging Data: Implications for Openly Sharing Synthetic Data [2.1375651880073834]
Generative AI models have been gaining traction for facilitating open-data sharing.
These models generate patient data copies instead of novel synthetic samples.
We train 2D and 3D latent diffusion models on CT, MR, and X-ray datasets for synthetic data generation.
arXiv Detail & Related papers (2024-02-01T22:58:21Z)
- The Journey, Not the Destination: How Data Guides Diffusion Models [75.19694584942623]
Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity.
We propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions.
arXiv Detail & Related papers (2023-12-11T08:39:43Z)
- Optimizing Sampling Patterns for Compressed Sensing MRI with Diffusion Generative Models [75.52575380824051]
We present a learning method to optimize sub-sampling patterns for compressed sensing multi-coil MRI.
We use a single-step reconstruction based on the posterior mean estimate given by the diffusion model and the MRI measurement process.
Our method requires as few as five training images to learn effective sampling patterns.
arXiv Detail & Related papers (2023-06-05T22:09:06Z)
- Quantifying Sample Anonymity in Score-Based Generative Models with Adversarial Fingerprinting [3.8933108317492167]
Training diffusion models on private data and disseminating the models and weights rather than the raw dataset paves the way for innovative large-scale data-sharing strategies.
This paper introduces a method for estimating the upper bound of the probability of reproducing identifiable training images during the sampling process.
Our results show that privacy-breaching images are reproduced at sampling time if the models were trained without care.
arXiv Detail & Related papers (2023-06-02T08:37:38Z)
- Extracting Training Data from Diffusion Models [77.11719063152027]
We show that diffusion models memorize individual images from their training data and emit them at generation time.
With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models.
We train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy.
arXiv Detail & Related papers (2023-01-30T18:53:09Z)
- Medical Diffusion -- Denoising Diffusion Probabilistic Models for 3D Medical Image Generation [0.6486409713123691]
We show that diffusion probabilistic models can synthesize high quality medical imaging data.
We provide quantitative measurements of their performance through a reader study with two medical experts.
We demonstrate that synthetic images can be used in a self-supervised pre-training and improve the performance of breast segmentation models when data is scarce.
arXiv Detail & Related papers (2022-11-07T08:37:48Z)
- Brain Imaging Generation with Latent Diffusion Models [2.200720122706913]
In this study, we explore using Latent Diffusion Models to generate synthetic images from high-resolution 3D brain images.
We found that our models created realistic data, and we could use the conditioning variables to control the data generation effectively.
arXiv Detail & Related papers (2022-09-15T09:16:21Z)
- Fast Unsupervised Brain Anomaly Detection and Segmentation with Diffusion Models [1.6352599467675781]
We propose a method based on diffusion models to detect and segment anomalies in brain imaging.
Our diffusion models achieve competitive performance compared with autoregressive approaches across a series of experiments with 2D CT and MRI data.
arXiv Detail & Related papers (2022-06-07T17:30:43Z)
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
- Modelling the Distribution of 3D Brain MRI using a 2D Slice VAE [66.63629641650572]
We propose a method to model 3D MR brain volumes distribution by combining a 2D slice VAE with a Gaussian model that captures the relationships between slices.
We also introduce a novel evaluation method for generated volumes that quantifies how well their segmentations match those of true brain anatomy.
arXiv Detail & Related papers (2020-07-09T13:23:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.