Ambient Diffusion: Learning Clean Distributions from Corrupted Data
- URL: http://arxiv.org/abs/2305.19256v1
- Date: Tue, 30 May 2023 17:43:33 GMT
- Title: Ambient Diffusion: Learning Clean Distributions from Corrupted Data
- Authors: Giannis Daras, Kulin Shah, Yuval Dagan, Aravind Gollakota, Alexandros
G. Dimakis, Adam Klivans
- Abstract summary: We present the first diffusion-based framework that can learn an unknown distribution using only highly-corrupted samples.
Another benefit of our approach is the ability to train generative models that are less likely to memorize individual training samples.
- Score: 77.34772355241901
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the first diffusion-based framework that can learn an unknown
distribution using only highly-corrupted samples. This problem arises in
scientific applications where access to uncorrupted samples is impossible or
expensive to acquire. Another benefit of our approach is the ability to train
generative models that are less likely to memorize individual training samples
since they never observe clean training data. Our main idea is to introduce
additional measurement distortion during the diffusion process and require the
model to predict the original corrupted image from the further corrupted image.
We prove that our method leads to models that learn the conditional expectation
of the full uncorrupted image given this additional measurement corruption.
This holds for any corruption process that satisfies some technical conditions
(and in particular includes inpainting and compressed sensing). We train models
on standard benchmarks (CelebA, CIFAR-10 and AFHQ) and show that we can learn
the distribution even when all the training samples have $90\%$ of their pixels
missing. We also show that we can finetune foundation models on small corrupted
datasets (e.g. MRI scans with block corruptions) and learn the clean
distribution without memorizing the training set.
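To make the training recipe in the abstract concrete, below is a minimal sketch of one training step for the random-inpainting case (e.g. $90\%$ of pixels missing), assuming a PyTorch denoiser with the hypothetical signature `model(x, mask, t)`; the noise schedule, mask sampling, and function names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an Ambient Diffusion training step for random
# inpainting; illustrative only, not the authors' code.
import torch

def ambient_diffusion_loss(model, x_obs, A, p_extra=0.1):
    """x_obs: corrupted training images A * x0, shape (B, C, H, W).
    A: binary observation masks, shape (B, 1, H, W), 1 = pixel observed."""
    B = x_obs.shape[0]
    # Further corruption A_tilde: drop an extra fraction of the observed pixels.
    A_tilde = A * (torch.rand_like(A) > p_extra).float()

    # Forward diffusion applied to the corrupted image (illustrative schedule).
    t = torch.rand(B, 1, 1, 1, device=x_obs.device)
    noise = torch.randn_like(x_obs)
    x_t = torch.sqrt(1.0 - t) * x_obs + torch.sqrt(t) * noise

    # The model only ever sees the *further* corrupted noisy image and its
    # mask, and must predict the originally corrupted image.
    pred = model(A_tilde * x_t, A_tilde, t)

    # The loss is evaluated on the pixels observed under A, so the clean
    # image x0 is never required -- only A * x0 (= x_obs) is.
    return ((A * (pred - x_obs)) ** 2).mean()
```

Under the conditions stated in the paper, the minimizer of such an objective is the conditional expectation of the uncorrupted image given the further-corrupted measurement, which is what allows sampling from the clean distribution despite never observing clean data.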
Related papers
- Patch-Based Diffusion Models Beat Whole-Image Models for Mismatched Distribution Inverse Problems [12.5216516851131]
We study out-of-distribution (OOD) problems where a known training distribution is first provided.
We use a patch-based diffusion prior that learns the image distribution solely from patches.
In both settings, the patch-based method can obtain high quality image reconstructions that can outperform whole-image models.
arXiv Detail & Related papers (2024-10-15T16:02:08Z)
- Improved Distribution Matching Distillation for Fast Image Synthesis [54.72356560597428]
We introduce DMD2, a set of techniques that lift this limitation and improve DMD training.
First, we eliminate the regression loss and the need for expensive dataset construction.
Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images.
arXiv Detail & Related papers (2024-05-23T17:59:49Z)
- Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data [74.2507346810066]
Ambient diffusion is a recently proposed framework for training diffusion models using corrupted data.
We present the first framework for training diffusion models that provably sample from the uncorrupted distribution given only noisy training data.
arXiv Detail & Related papers (2024-03-20T14:22:12Z)
- Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models trained on Corrupted Data [56.81246107125692]
Ambient Diffusion Posterior Sampling (A-DPS) is a generative model pre-trained on one type of corruption.
We show that A-DPS can sometimes outperform models trained on clean data for several image restoration tasks in both speed and performance.
We extend the Ambient Diffusion framework to train MRI models with access only to Fourier subsampled multi-coil MRI measurements.
arXiv Detail & Related papers (2024-03-13T17:28:20Z)
- Which Pretrain Samples to Rehearse when Finetuning Pretrained Models? [60.59376487151964]
Fine-tuning pretrained models on specific downstream tasks is now the de facto approach in both text and vision.
A known pitfall of this approach is the forgetting of pretraining knowledge that happens during finetuning.
We propose a novel sampling scheme, mix-cd, that identifies and prioritizes samples that actually face forgetting.
arXiv Detail & Related papers (2024-02-12T22:32:12Z)
- Masked Diffusion Models Are Fast Distribution Learners [32.485235866596064]
Diffusion models are commonly trained to learn all fine-grained visual information from scratch.
We show that it suffices to train a strong diffusion model by first pre-training the model to learn some primer distribution.
Then the pre-trained model can be fine-tuned for various generation tasks efficiently.
arXiv Detail & Related papers (2023-06-20T08:02:59Z)
- GSURE-Based Diffusion Model Training with Corrupted Data [35.56267114494076]
We propose a novel training technique for generative diffusion models based only on corrupted data.
We demonstrate our technique on face images as well as Magnetic Resonance Imaging (MRI) data.
arXiv Detail & Related papers (2023-05-22T15:27:20Z)
- Soft Diffusion: Score Matching for General Corruptions [84.26037497404195]
We propose a new objective called Soft Score Matching that provably learns the score function for any linear corruption process.
We show that our objective learns the gradient of the likelihood under suitable regularity conditions for the family of corruption processes.
Our method achieves a state-of-the-art FID score of $1.85$ on CelebA-64, outperforming all previous linear diffusion models.
arXiv Detail & Related papers (2022-09-12T17:45:03Z)