Slight Corruption in Pre-training Data Makes Better Diffusion Models
- URL: http://arxiv.org/abs/2405.20494v1
- Date: Thu, 30 May 2024 21:35:48 GMT
- Title: Slight Corruption in Pre-training Data Makes Better Diffusion Models
- Authors: Hao Chen, Yujin Han, Diganta Misra, Xiang Li, Kai Hu, Difan Zou, Masashi Sugiyama, Jindong Wang, Bhiksha Raj
- Abstract summary: Diffusion models (DMs) have shown remarkable capabilities in generating high-quality images, audio, and videos.
DMs benefit significantly from extensive pre-training on large-scale datasets.
However, pre-training datasets often contain corrupted pairs where conditions do not accurately describe the data.
This paper presents the first comprehensive study on the impact of such corruption in pre-training data of DMs.
- Score: 71.90034201302397
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models (DMs) have shown remarkable capabilities in generating realistic high-quality images, audio, and videos. They benefit significantly from extensive pre-training on large-scale datasets, including web-crawled data with paired data and conditions, such as image-text and image-class pairs. Despite rigorous filtering, these pre-training datasets often inevitably contain corrupted pairs where conditions do not accurately describe the data. This paper presents the first comprehensive study on the impact of such corruption in pre-training data of DMs. We synthetically corrupt ImageNet-1K and CC3M to pre-train and evaluate over 50 conditional DMs. Our empirical findings reveal that various types of slight corruption in pre-training can significantly enhance the quality, diversity, and fidelity of the generated images across different DMs, both during pre-training and downstream adaptation stages. Theoretically, we consider a Gaussian mixture model and prove that slight corruption in the condition leads to higher entropy and a reduced 2-Wasserstein distance to the ground truth of the data distribution generated by the corruptly trained DMs. Inspired by our analysis, we propose a simple method to improve the training of DMs on practical datasets by adding condition embedding perturbations (CEP). CEP significantly improves the performance of various DMs in both pre-training and downstream tasks. We hope that our study provides new insights into understanding the data and pre-training processes of DMs.
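The abstract describes two procedures only at a high level: synthetically corrupting the conditions of a clean dataset, and condition embedding perturbation (CEP), which adds noise to condition embeddings during training. The sketch below illustrates both for class-conditional training under assumptions that are ours, not the paper's: corruption is modeled as replacing a label, with probability `ratio`, by a different class drawn uniformly at random, and CEP is modeled as additive isotropic Gaussian noise with a hypothetical scale `gamma`, applied only at training time. The function names are hypothetical, and the paper's exact corruption scheme and perturbation distribution may differ.

```python
import random
from typing import List, Optional, Sequence


def corrupt_labels(
    labels: Sequence[int], ratio: float, num_classes: int, rng: random.Random
) -> List[int]:
    """Replace each label, with probability `ratio`, by a different class.

    Assumed corruption model (symmetric label flipping); the paper's exact
    synthetic-corruption procedure may differ.
    """
    corrupted = []
    for y in labels:
        if rng.random() < ratio:
            # Sample uniformly from the num_classes - 1 classes other than y.
            wrong = rng.randrange(num_classes - 1)
            corrupted.append(wrong if wrong < y else wrong + 1)
        else:
            corrupted.append(y)
    return corrupted


def perturb_condition_embedding(
    cond_emb: Sequence[float],
    gamma: float = 0.1,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """CEP-style sketch: add N(0, gamma^2) noise to each embedding entry.

    Additive isotropic Gaussian noise with a hypothetical scale `gamma` is an
    assumption here, not necessarily the paper's exact perturbation.
    """
    if not training or gamma == 0.0:
        # Outside training, this sketch returns the embedding unchanged.
        return list(cond_emb)
    if rng is None:
        rng = random.Random()
    return [x + gamma * rng.gauss(0.0, 1.0) for x in cond_emb]


# Usage on toy data: 1,000 classes, 5% corrupted labels, 8-dim embeddings.
rng = random.Random(0)
labels = [rng.randrange(1000) for _ in range(10_000)]
noisy_labels = corrupt_labels(labels, ratio=0.05, num_classes=1000, rng=rng)
flipped = sum(a != b for a, b in zip(labels, noisy_labels)) / len(labels)
print(f"fraction of flipped labels: {flipped:.3f}")  # close to 0.05

clean_emb = [0.0] * 8
print(perturb_condition_embedding(clean_emb, gamma=0.1, rng=rng))
```

Because the noise is resampled on every call, each training step in this sketch sees a slightly different embedding for the same condition; `gamma` controls how large that perturbation is, and the abstract's emphasis on *slight* corruption suggests keeping it small.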
Related papers
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GANs' inherent flaws and biased optimization within the latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
Date: 2024-07-16T06:38:49Z
- DANCE: Dual-View Distribution Alignment for Dataset Condensation [39.08022095906364]
We propose a new distribution matching (DM)-based method named Dual-view distribution AligNment for dataset CondEnsation (DANCE).
Specifically, from the inner-class view, we construct multiple "middle encoders" to perform pseudo long-term distribution alignment.
From the inter-class view, we use expert models to perform distribution calibration.
Date: 2024-06-03T07:22:17Z
- Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks [26.387044804861937]
Few-shot fine-tuning of Diffusion Models (DMs) is a key advancement, significantly reducing training costs and enabling personalized AI applications.
During the training process, image fidelity initially improves, then unexpectedly deteriorates with the emergence of noisy patterns, only to recover later with severe overfitting.
We term the stage in which noisy patterns are generated the corruption stage. Experimental results demonstrate that our method, which applies Bayesian neural networks, significantly mitigates corruption and improves the fidelity, quality, and diversity of the generated images in both object-driven and subject-driven generation tasks.
Date: 2024-05-30T10:47:48Z
- Effective and Robust Adversarial Training against Data and Label Corruptions [35.53386268796071]
Corruptions due to data perturbations and label noise are prevalent in datasets from unreliable sources.
We develop an Effective and Robust Adversarial Training framework to handle both types of corruption simultaneously.
Date: 2024-05-07T10:53:20Z
- Robust Diffusion Models for Adversarial Purification [28.313494459818497]
Diffusion model (DM)-based adversarial purification (AP) has been shown to be the most powerful alternative to adversarial training (AT).
We propose a novel robust reverse process with adversarial guidance, which is independent of given pre-trained DMs.
This robust guidance not only generates purified examples that retain more semantic content but also mitigates the accuracy-robustness trade-off of DMs.
Date: 2024-03-24T08:34:08Z
- Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models trained on Corrupted Data [56.81246107125692]
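Ambient Diffusion Posterior Sampling (A-DPS) leverages a generative model pre-trained on one type of corruption to perform posterior sampling conditioned on measurements from a potentially different forward process.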
We show that A-DPS can sometimes outperform models trained on clean data for several image restoration tasks in both speed and performance.
We extend the Ambient Diffusion framework to train MRI models with access only to Fourier subsampled multi-coil MRI measurements.
Date: 2024-03-13T17:28:20Z
- Intrinsic Image Diffusion for Indoor Single-view Material Estimation [55.276815106443976]
We present Intrinsic Image Diffusion, a generative model for appearance decomposition of indoor scenes.
Given a single input view, we sample multiple possible material explanations represented as albedo, roughness, and metallic maps.
Our method produces significantly sharper, more consistent, and more detailed materials, outperforming state-of-the-art methods on albedo prediction by 1.5 dB in PSNR and with a 45% better FID score.
Date: 2023-12-19T15:56:19Z
- On the Connection between Pre-training Data Diversity and Fine-tuning Robustness [66.30369048726145]
We find that the primary factor influencing downstream effective robustness is data quantity.
We demonstrate our findings on pre-training distributions drawn from various natural and synthetic data sources.
Date: 2023-07-24T05:36:19Z
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity [84.6421260559093]
This study is the largest set of experiments to validate, quantify, and expose undocumented intuitions about text pretraining.
Our findings indicate there does not exist a one-size-fits-all solution to filtering training data.
Date: 2023-05-22T15:57:53Z
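The Intrinsic Image Diffusion entry above reports a 1.5 dB PSNR gain. Since PSNR is defined as 10 * log10(MAX^2 / MSE), a fixed dB improvement at the same peak value MAX corresponds to a fixed ratio of mean squared errors, independent of the absolute PSNR. The helper below is a generic conversion, not taken from that paper, and assumes PSNR is computed from a single MSE rather than averaged per image (per-image averaging makes the conversion only approximate); the function name is hypothetical.

```python
def mse_ratio_from_psnr_gain(delta_psnr_db: float) -> float:
    """Return MSE_new / MSE_old implied by a PSNR gain of `delta_psnr_db` dB.

    PSNR = 10 * log10(MAX**2 / MSE), so for the same peak value MAX,
    delta_PSNR = 10 * log10(MSE_old / MSE_new).
    """
    return 10 ** (-delta_psnr_db / 10)


ratio = mse_ratio_from_psnr_gain(1.5)
print(f"MSE ratio: {ratio:.3f} ({100 * (1 - ratio):.1f}% lower MSE)")
# prints: MSE ratio: 0.708 (29.2% lower MSE)
```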