Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks
- URL: http://arxiv.org/abs/2405.19931v1
- Date: Thu, 30 May 2024 10:47:48 GMT
- Title: Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks
- Authors: Xiaoyu Wu, Jiaru Zhang, Yang Hua, Bohan Lyu, Hao Wang, Tao Song, Haibing Guan
- Abstract summary: Few-shot fine-tuning of Diffusion Models (DMs) is a key advancement, significantly reducing training costs and enabling personalized AI applications.
During the training process, image fidelity initially improves, then unexpectedly deteriorates with the emergence of noisy patterns, only to recover later with severe overfitting.
We term the stage with generated noisy patterns the corruption stage. Experimental results demonstrate that our method significantly mitigates corruption and improves the fidelity, quality, and diversity of the generated images in both object-driven and subject-driven generation tasks.
- Score: 26.387044804861937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot fine-tuning of Diffusion Models (DMs) is a key advancement, significantly reducing training costs and enabling personalized AI applications. However, we explore the training dynamics of DMs and observe an unanticipated phenomenon: during the training process, image fidelity initially improves, then unexpectedly deteriorates with the emergence of noisy patterns, only to recover later with severe overfitting. We term the stage with generated noisy patterns the corruption stage. To understand this corruption stage, we begin by theoretically modeling the one-shot fine-tuning scenario, and then extend this modeling to more general cases. Through this modeling, we identify the primary cause of this corruption stage: a narrowed learning distribution inherent in the nature of few-shot fine-tuning. To tackle this, we apply Bayesian Neural Networks (BNNs) to DMs with variational inference to implicitly broaden the learned distribution, and show that the learning target of the BNNs can be naturally regarded as an expectation of the diffusion loss plus a further regularization with the pretrained DMs. This approach is highly compatible with current few-shot fine-tuning methods in DMs and does not introduce any extra inference costs. Experimental results demonstrate that our method significantly mitigates corruption and improves the fidelity, quality, and diversity of the generated images in both object-driven and subject-driven generation tasks.
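The abstract describes the BNN learning target as an expectation of the diffusion loss plus a regularization toward the pretrained model. The following is a minimal sketch of that general shape, not the authors' implementation. It assumes a diagonal Gaussian variational posterior over the weights, a Gaussian prior centered on the pretrained weights, and a toy linear noise predictor; the names `diffusion_loss`, `variational_objective`, `sigma_prior`, and `kl_weight` are all hypothetical.

```python
import numpy as np


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between diagonal Gaussians, summed over all parameters."""
    var_q, var_p = sigma_q ** 2, sigma_p ** 2
    per_param = (
        np.log(sigma_p / sigma_q)
        + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p)
        - 0.5
    )
    return float(np.sum(per_param))


def diffusion_loss(theta, x0, rng, alpha_bar=0.5):
    """Toy noise-prediction loss; the 'denoiser' is just x_t @ theta (assumption)."""
    eps = rng.standard_normal(x0.shape)
    # Forward noising at a single fixed noise level (assumption for brevity).
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = x_t @ theta
    return float(np.mean((eps_hat - eps) ** 2))


def variational_objective(mu, sigma, theta_pre, x0, rng,
                          n_samples=8, sigma_prior=0.1, kl_weight=1e-3):
    """Monte Carlo estimate of E_q[diffusion loss] + kl_weight * KL(q || prior)."""
    mc_losses = []
    for _ in range(n_samples):
        # Reparameterized weight sample: theta = mu + sigma * noise.
        theta = mu + sigma * rng.standard_normal(mu.shape)
        mc_losses.append(diffusion_loss(theta, x0, rng))
    expected_loss = float(np.mean(mc_losses))
    # Prior centered on the pretrained weights (assumption), so the KL term
    # penalizes drifting away from the pretrained model.
    prior_sigma = np.full_like(mu, sigma_prior)
    kl = kl_diag_gaussians(mu, sigma, theta_pre, prior_sigma)
    return expected_loss + kl_weight * kl, expected_loss, kl


rng = np.random.default_rng(0)
d = 4
theta_pre = np.eye(d) * 0.1            # stand-in for pretrained weights
x0 = rng.standard_normal((3, d))       # three "few-shot" training samples
sigma = np.full((d, d), 0.05)          # posterior standard deviations
total, exp_loss, kl = variational_objective(theta_pre.copy(), sigma, theta_pre, x0, rng)
print(f"total={total:.4f} expected_loss={exp_loss:.4f} kl={kl:.4f}")
```

In this sketch the weight sampling happens only inside the training objective; at inference one could use the posterior mean `mu` directly, which is one way a method of this kind could avoid extra inference cost.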
Related papers
- Learning Diffusion Model from Noisy Measurement using Principled Expectation-Maximization Method [9.173055778539641]
We propose a principled expectation-maximization (EM) framework that iteratively learns diffusion models from noisy data with arbitrary corruption types.
Our framework employs a plug-and-play Monte Carlo method to accurately estimate clean images from noisy measurements, followed by training the diffusion model using the reconstructed images.
arXiv Detail & Related papers (2024-10-15T03:54:59Z)
- MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization-based methods, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR performs markedly better than other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z)
- Rethinking and Defending Protective Perturbation in Personalized Diffusion Models [21.30373461975769]
We study the fine-tuning process of personalized diffusion models (PDMs) through the lens of shortcut learning.
PDMs are susceptible to minor adversarial perturbations, leading to significant degradation when fine-tuned on corrupted datasets.
We propose a systematic defense framework that includes data purification and contrastive decoupling learning.
arXiv Detail & Related papers (2024-06-27T07:14:14Z)
- Slight Corruption in Pre-training Data Makes Better Diffusion Models [71.90034201302397]
Diffusion models (DMs) have shown remarkable capabilities in generating high-quality images, audio, and videos.
DMs benefit significantly from extensive pre-training on large-scale datasets.
However, pre-training datasets often contain corrupted pairs where conditions do not accurately describe the data.
This paper presents the first comprehensive study on the impact of such corruption in pre-training data of DMs.
arXiv Detail & Related papers (2024-05-30T21:35:48Z)
- Robust Diffusion Models for Adversarial Purification [28.313494459818497]
Diffusion model (DM)-based adversarial purification (AP) has been shown to be the most powerful alternative to adversarial training (AT).
We propose a novel robust reverse process with adversarial guidance, which is independent of given pre-trained DMs.
This robust guidance not only ensures that the purified examples retain more semantic content but also mitigates the accuracy-robustness trade-off of DMs.
arXiv Detail & Related papers (2024-03-24T08:34:08Z)
- Model Will Tell: Training Membership Inference for Diffusion Models [15.16244745642374]
The Training Membership Inference (TMI) task aims to determine whether a specific sample has been used in the training process of a target model.
In this paper, we explore a novel perspective for the TMI task by leveraging the intrinsic generative priors within the diffusion model.
arXiv Detail & Related papers (2024-03-13T12:52:37Z)
- One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls [77.42510898755037]
One More Step (OMS) is a compact network that incorporates an additional simple yet effective step during inference.
OMS elevates image fidelity and harmonizes the dichotomy between training and inference, while preserving original model parameters.
Once trained, various pre-trained diffusion models with the same latent domain can share the same OMS module.
arXiv Detail & Related papers (2023-11-27T12:02:42Z)
- Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption [73.98706049140098]
We propose a novel phasic content fusing few-shot diffusion model with directional distribution consistency loss.
Specifically, we design a phasic training strategy with phasic content fusion to help our model learn content and style information when t is large.
Finally, we propose a cross-domain structure guidance strategy that enhances structure consistency during domain adaptation.
arXiv Detail & Related papers (2023-09-07T14:14:11Z)
- Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Medical Image Reconstruction [75.91471250967703]
We introduce a novel sampling framework called Steerable Conditional Diffusion.
This framework adapts the diffusion model, concurrently with image reconstruction, based solely on the information provided by the available measurement.
We achieve substantial enhancements in out-of-distribution performance across diverse imaging modalities.
arXiv Detail & Related papers (2023-08-28T08:47:06Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.