Diffusion Model with Perceptual Loss
- URL: http://arxiv.org/abs/2401.00110v5
- Date: Wed, 6 Mar 2024 20:13:53 GMT
- Title: Diffusion Model with Perceptual Loss
- Authors: Shanchuan Lin, Xiao Yang
- Abstract summary: Diffusion models trained with mean squared error loss tend to generate unrealistic samples.
We show that the effectiveness of classifier-free guidance partly originates from it being a form of implicit perceptual guidance.
We propose a novel self-perceptual objective that results in diffusion models capable of generating more realistic samples.
- Score: 4.67483805599143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models trained with mean squared error loss tend to generate
unrealistic samples. Current state-of-the-art models rely on classifier-free
guidance to improve sample quality, yet its surprising effectiveness is not
fully understood. In this paper, we show that the effectiveness of
classifier-free guidance partly originates from it being a form of implicit
perceptual guidance. As a result, we can directly incorporate perceptual loss
in diffusion training to improve sample quality. Since the score matching
objective used in diffusion training strongly resembles the denoising
autoencoder objective used in unsupervised training of perceptual networks, the
diffusion model itself is a perceptual network and can be used to generate
meaningful perceptual loss. We propose a novel self-perceptual objective that
results in diffusion models capable of generating more realistic samples. For
conditional generation, our method only improves sample quality without
entanglement with the conditional input and therefore does not sacrifice sample
diversity. Our method can also improve sample quality for unconditional
generation, which was not possible with classifier-free guidance before.
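The abstract does not spell out the training procedure, so the following PyTorch sketch is just one plausible reading of a self-perceptual objective: a frozen copy of the denoiser acts as the perceptual network, and the loss compares its hidden features on the ground truth versus the online model's prediction. The toy architecture, the forward-noising form, and the re-noising step are all assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDenoiser(nn.Module):
    """Toy stand-in for a diffusion network; `hidden` exposes features."""
    def __init__(self, dim=16):
        super().__init__()
        self.inp = nn.Linear(dim + 1, 64)
        self.out = nn.Linear(64, dim)

    def hidden(self, x_t, t):
        # Intermediate activations reused as "perceptual" features.
        return torch.relu(self.inp(torch.cat([x_t, t[:, None]], dim=-1)))

    def forward(self, x_t, t):
        return self.out(self.hidden(x_t, t))  # predicts x0

def self_perceptual_loss(net, frozen, x0):
    b = x0.shape[0]
    t = torch.rand(b)
    x_t = x0 + torch.randn_like(x0) * t[:, None]   # toy forward noising
    x0_hat = net(x_t, t)                           # online model's prediction
    # Re-noise target and prediction with the SAME fresh noise and timestep,
    # then compare the frozen model's hidden features instead of raw pixels.
    t2 = torch.rand(b)
    eps = torch.randn_like(x0)
    f_real = frozen.hidden(x0 + eps * t2[:, None], t2)
    f_fake = frozen.hidden(x0_hat + eps * t2[:, None], t2)
    return F.mse_loss(f_fake, f_real)

net = TinyDenoiser()
frozen = copy.deepcopy(net).eval().requires_grad_(False)  # the model judges itself
loss = self_perceptual_loss(net, frozen, torch.randn(8, 16))
loss.backward()
```

The point mirrored from the abstract is that no external perceptual network (e.g. VGG or LPIPS) is required; the diffusion model scores its own predictions.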
Related papers
- Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training [20.492630610281658]
Diffusion models learn to denoise data and the trained denoiser is then used to generate new samples from the data distribution.
We introduce a new self-supervised training objective that differentiates the levels of noise added to a sample.
We show by diverse experiments that the proposed contrastive diffusion training is effective for both sequential and parallel settings.
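The summary leaves the objective abstract; below is a hypothetical InfoNCE-style sketch of what "differentiating the levels of noise added to a sample" could look like. The embedding network, per-level prototypes, and temperature are illustrative assumptions, not the paper's construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def noise_level_contrastive_loss(embed, level_protos, x0, sigmas, temp=0.1):
    # One noisy view of x0 per noise level in `sigmas`.
    views = torch.stack([x0 + s * torch.randn_like(x0) for s in sigmas])
    z = F.normalize(embed(views), dim=-1)        # (K, d) view embeddings
    p = F.normalize(level_protos, dim=-1)        # (K, d) per-level prototypes
    logits = z @ p.T / temp                      # similarity of views to levels
    # Each view's positive is its own noise level; all others are negatives.
    return F.cross_entropy(logits, torch.arange(len(sigmas)))

embed = nn.Linear(16, 8)
protos = nn.Parameter(torch.randn(4, 8))
loss = noise_level_contrastive_loss(embed, protos, torch.randn(16), [0.1, 0.3, 0.6, 1.0])
loss.backward()
```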
arXiv Detail & Related papers (2024-07-12T03:03:50Z) - Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data [74.2507346810066]
Ambient diffusion is a recently proposed framework for training diffusion models using corrupted data.
We present the first framework for training diffusion models that provably sample from the uncorrupted distribution given only noisy training data.
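For background, the Tweedie's formula in the title relates the posterior mean of the clean sample to the score of the noisy marginal; for Gaussian corruption $x_t = x_0 + \sigma_t \varepsilon$ it reads:

```latex
\mathbb{E}[x_0 \mid x_t] = x_t + \sigma_t^2 \, \nabla_{x_t} \log p(x_t)
```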
arXiv Detail & Related papers (2024-03-20T14:22:12Z) - Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control [54.132297393662654]
Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins.
While diffusion models are trained to represent the distribution of the training dataset, we are often more concerned with other properties, such as the aesthetic quality of the generated images.
We present theoretical and empirical evidence that demonstrates our framework is capable of efficiently generating diverse samples with high genuine rewards.
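A standard way to formalize such fine-tuning (the paper's exact entropy-regularized control formulation may differ) is reward maximization with a divergence penalty toward the pretrained model $p_{\mathrm{pre}}$, which is what discourages reward over-optimization and preserves sample diversity:

```latex
\max_{\theta}\; \mathbb{E}_{x \sim p_{\theta}}[r(x)] - \alpha \, \mathrm{KL}\big(p_{\theta} \,\|\, p_{\mathrm{pre}}\big)
```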
arXiv Detail & Related papers (2024-02-23T08:54:42Z) - Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [58.20016784231991]
Diffusion models operate over a sequence of timesteps, rather than the instantaneous input-output relationships assumed in prior data-attribution settings.
We present Diffusion-TracIn, which incorporates these temporal dynamics, and observe that samples' loss gradient norms are highly dependent on the timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
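For reference, TracIn-style influence of a training sample $z$ on a test sample $z'$ sums gradient dot products over checkpoints $\theta_k$ with learning rates $\eta_k$; the gradient-norm re-normalization shown for ReTrac is an illustrative guess motivated by the summary, not the paper's exact estimator:

```latex
\mathrm{TracIn}(z, z') = \sum_{k} \eta_k \, \nabla_{\theta}\mathcal{L}(\theta_k, z) \cdot \nabla_{\theta}\mathcal{L}(\theta_k, z'), \qquad
\mathrm{ReTrac}(z, z') = \sum_{k} \eta_k \, \frac{\nabla_{\theta}\mathcal{L}(\theta_k, z)}{\lVert \nabla_{\theta}\mathcal{L}(\theta_k, z) \rVert} \cdot \nabla_{\theta}\mathcal{L}(\theta_k, z')
```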
arXiv Detail & Related papers (2024-01-17T07:58:18Z) - Fair Sampling in Diffusion Models through Switching Mechanism [4.990206466948269]
We propose a fairness-aware sampling method, called the attribute switching mechanism, for diffusion models.
We mathematically prove and experimentally demonstrate the effectiveness of the proposed method on two key aspects.
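The summary gives no mechanism details; a minimal sketch of one plausible attribute-switching sampler follows, where `denoise_step`, the transition step `tau`, and the attribute pair are hypothetical names:

```python
import torch

def attribute_switching_sample(denoise_step, x_T, attrs, tau, T=1000):
    """Run the reverse process with attribute attrs[0] during the early
    (high-noise) steps, then switch to attrs[1] at step `tau`, so samples
    for both groups share the same coarse structure."""
    x = x_T
    for t in reversed(range(T)):
        attr = attrs[0] if t >= tau else attrs[1]   # the switch
        x = denoise_step(x, t, attr)
    return x

# Toy usage with a dummy reverse step:
step = lambda x, t, a: 0.99 * x + 0.01 * a
out = attribute_switching_sample(step, torch.randn(4, 8), attrs=(0.0, 1.0), tau=500)
```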
arXiv Detail & Related papers (2024-01-06T06:55:26Z) - Bridging the Gap: Addressing Discrepancies in Diffusion Model Training for Classifier-Free Guidance [1.6804613362826175]
Diffusion models have emerged as a pivotal advancement in generative models.
In this paper we aim to underscore a discrepancy between conventional training methods and the desired conditional sampling behavior.
We introduce an updated loss function that better aligns training objectives with sampling behaviors.
arXiv Detail & Related papers (2023-11-02T02:03:12Z) - Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative over discriminative models for downstream tasks.
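A sketch of how such density-estimate classification can work: noise the input, ask the conditional denoiser to predict the noise under each candidate class, and pick the class with the lowest expected error. The `eps_model` interface and the DDPM-style schedule are assumptions.

```python
import torch

def diffusion_classify(eps_model, x0, class_embs, n_trials=32, T=1000):
    # DDPM-style linear beta schedule (assumed).
    alphas_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, T), dim=0)
    errs = torch.zeros(len(class_embs))
    for _ in range(n_trials):
        t = torch.randint(0, T, (1,)).item()
        eps = torch.randn_like(x0)
        x_t = alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * eps
        # Reuse the same (t, eps) across classes so errors are comparable.
        for i, c in enumerate(class_embs):
            errs[i] += ((eps_model(x_t, t, c) - eps) ** 2).mean()
    return int(errs.argmin())   # lowest denoising error ~ most likely class

# Toy usage with a dummy conditional noise predictor:
model = lambda x_t, t, c: x_t * 0.0 + c
label = diffusion_classify(model, torch.randn(16), class_embs=[-1.0, 0.0, 1.0])
```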
arXiv Detail & Related papers (2023-03-28T17:59:56Z) - StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation [20.262426487434393]
We present a regeneration approach where an estimate given by a predictive model is provided as a guide for further diffusion.
We show that the proposed approach uses the predictive model to remove vocalizing and breathing artifacts while producing very high-quality samples.
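A hypothetical sketch of the regeneration idea as summarized: the predictive estimate seeds the process, is partially re-noised, and the remaining reverse-diffusion steps resynthesize detail. Function names, the noise level `tau`, and the step count are assumptions.

```python
import torch

def stochastic_regeneration(predictive, reverse_step, y, tau=0.5, n_steps=50):
    d = predictive(y)                      # initial predictive estimate D(y)
    x = d + tau * torch.randn_like(d)      # re-noise it to intermediate level tau
    for t in torch.linspace(tau, 0.0, n_steps):
        x = reverse_step(x, t, y)          # remaining reverse diffusion, guided by y
    return x

# Toy usage with dummy components:
enhance = lambda y: 0.9 * y
step = lambda x, t, y: x - 0.01 * (x - y)
clean = stochastic_regeneration(enhance, step, torch.randn(1, 16000))
```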
arXiv Detail & Related papers (2022-12-22T16:35:42Z) - Classifier-Free Diffusion Guidance [17.355749359987648]
Classifier guidance is a recently introduced method to trade off mode coverage and sample fidelity in conditional diffusion models.
We show that guidance can indeed be performed by a pure generative model without such a classifier.
We combine the resulting conditional and unconditional score estimates to attain a trade-off between sample quality and diversity.
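Concretely, the combination described here is commonly written as an extrapolation of the conditional and unconditional noise estimates with a guidance weight $w$, where larger $w$ trades diversity for fidelity:

```latex
\tilde{\varepsilon}_{\theta}(x_t, c) = (1 + w)\, \varepsilon_{\theta}(x_t, c) - w\, \varepsilon_{\theta}(x_t)
```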
arXiv Detail & Related papers (2022-07-26T01:42:07Z) - How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z) - Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
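One common way to write an NDA-augmented GAN objective mixes NDA samples $\bar{x}$ into the fake distribution seen by the discriminator; the mixture weight $\lambda$ and its exact placement are assumptions:

```latex
\min_{G}\max_{D}\; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
+ \lambda\, \mathbb{E}_{z}[\log(1 - D(G(z)))]
+ (1 - \lambda)\, \mathbb{E}_{\bar{x} \sim \bar{p}}[\log(1 - D(\bar{x}))]
```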
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.