Compress Guidance in Conditional Diffusion Sampling
- URL: http://arxiv.org/abs/2408.11194v2
- Date: Tue, 22 Oct 2024 00:42:31 GMT
- Title: Compress Guidance in Conditional Diffusion Sampling
- Authors: Anh-Dung Dinh, Daochang Liu, Chang Xu
- Abstract summary: This work identifies and quantifies the problem, demonstrating that reducing or excluding guidance at numerous timesteps can mitigate this issue.
We observe a significant improvement in image quality and diversity while also reducing the required guidance timesteps by nearly 40%.
- Score: 16.671575782090045
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We found that enforcing guidance throughout the sampling process is often counterproductive due to the model-fitting issue, where samples are 'tuned' to match the classifier's parameters rather than generalizing the expected condition. This work identifies and quantifies the problem, demonstrating that reducing or excluding guidance at numerous timesteps can mitigate this issue. By distributing a small amount of guidance over a large number of sampling timesteps, we observe a significant improvement in image quality and diversity while also reducing the required guidance timesteps by nearly 40%. This approach addresses a major challenge in applying guidance effectively to generative tasks. Consequently, our proposed method, termed Compress Guidance, allows for the exclusion of a substantial number of guidance timesteps while still surpassing baseline models in image quality. We validate our approach through benchmarks on label-conditional and text-to-image generative tasks across various datasets and models.
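The idea in the abstract of applying guidance at only some sampling timesteps can be illustrated with a small sketch. This is not the authors' Compress Guidance algorithm, whose details are not given on this page. It is a toy under stated assumptions: the names `guidance_steps`, `sample`, `denoise`, `guidance_grad`, `keep_fraction`, and `scale` are hypothetical, the evenly spaced schedule is an assumed choice, and the "sample" is a single float rather than an image.

```python
from typing import Callable, Set, Tuple


def guidance_steps(num_steps: int, keep_fraction: float) -> Set[int]:
    """Pick evenly spaced timestep indices at which guidance is applied.

    Assumption: an even spacing; the paper may select timesteps differently.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    num_kept = max(1, round(num_steps * keep_fraction))
    # Integer arithmetic keeps the indices distinct and within range.
    return {(i * num_steps) // num_kept for i in range(num_kept)}


def sample(
    x: float,
    num_steps: int,
    keep_fraction: float,
    denoise: Callable[[float, int], float],
    guidance_grad: Callable[[float, int], float],
    scale: float = 1.0,
) -> Tuple[float, int]:
    """Toy reverse loop: denoise at every step, add guidance only at kept steps.

    Returns the final sample and the number of guidance evaluations.
    """
    active = guidance_steps(num_steps, keep_fraction)
    guidance_calls = 0
    for t in reversed(range(num_steps)):
        x = denoise(x, t)
        if t in active:
            # Guidance (e.g. a classifier gradient) is evaluated only here.
            x = x + scale * guidance_grad(x, t)
            guidance_calls += 1
    return x, guidance_calls


if __name__ == "__main__":
    # Placeholder denoiser and guidance term, chosen only so the loop runs.
    final, calls = sample(
        x=1.0,
        num_steps=10,
        keep_fraction=0.6,
        denoise=lambda x, t: 0.5 * x,
        guidance_grad=lambda x, t: -0.1 * x,
    )
    print(sorted(guidance_steps(10, 0.6)), calls)
```

With `keep_fraction=0.6`, guidance is evaluated at 60% of the timesteps, which mirrors the roughly 40% reduction in guidance timesteps that the abstract reports; how the remaining guidance is weighted or redistributed is left out of this sketch.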
Related papers
- Distributional Diffusion Models with Scoring Rules [83.38210785728994]
Diffusion models generate high-quality synthetic data.
Generating high-quality outputs requires many discretization steps.
We propose to accomplish sample generation by learning the posterior distribution of clean data samples.
arXiv Detail & Related papers (2025-02-04T16:59:03Z) - Few-shot Online Anomaly Detection and Segmentation [29.693357653538474]
This paper focuses on addressing the challenging yet practical few-shot online anomaly detection and segmentation (FOADS) task.
Under the FOADS framework, models are trained on a few-shot normal dataset, followed by inspection and improvement of their capabilities by leveraging unlabeled streaming data containing both normal and abnormal samples simultaneously.
In order to achieve improved performance with limited training samples, we employ multi-scale feature embedding extracted from a CNN pre-trained on ImageNet to obtain a robust representation.
arXiv Detail & Related papers (2024-03-27T02:24:00Z) - Mitigating Exposure Bias in Discriminator Guided Diffusion Models [4.5349436061325425]
We propose SEDM-G++, which incorporates a modified sampling approach, combining Discriminator Guidance and Epsilon Scaling.
Our proposed approach outperforms the current state-of-the-art, by achieving an FID score of 1.73 on the unconditional CIFAR-10 dataset.
arXiv Detail & Related papers (2023-11-18T20:49:50Z) - Semi-Supervised Learning for hyperspectral images by non parametrically predicting view assignment [25.198550162904713]
Hyperspectral image (HSI) classification is gaining a lot of momentum in present time because of high inherent spectral information within the images.
Recently, to effectively train the deep learning models with minimal labelled samples, the unlabeled samples are also being leveraged in self-supervised and semi-supervised setting.
In this work, we leverage the idea of semi-supervised learning to assist the discriminative self-supervised pretraining of the models.
arXiv Detail & Related papers (2023-06-19T14:13:56Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), each have drawbacks.
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models [48.77653835765705]
We introduce a probabilistic resolution to prompt tuning, where the label-specific prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model.
We evaluate the effectiveness of our approach on four tasks: few-shot image recognition, base-to-new generalization, dataset transfer learning, and domain shifts.
arXiv Detail & Related papers (2023-03-16T06:09:15Z) - Latent Autoregressive Source Separation [5.871054749661012]
This paper introduces vector-quantized Latent Autoregressive Source Separation (i.e., de-mixing an input signal into its constituent sources) without requiring additional gradient-based optimization or modifications of existing models.
Our separation method relies on the Bayesian formulation in which the autoregressive models are the priors, and a discrete (non-parametric) likelihood function is constructed by performing frequency counts over latent sums of addend tokens.
arXiv Detail & Related papers (2023-01-09T17:32:00Z) - Challenges in leveraging GANs for few-shot data augmentation [16.679224813570734]
We explore the use of GAN-based few-shot data augmentation as a method to improve few-shot classification performance.
We identify issues related to the difficulty of training such generative models under a purely supervised regime.
We propose a semi-supervised fine-tuning approach as a more pragmatic way forward to address these problems.
arXiv Detail & Related papers (2022-03-30T20:36:49Z) - Rethinking Sampling Strategies for Unsupervised Person Re-identification [59.47536050785886]
We analyze the reasons for the performance differences between various sampling strategies under the same framework and loss function.
Group sampling is proposed, which gathers samples from the same class into groups.
Experiments on Market-1501, DukeMTMC-reID and MSMT17 show that group sampling achieves performance comparable to state-of-the-art methods.
arXiv Detail & Related papers (2021-07-07T05:39:58Z) - Anytime Sampling for Autoregressive Models via Ordered Autoencoding [88.01906682843618]
Autoregressive models are widely used for tasks such as image and audio generation.
The sampling process of these models does not allow interruptions and cannot adapt to real-time computational resources.
We propose a new family of autoregressive models that enables anytime sampling.
arXiv Detail & Related papers (2021-02-23T05:13:16Z) - Effective Distant Supervision for Temporal Relation Extraction [49.20329405920023]
A principal barrier to training temporal relation extraction models in new domains is the lack of varied, high quality examples.
We present a method of automatically collecting distantly-supervised examples of temporal relations.
arXiv Detail & Related papers (2020-10-24T03:17:31Z)