Entropy Rectifying Guidance for Diffusion and Flow Models
- URL: http://arxiv.org/abs/2504.13987v1
- Date: Fri, 18 Apr 2025 10:15:33 GMT
- Title: Entropy Rectifying Guidance for Diffusion and Flow Models
- Authors: Tariq Berrada Ifriqi, Adriana Romero-Soriano, Michal Drozdzal, Jakob Verbeek, Karteek Alahari
- Abstract summary: Entropy Rectifying Guidance (ERG) is a simple and effective guidance mechanism based on inference-time changes in the attention mechanism of state-of-the-art diffusion transformer architectures. ERG results in significant improvements in various generation tasks such as text-to-image, class-conditional and unconditional image generation.
- Score: 27.673559391846524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Guidance techniques are commonly used in diffusion and flow models to improve image quality and consistency for conditional generative tasks such as class-conditional and text-to-image generation. In particular, classifier-free guidance (CFG) -- the most widely adopted guidance technique -- contrasts conditional and unconditional predictions to improve the generated images. This results, however, in trade-offs across quality, diversity and consistency, improving some at the expense of others. While recent work has shown that it is possible to disentangle these factors to some extent, such methods come with an overhead of requiring an additional (weaker) model, or require more forward passes per sampling step. In this paper, we propose Entropy Rectifying Guidance (ERG), a simple and effective guidance mechanism based on inference-time changes in the attention mechanism of state-of-the-art diffusion transformer architectures, which allows for simultaneous improvements over image quality, diversity and prompt consistency. ERG is more general than CFG and similar guidance techniques, as it extends to unconditional sampling. ERG results in significant improvements in various generation tasks such as text-to-image, class-conditional and unconditional image generation. We also show that ERG can be seamlessly combined with other recent guidance methods such as CADS and APG, further boosting generation performance.
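For reference, the CFG rule contrasted in the abstract combines the conditional and unconditional predictions with a single guidance scale. Below is a minimal sketch of that update; the model interface and argument names are illustrative placeholders, not code from the paper:

```python
import torch


def cfg_prediction(model, x_t, t, cond, guidance_scale=7.5):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction towards the conditional one with scale w >= 1."""
    eps_uncond = model(x_t, t, cond=None)  # unconditional (null-condition) pass
    eps_cond = model(x_t, t, cond=cond)    # conditional pass
    # CFG update: eps = eps_uncond + w * (eps_cond - eps_uncond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

The abstract describes ERG only as an inference-time change to the attention layers of diffusion transformers and does not state the exact rule. The sketch below shows one generic attention-level intervention (temperature scaling of the attention logits) purely to illustrate where such a mechanism hooks in; it is not the paper's ERG formulation:

```python
import torch
import torch.nn.functional as F


def attention_with_temperature(q, k, v, temperature=1.0):
    """Scaled dot-product attention with an inference-time temperature knob.
    temperature < 1 sharpens (lower-entropy) attention maps, > 1 flattens them.
    Illustrative stand-in only, not the ERG rule from the paper."""
    d = q.shape[-1]
    logits = q @ k.transpose(-2, -1) / (d ** 0.5)
    attn = F.softmax(logits / temperature, dim=-1)
    return attn @ v
```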
Related papers
- Guidance Free Image Editing via Explicit Conditioning [8.81828807024982]
We use Explicit Conditioning (EC) of the noise distribution on the input modalities to achieve guidance-free editing. We present evaluations on image editing tasks and demonstrate that EC outperforms CFG in generating diverse high-quality images with significantly reduced computation.
arXiv Detail & Related papers (2025-03-22T00:44:23Z)
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
- Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models [27.640009920058187]
We revisit the CFG update rule and introduce modifications to address this issue.
We propose down-weighting the component of the guidance term parallel to the conditional prediction to achieve high-quality generations without oversaturation (a projection sketch follows the related-papers list below).
We also introduce a new rescaling momentum method for the CFG update rule based on this insight.
arXiv Detail & Related papers (2024-10-03T12:06:29Z)
- CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models [52.29804282879437]
CFG++ is a novel approach that tackles the off-manifold challenges inherent to traditional CFG.
It offers better inversion-to-image generation, invertibility, smaller guidance scales, reduced mode collapse, etc.
It can be easily integrated into high-order diffusion solvers and naturally extends to distilled diffusion models.
arXiv Detail & Related papers (2024-06-12T10:40:10Z)
- Real-World Image Variation by Aligning Diffusion Inversion Chain [53.772004619296794]
A domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images.
We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL).
Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.
arXiv Detail & Related papers (2023-05-30T04:09:47Z)
- Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression.
Our method achieves superior diverse image generation performance as compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)
- A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z)
- LT-GAN: Self-Supervised GAN with Latent Transformation Detection [10.405721171353195]
We propose a self-supervised approach (LT-GAN) to improve the generation quality and diversity of images.
We experimentally demonstrate that our proposed LT-GAN can be effectively combined with other state-of-the-art training techniques for added benefits.
arXiv Detail & Related papers (2020-10-19T22:09:45Z)
- Image Augmentations for GAN Training [57.65145659417266]
We provide insights and guidelines on how to augment images for both vanilla GANs and GANs with regularizations.
Surprisingly, we find that vanilla GANs attain generation quality on par with recent state-of-the-art results when augmentations are applied to both real and generated images.
arXiv Detail & Related papers (2020-06-04T00:16:02Z)
- Group Equivariant Generative Adversarial Networks [7.734726150561089]
In this work, we explicitly incorporate inductive symmetry priors into the network architectures via group-equivariant convolutional networks.
Group-equivariant convolutions have higher expressive power with fewer samples and lead to better gradient feedback between the generator and discriminator.
arXiv Detail & Related papers (2020-05-04T17:38:49Z)
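Returning to the oversaturation entry above: per that paper's abstract, the guidance term is decomposed into components parallel and orthogonal to the conditional prediction, and the parallel component is down-weighted to avoid oversaturation. A minimal sketch of the projection step, with illustrative names and that paper's rescaling-momentum term omitted:

```python
import torch


def down_weighted_guidance(eps_cond, eps_uncond, guidance_scale=7.5, parallel_weight=0.0):
    """Down-weight the component of the guidance term that is parallel to the
    conditional prediction; keep the orthogonal component intact.
    Sketch of the projection idea only (momentum/rescaling omitted)."""
    diff = eps_cond - eps_uncond
    b = eps_cond.shape[0]
    v = eps_cond.reshape(b, -1)   # per-sample reference direction
    d = diff.reshape(b, -1)
    coef = (d * v).sum(1, keepdim=True) / (v * v).sum(1, keepdim=True).clamp_min(1e-8)
    parallel = (coef * v).reshape_as(diff)   # component of diff along eps_cond
    orthogonal = diff - parallel
    guided = orthogonal + parallel_weight * parallel
    # CFG rewritten around the conditional prediction: eps_cond + (w - 1) * diff
    return eps_cond + (guidance_scale - 1.0) * guided
```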