Related papers: CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

URL: http://arxiv.org/abs/2406.08070v1
Date: Wed, 12 Jun 2024 10:40:10 GMT
Title: CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
Authors: Hyungjin Chung, Jeongsol Kim, Geon Yeong Park, Hyelin Nam, Jong Chul Ye
Abstract summary: CFG++ is a novel approach that tackles the off-manifold challenges inherent in traditional CFG. It offers significant improvements, including better sample quality for text-to-image generation, invertibility, smaller guidance scales, reduced mode collapse, etc.
Score: 52.29804282879437
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Classifier-free guidance (CFG) is a fundamental tool in modern diffusion models for text-guided generation. Although effective, CFG has notable drawbacks. For instance, DDIM with CFG lacks invertibility, complicating image editing; furthermore, high guidance scales, essential for high-quality outputs, frequently result in issues like mode collapse. Contrary to the widespread belief that these are inherent limitations of diffusion models, this paper reveals that the problems actually stem from the off-manifold phenomenon associated with CFG, rather than the diffusion models themselves. More specifically, inspired by the recent advancements of diffusion model-based inverse problem solvers (DIS), we reformulate text-guidance as an inverse problem with a text-conditioned score matching loss, and develop CFG++, a novel approach that tackles the off-manifold challenges inherent in traditional CFG. CFG++ features a surprisingly simple fix to CFG, yet it offers significant improvements, including better sample quality for text-to-image generation, invertibility, smaller guidance scales, reduced mode collapse, etc. Furthermore, CFG++ enables seamless interpolation between unconditional and conditional sampling at lower guidance scales, consistently outperforming traditional CFG at all scales. Experimental results confirm that our method significantly enhances performance in text-to-image generation, DDIM inversion, editing, and solving inverse problems, suggesting a wide-ranging impact and potential applications in various fields that utilize text guidance. Project Page: https://cfgpp-diffusion.github.io/.
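For orientation, the standard CFG update that the abstract contrasts against is commonly written as a linear combination of the unconditional and conditional noise predictions. The following is a minimal sketch of that standard rule only, not of CFG++ itself. It assumes the convention eps_uncond + scale * (eps_cond - eps_uncond); the function name `cfg_combine` and the toy vectors are hypothetical and stand in for real denoiser outputs.

```python
from typing import List, Sequence


def cfg_combine(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    scale: float,
) -> List[float]:
    """Standard CFG combination (assumed convention, not CFG++).

    Returns eps_uncond + scale * (eps_cond - eps_uncond), elementwise.
    scale = 0 gives the unconditional prediction, scale = 1 the
    conditional one, and scale > 1 extrapolates past the conditional one.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions (hypothetical values for illustration only).
eps_uncond = [0.0, 1.0, -2.0]
eps_cond = [1.0, 1.0, 0.0]

unconditional = cfg_combine(eps_uncond, eps_cond, 0.0)
conditional = cfg_combine(eps_uncond, eps_cond, 1.0)
extrapolated = cfg_combine(eps_uncond, eps_cond, 7.5)
```

With a scale above 1 the result lies outside the segment between the two predictions, which is the extrapolation regime in which the abstract says high guidance scales cause problems.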

Related papers

Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models [57.20761595019967]
We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement. NAG restores effective negative guidance where CFG collapses while maintaining fidelity. NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video).
arXiv Detail & Related papers (2025-05-27T13:30:46Z)
Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance [19.83064246586143]
CFG is a technique for improving conditional diffusion models by linearly combining the outputs of conditional and unconditional denoisers. While CFG enhances visual quality and improves alignment with prompts, it often reduces sample diversity. We propose a Gibbs-like sampling procedure to draw samples from the desired tilted distribution.
arXiv Detail & Related papers (2025-05-27T12:27:33Z)
DICE: Distilling Classifier-Free Guidance into Text Embeddings [39.79747569096888]
Text-to-image diffusion models are capable of generating high-quality images, but these images often fail to align closely with the given text prompts. We present DIstilling CFG by enhancing text Embeddings (DICE), a novel approach that removes the reliance on CFG in the generative process. DICE distills a CFG-based text-to-image diffusion model into a CFG-free version by refining text embeddings to replicate CFG-based directions.
arXiv Detail & Related papers (2025-02-06T02:39:45Z)
Nested Annealed Training Scheme for Generative Adversarial Networks [54.70743279423088]
This paper focuses on a rigorous mathematical framework: the composite-functional-gradient GAN (CFG). We reveal the theoretical connection between the CFG model and score-based models. We find that the training objective of the CFG discriminator is equivalent to finding an optimal D(x).
arXiv Detail & Related papers (2025-01-20T07:44:09Z)
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion [63.609399000712905]
Inference at a scaled resolution leads to repetitive patterns and structural distortions. We propose two simple modules that combine to solve these issues. Our method, coined FAM Diffusion, can seamlessly integrate into any latent diffusion model and requires no additional training.
arXiv Detail & Related papers (2024-11-27T17:51:44Z)
Contrastive CFG: Improving CFG in Diffusion Models by Contrasting Positive and Negative Concepts [55.298031232672734]
Classifier-Free Guidance (CFG) has proven effective in conditional diffusion model sampling for improved condition alignment. We present a novel method to enhance negative CFG guidance using contrastive loss.
arXiv Detail & Related papers (2024-11-26T03:29:27Z)
Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis [43.481539150288434]
This work introduces a new family of factor graph Diffusion Models (FG-DMs). FG-DMs model the joint distribution of images and conditioning variables, such as semantic, sketch, depth, or normal maps, via a factor graph decomposition.
arXiv Detail & Related papers (2024-10-29T00:54:00Z)
Rectified Diffusion Guidance for Conditional Generation [62.00207951161297]
We revisit the theory behind CFG and rigorously confirm that the improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) causes an expectation shift of the generative distribution. We propose ReCFG with a relaxation on the guidance coefficients such that denoising with ReCFG strictly aligns with the diffusion theory. That way, the rectified coefficients can be readily pre-computed by traversing the observed data, leaving the sampling speed barely affected.
arXiv Detail & Related papers (2024-10-24T13:41:32Z)
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models [27.640009920058187]
We revisit the CFG update rule and introduce modifications to address oversaturation and artifacts at high guidance scales. We propose down-weighting the parallel component to achieve high-quality generations without oversaturation. We also introduce a new rescaling momentum method for the CFG update rule based on this insight.
arXiv Detail & Related papers (2024-10-03T12:06:29Z)
Debiasing Text-to-Image Diffusion Models [84.46750441518697]
Learning-based Text-to-Image (TTI) models have revolutionized the way visual content is generated in various domains. Recent research has shown that nonnegligible social bias exists in current state-of-the-art TTI systems.
arXiv Detail & Related papers (2024-02-22T14:33:23Z)
Controlling Text-to-Image Diffusion by Orthogonal Finetuning [74.21549380288631]
We introduce a principled finetuning method -- Orthogonal Finetuning (OFT) for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
arXiv Detail & Related papers (2023-06-12T17:59:23Z)
Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance. We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring. Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
Breaking Through the Haze: An Advanced Non-Homogeneous Dehazing Method based on Fast Fourier Convolution and ConvNeXt [14.917290578644424]
Haze usually leads to deteriorated images with low contrast, color shift and structural distortion. We propose a novel two-branch network that leverages 2D discrete wavelet transform (DWT), fast Fourier convolution (FFC) residual block and a pretrained ConvNeXt model. Our model is able to effectively explore global contextual information and produce images with better perceptual quality.
arXiv Detail & Related papers (2023-05-08T02:59:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.