CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
- URL: http://arxiv.org/abs/2406.08070v1
- Date: Wed, 12 Jun 2024 10:40:10 GMT
- Title: CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
- Authors: Hyungjin Chung, Jeongsol Kim, Geon Yeong Park, Hyelin Nam, Jong Chul Ye
- Abstract summary: CFG++ is a novel approach that tackles the off-manifold challenges inherent in traditional CFG.
It offers significant improvements, including better sample quality for text-to-image generation, invertibility, smaller guidance scales, reduced mode collapse, etc.
- Score: 52.29804282879437
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classifier-free guidance (CFG) is a fundamental tool in modern diffusion models for text-guided generation. Although effective, CFG has notable drawbacks. For instance, DDIM with CFG lacks invertibility, complicating image editing; furthermore, high guidance scales, essential for high-quality outputs, frequently result in issues like mode collapse. Contrary to the widespread belief that these are inherent limitations of diffusion models, this paper reveals that the problems actually stem from the off-manifold phenomenon associated with CFG, rather than the diffusion models themselves. More specifically, inspired by the recent advancements of diffusion model-based inverse problem solvers (DIS), we reformulate text-guidance as an inverse problem with a text-conditioned score matching loss, and develop CFG++, a novel approach that tackles the off-manifold challenges inherent in traditional CFG. CFG++ features a surprisingly simple fix to CFG, yet it offers significant improvements, including better sample quality for text-to-image generation, invertibility, smaller guidance scales, reduced mode collapse, etc. Furthermore, CFG++ enables seamless interpolation between unconditional and conditional sampling at lower guidance scales, consistently outperforming traditional CFG at all scales. Experimental results confirm that our method significantly enhances performance in text-to-image generation, DDIM inversion, editing, and solving inverse problems, suggesting a wide-ranging impact and potential applications in various fields that utilize text guidance. Project Page: https://cfgpp-diffusion.github.io/.
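The abstract calls CFG++ a "surprisingly simple fix" to CFG but does not spell it out. The sketch below shows one common reading of the paper, stated here as an assumption: in a deterministic DDIM step, standard CFG uses the guided noise estimate both for the denoised estimate and for renoising, whereas CFG++ keeps the guided estimate for denoising and renoises with the unconditional noise estimate. The function names, the flag `renoise_with_uncond`, and the toy inputs are hypothetical; a real sampler would obtain the two noise predictions from a trained network.

```python
import math


def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def ddim_step(x_t, eps_uncond, eps_cond, scale, abar_t, abar_prev,
              renoise_with_uncond):
    """One deterministic DDIM step (eta = 0) on a flat list of floats.

    abar_t and abar_prev are the cumulative noise-schedule products at the
    current and the next (less noisy) timestep.
    renoise_with_uncond=False: standard CFG, guided noise in both stages.
    renoise_with_uncond=True: CFG++-style step (our assumption about the
    method), unconditional noise in the renoising stage.
    """
    eps_guided = guided_noise(eps_uncond, eps_cond, scale)
    # Denoised estimate of the clean sample from the guided noise.
    x0_hat = [(x - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
              for x, e in zip(x_t, eps_guided)]
    # The only difference between the two variants is the noise used here.
    eps_renoise = eps_uncond if renoise_with_uncond else eps_guided
    return [math.sqrt(abar_prev) * x0 + math.sqrt(1.0 - abar_prev) * e
            for x0, e in zip(x0_hat, eps_renoise)]


# Toy inputs standing in for network outputs (illustrative values only).
x_t = [1.0, -0.5]
eps_uncond = [0.2, 0.1]
eps_cond = [0.4, -0.1]

x_prev_cfg = ddim_step(x_t, eps_uncond, eps_cond, 7.5, 0.5, 0.8, False)
x_prev_cfgpp = ddim_step(x_t, eps_uncond, eps_cond, 0.6, 0.5, 0.8, True)
print(x_prev_cfg, x_prev_cfgpp)
```

Under this reading, a scale of 0 gives the unconditional DDIM step, and scales between 0 and 1 interpolate the denoised estimate between unconditional and conditional predictions, which matches the abstract's remarks about smaller guidance scales and interpolation. The scale 7.5 for standard CFG is just a typical illustrative value.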
Related papers
- Debiasing Text-to-Image Diffusion Models [84.46750441518697]
Learning-based Text-to-Image (TTI) models have revolutionized the way visual content is generated in various domains.
Recent research has shown that nonnegligible social bias exists in current state-of-the-art TTI systems.
arXiv Detail & Related papers (2024-02-22T14:33:23Z)
- Text Diffusion with Reinforced Conditioning [92.17397504834825]
This paper thoroughly analyzes text diffusion models and uncovers two significant limitations: degradation of self-conditioning during training and misalignment between training and sampling.
Motivated by our findings, we propose a novel Text Diffusion model called TREC, which mitigates the degradation with Reinforced Conditioning and the misalignment by Time-Aware Variance Scaling.
arXiv Detail & Related papers (2024-02-19T09:24:02Z)
- A Contrastive Variational Graph Auto-Encoder for Node Clustering [10.52321770126932]
State-of-the-art clustering methods have numerous challenges.
Existing VGAEs do not account for the discrepancy between the inference and generative models.
Our solution has two mechanisms to control the trade-off between Feature Randomness and Feature Drift.
arXiv Detail & Related papers (2023-12-28T05:07:57Z)
- Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models [44.58960475893552]
"Adaptive Guidance" (AG) is an efficient variant of Classifier-Free Guidance (CFG).
AG preserves CFG's image quality while reducing computation by 25%.
"LinearAG" offers even cheaper inference at the cost of deviating from the baseline model.
arXiv Detail & Related papers (2023-12-19T17:08:48Z)
- Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models [58.46926334842161]
This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps.
We propose two novel objectives, the Separate loss and the Enhance loss, that reduce object mask overlaps and maximize attention scores.
Our method diverges from conventional test-time-adaptation techniques, focusing on finetuning critical parameters, which enhances scalability and generalizability.
arXiv Detail & Related papers (2023-12-10T22:07:42Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Controlling Text-to-Image Diffusion by Orthogonal Finetuning [74.21549380288631]
We introduce a principled finetuning method -- Orthogonal Finetuning (OFT) for adapting text-to-image diffusion models to downstream tasks.
Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere.
We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
arXiv Detail & Related papers (2023-06-12T17:59:23Z)
- Breaking Through the Haze: An Advanced Non-Homogeneous Dehazing Method based on Fast Fourier Convolution and ConvNeXt [14.917290578644424]
Haze usually leads to deteriorated images with low contrast, color shift and structural distortion.
We propose a novel two-branch network that leverages the 2D discrete wavelet transform (DWT), a fast Fourier convolution (FFC) residual block, and a pretrained ConvNeXt model.
Our model is able to effectively explore global contextual information and produce images with better perceptual quality.
arXiv Detail & Related papers (2023-05-08T02:59:02Z)
- Panini-Net: GAN Prior Based Degradation-Aware Feature Interpolation for Face Restoration [4.244692655670362]
Panini-Net is a degradation-aware feature network for face restoration.
It learns the abstract representations to distinguish various degradations.
It achieves state-of-the-art performance for multi-degradation face restoration and face super-resolution.
arXiv Detail & Related papers (2022-03-16T07:41:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.