Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
- URL: http://arxiv.org/abs/2410.02416v1
- Date: Thu, 3 Oct 2024 12:06:29 GMT
- Title: Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
- Authors: Seyedmorteza Sadat, Otmar Hilliges, Romann M. Weber
- Abstract summary: We revisit the CFG update rule and introduce modifications to address this issue.
We propose down-weighting the parallel component to achieve high-quality generations without oversaturation.
We also introduce a new rescaling and momentum method for the CFG update rule, based on a connection between CFG and gradient ascent.
- Score: 27.640009920058187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifier-free guidance (CFG) is crucial for improving both generation quality and alignment between the input condition and final output in diffusion models. While a high guidance scale is generally required to enhance these aspects, it also causes oversaturation and unrealistic artifacts. In this paper, we revisit the CFG update rule and introduce modifications to address this issue. We first decompose the update term in CFG into parallel and orthogonal components with respect to the conditional model prediction and observe that the parallel component primarily causes oversaturation, while the orthogonal component enhances image quality. Accordingly, we propose down-weighting the parallel component to achieve high-quality generations without oversaturation. Additionally, we draw a connection between CFG and gradient ascent and introduce a new rescaling and momentum method for the CFG update rule based on this insight. Our approach, termed adaptive projected guidance (APG), retains the quality-boosting advantages of CFG while enabling the use of higher guidance scales without oversaturation. APG is easy to implement and introduces practically no additional computational overhead to the sampling process. Through extensive experiments, we demonstrate that APG is compatible with various conditional diffusion models and samplers, leading to improved FID, recall, and saturation scores while maintaining precision comparable to CFG, making our method a superior plug-and-play alternative to standard classifier-free guidance.
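The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It covers only the projection step: the CFG update term (conditional minus unconditional prediction) is split into components parallel and orthogonal to the conditional prediction, and the parallel part is down-weighted. The function names, the `eta` weight, and the single-sample array layout are assumptions made for illustration and are not taken from the paper's code; the rescaling and momentum parts of APG are omitted.

```python
import numpy as np


def project(diff, ref):
    """Split `diff` into parts parallel and orthogonal to `ref`.

    Both arrays are treated as flat vectors for the inner product
    (a single sample, no batch dimension; this is an assumption).
    """
    ref_flat = ref.ravel()
    coef = np.dot(diff.ravel(), ref_flat) / np.dot(ref_flat, ref_flat)
    parallel = coef * ref
    orthogonal = diff - parallel
    return parallel, orthogonal


def cfg(pred_cond, pred_uncond, scale):
    """Standard classifier-free guidance update."""
    return pred_uncond + scale * (pred_cond - pred_uncond)


def projected_guidance(pred_cond, pred_uncond, scale, eta=0.0):
    """Guidance with the parallel component down-weighted by `eta`.

    `eta` is a hypothetical name for the parallel weight:
    eta = 1 recovers standard CFG, eta = 0 drops the parallel part.
    """
    diff = pred_cond - pred_uncond
    parallel, orthogonal = project(diff, pred_cond)
    return pred_cond + (scale - 1.0) * (orthogonal + eta * parallel)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cond = rng.normal(size=(3, 8, 8))
    uncond = rng.normal(size=(3, 8, 8))
    guided = projected_guidance(cond, uncond, scale=7.5, eta=0.0)
    print(guided.shape)
```

With `eta=1.0` the sketch reduces algebraically to the standard CFG rule, which gives a quick sanity check for any implementation of the projection.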
Related papers
- Rectified Diffusion Guidance for Conditional Generation [62.00207951161297]
We revisit the theory behind CFG and rigorously confirm that the improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) brings about expectation shift of the generative distribution.
We propose ReCFG with a relaxation on the guidance coefficients such that denoising with ReCFG strictly aligns with the diffusion theory.
That way the rectified coefficients can be readily pre-computed via traversing the observed data, leaving the sampling speed barely affected.
arXiv Detail & Related papers (2024-10-24T13:41:32Z) - PACE: marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization [35.922096876707975]
PACE is a generalization of PArameter-efficient fine-tuning with Consistency rEgularization.
We show that PACE implicitly regularizes gradients for enhanced generalization, but also implicitly aligns the fine-tuned and pre-trained models to retain knowledge.
PACE outperforms existing PEFT methods in four visual adaptation tasks: VTAB-1k, FGVC, few-shot learning and domain adaptation.
arXiv Detail & Related papers (2024-09-25T17:56:00Z) - No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models [25.301443993960277]
We revisit the core principles of CFG and introduce a new method, independent condition guidance (ICG).
ICG provides the benefits of CFG without the need for any special training procedures.
Our approach streamlines the training process of conditional diffusion models and can also be applied during inference on any pre-trained conditional model.
arXiv Detail & Related papers (2024-07-02T22:04:00Z) - CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models [52.29804282879437]
CFG++ is a novel approach that tackles the off-manifold challenges inherent to traditional CFG.
It offers better inversion-to-image generation, invertibility, smaller guidance scales, reduced mode collapse, etc.
It can be easily integrated into high-order diffusion solvers and naturally extends to distilled diffusion models.
arXiv Detail & Related papers (2024-06-12T10:40:10Z) - Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models [44.58960475893552]
"Adaptive Guidance" (AG) is an efficient variant of classifier-free guidance (CFG).
AG preserves CFG's image quality while reducing computation by 25%.
"LinearAG" offers even cheaper inference at the cost of deviating from the baseline model.
arXiv Detail & Related papers (2023-12-19T17:08:48Z) - Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models [75.9543301303586]
Foundation models like CLIP allow zero-shot transfer on various tasks without additional training data.
Fine-tuning and ensembling are also commonly adopted to better fit the downstream tasks.
However, we argue that prior work has overlooked the inherent biases in foundation models.
arXiv Detail & Related papers (2023-10-12T08:01:11Z) - Stay on topic with Classifier-Free Guidance [57.28934343207042]
We show that CFG can be used broadly as an inference-time technique in pure language modeling.
We show that CFG improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks.
arXiv Detail & Related papers (2023-06-30T17:07:02Z) - End-to-End Diffusion Latent Optimization Improves Classifier Guidance [81.27364542975235]
Direct Optimization of Diffusion Latents (DOODL) is a novel guidance method.
It enables plug-and-play guidance by optimizing diffusion latents.
It outperforms one-step classifier guidance on computational and human evaluation metrics.
arXiv Detail & Related papers (2023-03-23T22:43:52Z) - Graph Federated Learning for CIoT Devices in Smart Home Applications [23.216140264163535]
We propose a novel Graph Signal Processing (GSP)-inspired aggregation rule based on graph filtering, dubbed "G-Fedfilt".
The proposed aggregator enables a structured flow of information based on the graph's topology.
It is capable of yielding up to 2.41% higher accuracy than FedAvg when testing the generalization of the models.
arXiv Detail & Related papers (2022-12-29T17:57:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.