Related papers: Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

URL: http://arxiv.org/abs/2410.02416v1
Date: Thu, 3 Oct 2024 12:06:29 GMT
Title: Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
Authors: Seyedmorteza Sadat, Otmar Hilliges, Romann M. Weber,
Abstract summary: We revisit the CFG update rule and introduce modifications to address this issue. We propose down-weighting the parallel component to achieve high-quality generations without oversaturation. We also introduce a new rescaling momentum method for the CFG update rule based on this insight.
Score: 27.640009920058187
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Classifier-free guidance (CFG) is crucial for improving both generation quality and alignment between the input condition and final output in diffusion models. While a high guidance scale is generally required to enhance these aspects, it also causes oversaturation and unrealistic artifacts. In this paper, we revisit the CFG update rule and introduce modifications to address this issue. We first decompose the update term in CFG into parallel and orthogonal components with respect to the conditional model prediction and observe that the parallel component primarily causes oversaturation, while the orthogonal component enhances image quality. Accordingly, we propose down-weighting the parallel component to achieve high-quality generations without oversaturation. Additionally, we draw a connection between CFG and gradient ascent and introduce a new rescaling and momentum method for the CFG update rule based on this insight. Our approach, termed adaptive projected guidance (APG), retains the quality-boosting advantages of CFG while enabling the use of higher guidance scales without oversaturation. APG is easy to implement and introduces practically no additional computational overhead to the sampling process. Through extensive experiments, we demonstrate that APG is compatible with various conditional diffusion models and samplers, leading to improved FID, recall, and saturation scores while maintaining precision comparable to CFG, making our method a superior plug-and-play alternative to standard classifier-free guidance.

Related papers

Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales [1.9474278832087901]
Low and high frequencies have distinct impacts on generation quality.<n>Applying a uniform scale across all frequencies -- as is done in standard CFG -- leads to over and reduced diversity at high scales.<n>We propose frequency-decoupled guidance (FDG), an effective approach that decomposes CFG into low-saturation and high-frequency components.
arXiv Detail & Related papers (2025-06-24T15:19:42Z)
Token Perturbation Guidance for Diffusion Models [1.511194037740325]
Token Perturbation Guidance (TPG) is a novel method that applies matrices directly to intermediate token representations within the diffusion network.<n>TPG is training-free and agnostic to input conditions, making it readily applicable to both conditional and unconditional generation.
arXiv Detail & Related papers (2025-06-10T21:25:46Z)
Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models [57.20761595019967]
We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement.<n>NAG restores effective negative guidance where CFG collapses while maintaining fidelity.<n>NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video)
arXiv Detail & Related papers (2025-05-27T13:30:46Z)
Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance [19.83064246586143]
CFG is a technique for improving conditional diffusion models by linearly combining the outputs of conditional and unconditional denoisers.<n>While CFG enhances visual quality and improves alignment with prompts, it often reduces sample diversity.<n>We propose a Gibbs-like sampling procedure to draw samples from the desired tilted distribution.
arXiv Detail & Related papers (2025-05-27T12:27:33Z)
Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking [15.052244821404079]
We introduce Adaptive-Free Guidance (A-CFG), a novel method that tailors unconditional input by leveraging the model's predictive confidence.<n>A-CFG focuses on areas of ambiguity leading to more effective guidance.<n> Experiments on diverse language generation benchmarks show that A-CFG yields substantial improvements over standard CFG.
arXiv Detail & Related papers (2025-05-26T16:40:22Z)
Entropy Rectifying Guidance for Diffusion and Flow Models [27.673559391846524]
Entropy Rectifying Guidance (ERG) is a simple and effective guidance mechanism based on inference-time changes in the attention mechanism of state-of-the-art diffusion transformer architectures. ERG results in significant improvements in various generation tasks such as text-to-image, class-conditional and unconditional image generation.
arXiv Detail & Related papers (2025-04-18T10:15:33Z)
Efficient Distillation of Classifier-Free Guidance using Adapters [0.0]
adapter guidance distillation (AGD) is a novel approach that simulates CFG in a single forward pass. AGD keeps the base model frozen and only trains minimal additional parameters. We show that AGD achieves comparable or superior FID to CFG across multiple architectures.
arXiv Detail & Related papers (2025-03-10T12:55:08Z)
Adding Additional Control to One-Step Diffusion with Joint Distribution Matching [58.37264951734603]
JDM is a novel approach that minimizes the reverse KL divergence between image-condition joint distributions. By deriving a tractable upper bound, JDM decouples fidelity learning from condition learning. This asymmetric distillation scheme enables our one-step student to handle controls unknown to the teacher model.
arXiv Detail & Related papers (2025-03-09T15:06:50Z)
PCE-GAN: A Generative Adversarial Network for Point Cloud Attribute Quality Enhancement based on Optimal Transport [56.56430888985025]
We propose a generative adversarial network for point cloud quality enhancement (PCE-GAN) The generator consists of a local feature extraction (LFE) unit, a global spatial correlation (GSC) unit and a feature squeeze unit. The discriminator computes the deviation between the probability distributions of the enhanced point cloud and the original point cloud, guiding the generator to achieve high quality reconstruction.
arXiv Detail & Related papers (2025-02-26T07:34:33Z)
Classifier-free Guidance with Adaptive Scaling [7.179513844921256]
Free guidance (CFG) is an essential mechanism in text-driven diffusion models. In this paper, we present $beta$adaptive-CFG, which controls the impact of guidance during generation. Our model obtained better FID scores, maintaining the text-to-image CLIP similarity scores at a level similar to that of the reference CFG.
arXiv Detail & Related papers (2025-02-14T22:04:53Z)
Nested Annealed Training Scheme for Generative Adversarial Networks [54.70743279423088]
This paper focuses on a rigorous mathematical theoretical framework: the composite-functional-gradient GAN (CFG) We reveal the theoretical connection between the CFG model and score-based models. We find that the training objective of the CFG discriminator is equivalent to finding an optimal D(x)
arXiv Detail & Related papers (2025-01-20T07:44:09Z)
Contrastive CFG: Improving CFG in Diffusion Models by Contrasting Positive and Negative Concepts [55.298031232672734]
As-Free Guidance (CFG) has proven effective in conditional diffusion model sampling for improved condition alignment. We present a novel method to enhance negative CFG guidance using contrastive loss.
arXiv Detail & Related papers (2024-11-26T03:29:27Z)
Rectified Diffusion Guidance for Conditional Generation [62.00207951161297]
We revisit the theory behind CFG and rigorously confirm that the improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) brings about expectation shift of the generative distribution. We propose ReCFG with a relaxation on the guidance coefficients such that denoising with ReCFG strictly aligns with the diffusion theory. That way the rectified coefficients can be readily pre-computed via traversing the observed data, leaving the sampling speed barely affected.
arXiv Detail & Related papers (2024-10-24T13:41:32Z)
PACE: marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization [35.922096876707975]
PACE is a generalization of PArameter-efficient fine-tuning with Consistency rEgularization. We show that PACE implicitly regularizes gradients for enhanced generalization, but also implicitly aligns the fine-tuned and pre-trained models to retain knowledge. PACE outperforms existing PEFT methods in four visual adaptation tasks: VTAB-1k, FGVC, few-shot learning and domain adaptation.
arXiv Detail & Related papers (2024-09-25T17:56:00Z)
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models [25.301443993960277]
We revisit the core principles of CFG and introduce a new method, independent condition guidance (ICG) ICG provides the benefits of CFG without the need for any special training procedures. Our approach streamlines the training process of conditional diffusion models and can also be applied during inference on any pre-trained conditional model.
arXiv Detail & Related papers (2024-07-02T22:04:00Z)
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models [52.29804282879437]
CFG++ is a novel approach that tackles the offmanifold challenges inherent to traditional CFG. It offers better inversion-to-image generation, invertibility, smaller guidance scales, reduced mode collapse, etc. It can be easily integrated into high-order diffusion solvers and naturally extends to distilled diffusion models.
arXiv Detail & Related papers (2024-06-12T10:40:10Z)
Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models [44.58960475893552]
"Adaptive Guidance" (AG) is an efficient variant of computation-Free Guidance (CFG) AG preserves CFG's image quality while reducing by 25%. " LinearAG" offers even cheaper inference at the cost of deviating from the baseline model.
arXiv Detail & Related papers (2023-12-19T17:08:48Z)
Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models [75.9543301303586]
Foundation models like CLIP allow zero-shot transfer on various tasks without additional training data. Fine-tuning and ensembling are also commonly adopted to better fit the downstream tasks. However, we argue that prior work has overlooked the inherent biases in foundation models.
arXiv Detail & Related papers (2023-10-12T08:01:11Z)
End-to-End Diffusion Latent Optimization Improves Classifier Guidance [81.27364542975235]
Direct Optimization of Diffusion Latents (DOODL) is a novel guidance method. It enables plug-and-play guidance by optimizing diffusion latents. It outperforms one-step classifier guidance on computational and human evaluation metrics.
arXiv Detail & Related papers (2023-03-23T22:43:52Z)
Graph Federated Learning for CIoT Devices in Smart Home Applications [23.216140264163535]
We propose a novel Graph Signal Processing (GSP)-inspired aggregation rule based on graph filtering dubbed G-Fedfilt'' The proposed aggregator enables a structured flow of information based on the graph's topology. It is capable of yielding up to $2.41%$ higher accuracy than FedAvg in the case of testing the generalization of the models.
arXiv Detail & Related papers (2022-12-29T17:57:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.