Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales
- URL: http://arxiv.org/abs/2506.19713v1
- Date: Tue, 24 Jun 2025 15:19:42 GMT
- Title: Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales
- Authors: Seyedmorteza Sadat, Tobias Vontobel, Farnood Salehi, Romann M. Weber
- Abstract summary: Low and high frequencies have distinct impacts on generation quality. Applying a uniform scale across all frequencies -- as is done in standard CFG -- leads to oversaturation and reduced diversity at high scales. We propose frequency-decoupled guidance (FDG), an effective approach that decomposes CFG into low- and high-frequency components.
- Score: 1.9474278832087901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifier-free guidance (CFG) has become an essential component of modern conditional diffusion models. Although highly effective in practice, the underlying mechanisms by which CFG enhances quality, detail, and prompt alignment are not fully understood. We present a novel perspective on CFG by analyzing its effects in the frequency domain, showing that low and high frequencies have distinct impacts on generation quality. Specifically, low-frequency guidance governs global structure and condition alignment, while high-frequency guidance mainly enhances visual fidelity. However, applying a uniform scale across all frequencies -- as is done in standard CFG -- leads to oversaturation and reduced diversity at high scales and degraded visual quality at low scales. Based on these insights, we propose frequency-decoupled guidance (FDG), an effective approach that decomposes CFG into low- and high-frequency components and applies separate guidance strengths to each component. FDG improves image quality at low guidance scales and avoids the drawbacks of high CFG scales by design. Through extensive experiments across multiple datasets and models, we demonstrate that FDG consistently enhances sample fidelity while preserving diversity, leading to improved FID and recall compared to CFG, establishing our method as a plug-and-play alternative to standard classifier-free guidance.
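The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the low- and high-frequency components are obtained, so an ideal FFT low-pass mask is assumed here as a stand-in, and the names `split_frequencies`, `frequency_decoupled_guidance`, `w_low`, `w_high`, and `cutoff` are hypothetical. The sketch applies one scale to the low-frequency part of the guidance term and another to the high-frequency part.

```python
import numpy as np


def split_frequencies(x, cutoff=0.25):
    """Split x (..., H, W) into low- and high-frequency parts.

    Assumption: an ideal radial low-pass mask in FFT space; `cutoff` is a
    fraction of the Nyquist frequency. The paper may use a different transform.
    """
    h, w = x.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies in cycles/sample
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies in cycles/sample
    mask = np.sqrt(fy**2 + fx**2) <= cutoff * 0.5
    low = np.fft.ifft2(np.fft.fft2(x) * mask).real
    return low, x - low               # the two parts sum back to x


def cfg(cond, uncond, scale):
    """Standard classifier-free guidance with one uniform scale."""
    return uncond + scale * (cond - uncond)


def frequency_decoupled_guidance(cond, uncond, w_low, w_high, cutoff=0.25):
    """Guidance with separate strengths for low and high frequencies."""
    delta_low, delta_high = split_frequencies(cond - uncond, cutoff)
    return uncond + w_low * delta_low + w_high * delta_high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for conditional and unconditional model predictions.
    cond = rng.standard_normal((4, 32, 32))
    uncond = rng.standard_normal((4, 32, 32))
    guided = frequency_decoupled_guidance(cond, uncond, w_low=1.5, w_high=5.0)
    print(guided.shape)
```

With `w_low == w_high` the sketch reduces to standard CFG, which gives a quick sanity check. The example values (a smaller low-frequency scale and a larger high-frequency scale) are illustrative only and are not taken from the paper.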
Related papers
- Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management. Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions. We observe that frequency-domain feature modeling, particularly in the wavelet domain, amplifies fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z)
- Rethinking Oversaturation in Classifier-Free Guidance via Low Frequency [7.3604864243987365]
We introduce a new perspective based on low-frequency signals. We identify the accumulation of redundant information in these signals as the key factor behind oversaturation and unrealistic artifacts. Experimental results demonstrate that LF-CFG effectively alleviates oversaturation and unrealistic artifacts across various diffusion models.
arXiv Detail & Related papers (2025-06-26T16:34:00Z)
- Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering [18.543769006014383]
Diffusion models often exhibit inconsistent sample quality due to variations inherent in their sampling trajectories. We introduce CFG-Rejection, an efficient, plug-and-play strategy that filters low-quality samples at an early stage of the denoising process. We validate the effectiveness of CFG-Rejection in image generation through extensive experiments.
arXiv Detail & Related papers (2025-05-29T11:08:24Z)
- Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models [57.20761595019967]
We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement. NAG restores effective negative guidance where CFG collapses while maintaining fidelity. NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video).
arXiv Detail & Related papers (2025-05-27T13:30:46Z)
- FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models. We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components. FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z)
- PCE-GAN: A Generative Adversarial Network for Point Cloud Attribute Quality Enhancement based on Optimal Transport [56.56430888985025]
We propose a generative adversarial network for point cloud quality enhancement (PCE-GAN). The generator consists of a local feature extraction (LFE) unit, a global spatial correlation (GSC) unit and a feature squeeze unit. The discriminator computes the deviation between the probability distributions of the enhanced point cloud and the original point cloud, guiding the generator to achieve high quality reconstruction.
arXiv Detail & Related papers (2025-02-26T07:34:33Z)
- Classifier-Free Guidance: From High-Dimensional Analysis to Generalized Guidance Forms [22.44946627454133]
We show that CFG accurately reproduces the target distribution in sufficiently high and infinite dimensions. We show that there is a large family of guidances enjoying this property, in particular nonlinear CFG generalizations. Our findings are validated with experiments on class-conditional and text-to-image generation using state-of-the-art diffusion and flow-matching models.
arXiv Detail & Related papers (2025-02-11T10:29:29Z)
- Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models [27.640009920058187]
We revisit the CFG update rule and introduce modifications to address this issue. We propose down-weighting the parallel component to achieve high-quality generations without oversaturation. We also introduce a new rescaling momentum method for the CFG update rule based on this insight.
arXiv Detail & Related papers (2024-10-03T12:06:29Z)
- Mitigating Low-Frequency Bias: Feature Recalibration and Frequency Attention Regularization for Adversarial Robustness [23.77988226456179]
Adversarial training (AT) has emerged as a promising defense strategy. AT-trained models exhibit a bias toward low-frequency features while neglecting high-frequency components. We propose High-Frequency Feature Disentanglement and Recalibration (HFDR), a novel module that strategically separates and recalibrates frequency-specific features.
arXiv Detail & Related papers (2024-07-04T15:46:01Z)
- CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models [52.29804282879437]
CFG++ is a novel approach that tackles the off-manifold challenges inherent to traditional CFG.
It offers better inversion-to-image generation, invertibility, smaller guidance scales, reduced mode collapse, etc.
It can be easily integrated into high-order diffusion solvers and naturally extends to distilled diffusion models.
arXiv Detail & Related papers (2024-06-12T10:40:10Z)
- High-level Feature Guided Decoding for Semantic Segmentation [54.424062794490254]
We propose to use powerful pre-trained high-level features as guidance (HFG) for the upsampler to produce robust results.
Specifically, the high-level features from the backbone are used to train the class tokens, which are then reused by the upsampler for classification.
To push the upper limit of HFG, we introduce a context augmentation encoder (CAE) that can efficiently and effectively operate on the low-resolution high-level feature.
arXiv Detail & Related papers (2023-03-15T14:23:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.