Angle Domain Guidance: Latent Diffusion Requires Rotation Rather Than Extrapolation
- URL: http://arxiv.org/abs/2506.11039v1
- Date: Wed, 21 May 2025 03:36:56 GMT
- Title: Angle Domain Guidance: Latent Diffusion Requires Rotation Rather Than Extrapolation
- Authors: Cheng Jin, Zhenyu Xiao, Chutao Liu, Yuantao Gu,
- Abstract summary: Under high guidance weights, where text-image alignment is significantly enhanced, classifier-free guidance (CFG) leads to pronounced color distortions in the generated images.<n>We present a theoretical framework that elucidates the mechanisms of norm amplification and anomalous diffusion phenomena induced by CFG.<n>We propose an Angle Domain Guidance (ADG) algorithm to mitigate color distortions while preserving enhanced text-image alignment.
- Score: 21.209660996306184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifier-free guidance (CFG) has emerged as a pivotal advancement in text-to-image latent diffusion models, establishing itself as a cornerstone technique for achieving high-quality image synthesis. However, under high guidance weights, where text-image alignment is significantly enhanced, CFG also leads to pronounced color distortions in the generated images. We identify that these distortions stem from the amplification of sample norms in the latent space. We present a theoretical framework that elucidates the mechanisms of norm amplification and anomalous diffusion phenomena induced by classifier-free guidance. Leveraging our theoretical insights and the latent space structure, we propose an Angle Domain Guidance (ADG) algorithm. ADG constrains magnitude variations while optimizing angular alignment, thereby mitigating color distortions while preserving the enhanced text-image alignment achieved at higher guidance weights. Experimental results demonstrate that ADG significantly outperforms existing methods, generating images that not only maintain superior text alignment but also exhibit improved color fidelity and better alignment with human perceptual preferences.
Related papers
- Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference.<n>It performs real-time analysis on intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps.<n>Extensive experiments demonstrate PPAD's significant improvements.
arXiv Detail & Related papers (2025-05-26T14:42:35Z) - Entropy Rectifying Guidance for Diffusion and Flow Models [27.673559391846524]
Entropy Rectifying Guidance (ERG) is a simple and effective guidance mechanism based on inference-time changes in the attention mechanism of state-of-the-art diffusion transformer architectures.<n>ERG results in significant improvements in various generation tasks such as text-to-image, class-conditional and unconditional image generation.
arXiv Detail & Related papers (2025-04-18T10:15:33Z) - Learning to Harmonize Cross-vendor X-ray Images by Non-linear Image Dynamics Correction [13.836238771024254]
We show that the nonlinear characteristics of domain-specific image dynamics cannot be addressed by simple linear transforms.<n>We propose a method termed Global Deep Curve Estimation to reduce domain-specific mismatch exposure.
arXiv Detail & Related papers (2025-04-14T10:24:57Z) - Classifier-Free Guidance: From High-Dimensional Analysis to Generalized Guidance Forms [22.44946627454133]
We show that CFG accurately reproduces the target distribution in sufficiently high and infinite dimensions.<n>We show that there is a large family of guidances enjoying this property, in particular nonlinear CFG generalizations.<n>Our findings are validated with experiments on class-conditional and text-to-image generation using state-of-the-art diffusion and flow-matching models.
arXiv Detail & Related papers (2025-02-11T10:29:29Z) - Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention [0.7770029179741429]
Conditional diffusion models have shown remarkable success in visual content generation.
Recent attempts to extend unconditional guidance have relied on techniques, resulting in suboptimal generation quality.
We propose Smoothed Energy Guidance (SEG), a novel training- and condition-free approach to enhance image generation.
arXiv Detail & Related papers (2024-08-01T17:59:09Z) - Forgery-aware Adaptive Transformer for Generalizable Synthetic Image
Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z) - DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based
Single Image Super-resolution [88.13972071356422]
We propose a diffusion-style data augmentation scheme for GAN-based image super-resolution (SR) methods, known as DifAugGAN.
It involves adapting the diffusion process in generative diffusion models for improving the calibration of the discriminator during training.
Our DifAugGAN can be a Plug-and-Play strategy for current GAN-based SISR methods to improve the calibration of the discriminator and thus improve SR performance.
arXiv Detail & Related papers (2023-11-30T12:37:53Z) - Global Structure-Aware Diffusion Process for Low-Light Image Enhancement [64.69154776202694]
This paper studies a diffusion-based framework to address the low-light image enhancement problem.
We advocate for the regularization of its inherent ODE-trajectory.
Experimental evaluations reveal that the proposed framework attains distinguished performance in low-light enhancement.
arXiv Detail & Related papers (2023-10-26T17:01:52Z) - Controlling Text-to-Image Diffusion by Orthogonal Finetuning [74.21549380288631]
We introduce a principled finetuning method -- Orthogonal Finetuning (OFT) for adapting text-to-image diffusion models to downstream tasks.
Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere.
We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
arXiv Detail & Related papers (2023-06-12T17:59:23Z) - Hierarchical Semantic Regularization of Latent Spaces in StyleGANs [53.98170188547775]
We propose a Hierarchical Semantic Regularizer (HSR) which aligns the hierarchical representations learnt by the generator to corresponding powerful features learnt by pretrained networks on large amounts of data.
HSR is shown to not only improve generator representations but also the linearity and smoothness of the latent style spaces, leading to the generation of more natural-looking style-edited images.
arXiv Detail & Related papers (2022-08-07T16:23:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.