Related papers: Towards a Golden Classifier-Free Guidance Path via Foresight Fixed Point Iterations

URL: http://arxiv.org/abs/2510.21512v1
Date: Fri, 24 Oct 2025 14:39:07 GMT
Title: Towards a Golden Classifier-Free Guidance Path via Foresight Fixed Point Iterations
Authors: Kaibo Wang, Jianda Mao, Tong Wu, Yang Xiang
Abstract summary: We propose a unified perspective that reframes conditional guidance as fixed point iterations. We introduce Foresight Guidance (FSG), which prioritizes solving longer-interval subproblems in early diffusion stages. Our work offers novel perspectives for conditional guidance and unlocks the potential of adaptive design.
Score: 12.366757123129402
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Classifier-Free Guidance (CFG) is an essential component of text-to-image diffusion models, and understanding and advancing its operational mechanisms remains a central focus of research. Existing approaches stem from divergent theoretical interpretations, thereby limiting the design space and obscuring key design choices. To address this, we propose a unified perspective that reframes conditional guidance as fixed point iterations, seeking to identify a golden path where latents produce consistent outputs under both conditional and unconditional generation. We demonstrate that CFG and its variants constitute a special case of single-step short-interval iteration, which is theoretically proven to exhibit inefficiency. To this end, we introduce Foresight Guidance (FSG), which prioritizes solving longer-interval subproblems in early diffusion stages with increased iterations. Extensive experiments across diverse datasets and model architectures validate the superiority of FSG over state-of-the-art methods in both image quality and computational efficiency. Our work offers novel perspectives for conditional guidance and unlocks the potential of adaptive design.
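For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the standard CFG combination rule, not the paper's FSG method. The function names `cfg_combine` and `guidance_gap` and the use of plain Python lists are assumptions made for illustration. The sketch also shows the consistency property the abstract alludes to: when the conditional and unconditional predictions agree, the guided output equals both of them regardless of the guidance scale.

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard CFG rule: move from the unconditional prediction toward
    the conditional one, scaled by the guidance scale.
    (Hypothetical helper name; inputs are plain lists of floats.)"""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def guidance_gap(eps_uncond, eps_cond):
    """Largest absolute difference between the two predictions.
    A gap of zero means conditional and unconditional outputs agree,
    so guidance has no effect at that point."""
    return max(abs(c - u) for u, c in zip(eps_uncond, eps_cond))


# Toy predictions standing in for model outputs (illustrative values only).
uncond = [0.0, 1.0]
cond = [1.0, 1.0]

guided = cfg_combine(uncond, cond, scale=2.0)
gap = guidance_gap(uncond, cond)
```

With `scale=1.0` the rule returns the conditional prediction unchanged, and with `scale=0.0` it returns the unconditional one; larger scales extrapolate beyond the conditional prediction.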

Related papers

Evolutionary Router Feature Generation for Zero-Shot Graph Anomaly Detection with Mixture-of-Experts [60.60414602796664]
We propose a novel MoE framework with evolutionary router feature generation (EvoFG) for zero-shot GAD. EvoFG consistently outperforms state-of-the-art baselines, achieving strong and stable zero-shot GAD performance.
arXiv Detail & Related papers (2026-02-12T06:16:51Z)
Cross-Domain Transfer with Self-Supervised Spectral-Spatial Modeling for Hyperspectral Image Classification [5.784164305429653]
This paper proposes a self-supervised cross-domain transfer framework. It learns transferable spectral-spatial joint representations without source labels. Experimental results demonstrate stable classification performance and strong cross-domain adaptability.
arXiv Detail & Related papers (2026-01-26T02:52:35Z)
CogDoc: Towards Unified thinking in Documents [53.41571589733423]
We propose a unified coarse-to-fine thinking framework that mimics human cognitive processes: a low-resolution "Fast Reading" phase for scalable information localization, followed by a high-resolution "Focused Thinking" phase for deep reasoning. We conduct a rigorous investigation into post-training strategies for the unified thinking framework, demonstrating that a Direct Reinforcement Learning approach outperforms RL with Supervised Fine-Tuning (SFT). Specifically, we find that direct RL avoids the "policy conflict" observed in SFT.
arXiv Detail & Related papers (2025-12-14T12:14:17Z)
Beyond Confidence: Adaptive and Coherent Decoding for Diffusion Language Models [64.92045568376705]
Coherent Contextual Decoding (CCD) is a novel inference framework built upon two core innovations. CCD employs a trajectory rectification mechanism that leverages historical context to enhance sequence coherence. Instead of rigid allocations based on diffusion steps, we introduce an adaptive sampling strategy that dynamically adjusts the unmasking budget for each step.
arXiv Detail & Related papers (2025-11-26T09:49:48Z)
TAG:Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling [53.61290359948953]
Tangential Amplifying Guidance (TAG) operates solely on trajectory signals without modifying the underlying diffusion model. We formalize this guidance process by leveraging a first-order Taylor expansion. TAG is a plug-and-play, architecture-agnostic module that improves diffusion sampling fidelity with minimal computational addition.
arXiv Detail & Related papers (2025-10-06T06:53:29Z)
ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification [51.07970070817353]
An ideal time series classification (TSC) should be able to capture invariant representations. Current methods are largely unguided, lacking the semantic direction required to isolate truly universal features. We propose an end-to-end Energy-Regularized Information for Shift-Robustness framework to enable guided and reliable feature disentanglement.
arXiv Detail & Related papers (2025-08-19T12:13:41Z)
G4Seg: Generation for Inexact Segmentation Refinement with Diffusion Models [38.44872934965588]
This paper considers the problem of utilizing a large-scale text-to-image model to tackle the Inexact Segmentation (IS) task. We exploit the pattern discrepancies between original images and mask-conditional generated images to facilitate a coarse-to-fine segmentation refinement.
arXiv Detail & Related papers (2025-06-02T11:05:28Z)
CCD: Continual Consistency Diffusion for Lifelong Generative Modeling [29.568682321463886]
Continual Diffusion Generation (CDG) is a structured pipeline that redefines how diffusion models are implemented under continual learning. We propose the first theoretical foundation for CDG, grounded in a cross-task analysis of diffusion-specific generative dynamics. We show that CCD achieves SOTA performance across various benchmarks, especially improving generative metrics in overlapping-task scenarios.
arXiv Detail & Related papers (2025-05-17T09:49:25Z)
REG: Rectified Gradient Guidance for Conditional Diffusion Models [16.275782069986253]
We propose rectified gradient guidance (REG) to boost the performance of existing guidance methods. REG provides a better approximation to the optimal solution than prior guidance techniques. In experiments on class-conditional ImageNet and text-to-image generation tasks, REG consistently improves FID and Inception/CLIP scores.
arXiv Detail & Related papers (2025-01-31T03:16:18Z)
Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performances in continual learning (CL). This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features. In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient projection.
arXiv Detail & Related papers (2024-06-09T05:57:40Z)
OED: Towards One-stage End-to-End Dynamic Scene Graph Generation [18.374354844446962]
Dynamic Scene Graph Generation (DSGG) focuses on identifying visual relationships within the spatial-temporal domain of videos. We propose a one-stage end-to-end framework, termed OED, which streamlines the DSGG pipeline. This framework reformulates the task as a set prediction problem and leverages pair-wise features to represent each subject-object pair within the scene graph.
arXiv Detail & Related papers (2024-05-27T08:18:41Z)
Unified Domain Adaptive Semantic Segmentation [105.05235403072021]
Unsupervised Domain Adaptive Semantic Segmentation (UDA-SS) aims to transfer the supervision from a labeled source domain to an unlabeled target domain. We propose a Quad-directional Mixup (QuadMix) method, characterized by tackling distinct point attributes and feature inconsistencies. Our method outperforms the state-of-the-art works by large margins on four challenging UDA-SS benchmarks.
arXiv Detail & Related papers (2023-11-22T09:18:49Z)
Consistency Regularization for Deep Face Anti-Spoofing [69.70647782777051]
Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems. Motivated by this exciting observation, we conjecture that encouraging feature consistency of different views may be a promising way to boost FAS models. We enhance both Embedding-level and Prediction-level Consistency Regularization (EPCR) in FAS.
arXiv Detail & Related papers (2021-11-24T08:03:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.