Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution
- URL: http://arxiv.org/abs/2512.23532v1
- Date: Mon, 29 Dec 2025 15:09:20 GMT
- Title: Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution
- Authors: Hexin Zhang, Dong Li, Jie Huang, Bingzhou Wang, Xueyang Fu, Zhengjun Zha
- Abstract summary: We propose Iterative Diffusion Inference-Time Scaling with Adaptive Frequency Steering (IAFS). IAFS addresses the challenge of balancing perceptual quality and structural fidelity by progressively refining the generated image through iterative correction of structural deviations. Experiments show that IAFS effectively resolves the perception-fidelity conflict, yielding consistently improved perceptual detail and structural accuracy, and outperforming existing inference-time scaling methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have become a leading paradigm for image super-resolution (SR), but existing methods struggle to guarantee both the high-frequency perceptual quality and the low-frequency structural fidelity of generated images. Although inference-time scaling can theoretically improve this trade-off by allocating more computation, existing strategies remain suboptimal: reward-driven particle optimization often causes perceptual over-smoothing, while optimal-path search tends to lose structural consistency. To overcome these difficulties, we propose Iterative Diffusion Inference-Time Scaling with Adaptive Frequency Steering (IAFS), a training-free framework that jointly leverages iterative refinement and frequency-aware particle fusion. IAFS addresses the challenge of balancing perceptual quality and structural fidelity by progressively refining the generated image through iterative correction of structural deviations. Simultaneously, it ensures effective frequency fusion by adaptively integrating high-frequency perceptual cues with low-frequency structural information, allowing for a more accurate and balanced reconstruction across different image details. Extensive experiments across multiple diffusion-based SR models show that IAFS effectively resolves the perception-fidelity conflict, yielding consistently improved perceptual detail and structural accuracy, and outperforming existing inference-time scaling methods.
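The frequency-aware particle fusion that the abstract describes can be illustrated with a minimal sketch: assume two candidate reconstructions (a fidelity-oriented particle and a perception-oriented particle), take the low frequencies from the former and the high frequencies from the latter via an FFT-domain mask. The fixed hard cutoff below is an illustrative assumption, not the paper's adaptive scheme.

```python
import numpy as np

def frequency_fuse(structural, perceptual, cutoff=0.1):
    """Fuse two candidate reconstructions in the Fourier domain.

    Low frequencies (normalized radius <= cutoff) come from the
    fidelity-oriented `structural` particle; the rest come from the
    perception-oriented `perceptual` particle.
    """
    F_s = np.fft.fftshift(np.fft.fft2(structural))
    F_p = np.fft.fftshift(np.fft.fft2(perceptual))
    h, w = structural.shape
    # Frequency coordinates centered on DC after fftshift.
    yy, xx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    radius = np.sqrt((yy / h) ** 2 + (xx / w) ** 2)
    low = (radius <= cutoff).astype(float)  # hard low-pass mask (illustrative)
    fused = F_s * low + F_p * (1.0 - low)
    return np.real(np.fft.ifft2(np.fft.ifftshift(fused)))
```

In IAFS the integration is adaptive rather than a fixed cutoff, so a soft, content-dependent mask would replace `low` in practice.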
Related papers
- FiDeSR: High-Fidelity and Detail-Preserving One-Step Diffusion Super-Resolution [11.03986460753769]
We propose FiDeSR, a high-fidelity and detail-preserving one-step diffusion super-resolution framework. During training, we introduce a detail-aware weighting strategy that adaptively emphasizes regions where the model exhibits higher prediction errors. During inference, low- and high-frequency adaptive enhancers further refine the reconstruction without requiring model retraining.
arXiv Detail & Related papers (2026-03-03T07:34:49Z)
- Spectral Regularization for Diffusion Models [14.919876123456747]
We propose a loss-level spectral regularization framework that augments standard diffusion training with differentiable Fourier- and wavelet-domain losses. Our approach is compatible with DDPM, DDIM, and EDM formulations and introduces negligible computational overhead.
arXiv Detail & Related papers (2026-03-02T22:39:02Z)
- Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers [55.15722080205737]
Edit2Perceive is a unified diffusion framework that adapts editing models for depth, normal, and matting. Our single-step deterministic inference yields faster runtime while training on relatively small datasets.
arXiv Detail & Related papers (2025-11-24T01:13:51Z)
- Latent Harmony: Synergistic Unified UHD Image Restoration via Latent Space Regularization and Controllable Refinement [89.99237142387655]
We introduce LH-VAE, which enhances semantic robustness through visual semantic constraints and progressive degradations. Latent Harmony is a two-stage framework that redefines VAEs for UHD restoration by jointly regularizing the latent space and enforcing high-frequency-aware reconstruction. Experiments show Latent Harmony achieves state-of-the-art performance across UHD and standard-resolution tasks, effectively balancing efficiency, perceptual quality, and reconstruction accuracy.
arXiv Detail & Related papers (2025-10-09T08:54:26Z)
- Edge-Aware Normalized Attention for Efficient and Detail-Preserving Single Image Super-Resolution [27.3322913419539]
Single-image super-resolution (SISR) remains highly ill-posed because recovering structurally faithful high-frequency content from a single low-resolution observation is ambiguous. Existing edge-aware methods often attach edge priors or attention branches onto increasingly complex backbones, yet ad hoc fusion frequently introduces redundancy, unstable optimization, or limited structural gains. We address this gap with an edge-guided attention mechanism that derives an adaptive modulation map from jointly encoded edge features and intermediate feature activations, then applies it to normalize and reweight responses, selectively amplifying structurally salient regions while suppressing spurious textures.
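A toy version of this normalize-and-reweight idea can be sketched as follows, under assumed details: gradient magnitude stands in for the paper's learned edge encoder, and a single 2-D feature map replaces multi-channel activations.

```python
import numpy as np

def edge_modulated_attention(feat, gain=1.0, eps=1e-6):
    """Normalize a feature map and reweight it with an edge-derived modulation map.

    Gradient magnitude here is a stand-in for the learned edge branch
    (an assumption for illustration, not the paper's architecture).
    """
    gy, gx = np.gradient(feat)
    edge = np.sqrt(gx ** 2 + gy ** 2)
    mod = edge / (edge.max() + eps)                    # adaptive modulation map in [0, 1]
    norm = (feat - feat.mean()) / (feat.std() + eps)   # normalized responses
    # Amplify structurally salient (high-edge) regions; flat regions pass through.
    return feat + gain * mod * norm
```

On a constant input the edge map is zero, so the output equals the input, which matches the intent of suppressing amplification in texture-free regions.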
arXiv Detail & Related papers (2025-09-18T02:31:24Z)
- Generative imaging for radio interferometry with fast uncertainty quantification [4.294714866547824]
Learnable reconstruction methods have shown promise in providing efficient and high-quality reconstruction. In this article we explore the use of generative neural networks that enable efficient approximate sampling of the posterior distribution. Our methods provide a significant step toward computationally efficient, scalable, and uncertainty-aware imaging for next-generation radio telescopes.
arXiv Detail & Related papers (2025-07-28T18:52:07Z)
- One-Step Diffusion-based Real-World Image Super-Resolution with Visual Perception Distillation [53.24542646616045]
We propose VPD-SR, a novel visual perception diffusion distillation framework specifically designed for image super-resolution (SR) generation. VPD-SR consists of two components: Explicit Semantic-aware Supervision (ESS) and a High-frequency Perception (HFP) loss. The proposed VPD-SR achieves superior performance compared to both previous state-of-the-art methods and the teacher model with just one-step sampling.
arXiv Detail & Related papers (2025-06-03T08:28:13Z)
- FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion [63.609399000712905]
Inference at a scaled resolution leads to repetitive patterns and structural distortions. We propose two simple modules that combine to solve these issues. Our method, coined FAM diffusion, can seamlessly integrate into any latent diffusion model and requires no additional training.
arXiv Detail & Related papers (2024-11-27T17:51:44Z)
- DiffFNO: Diffusion Fourier Neural Operator [8.895165270489167]
We introduce DiffFNO, a novel diffusion framework for arbitrary-scale super-resolution strengthened by a Weighted Fourier Neural Operator (WFNO). Mode Rebalancing in WFNO effectively captures critical frequency components, significantly improving the reconstruction of high-frequency image details. Our approach sets a new standard in super-resolution, delivering both superior accuracy and computational efficiency.
arXiv Detail & Related papers (2024-11-15T03:14:11Z)
- FreqINR: Frequency Consistency for Implicit Neural Representation with Adaptive DCT Frequency Loss [5.349799154834945]
This paper introduces Frequency Consistency for Implicit Neural Representation (FreqINR), an innovative arbitrary-scale super-resolution method.
During training, we employ Adaptive Discrete Cosine Transform Frequency Loss (ADFL) to minimize the frequency gap between HR and ground-truth images.
During inference, we extend the receptive field to preserve spectral coherence between low-resolution (LR) and ground-truth images.
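The ADFL idea, penalizing the spectral gap between prediction and ground truth with per-frequency weights, can be sketched as below. The orthonormal DCT-II construction is standard; the weighting scheme is a placeholder assumption, not the paper's adaptive rule.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II transform matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    M = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    M[0] *= 1.0 / np.sqrt(2.0)  # DC row rescaled for orthonormality
    return M

def dct_frequency_loss(pred, target, weight=None):
    """L1 gap between 2-D DCT spectra, optionally weighted per frequency."""
    D = dct_matrix(pred.shape[0])
    E = dct_matrix(pred.shape[1])
    P = D @ pred @ E.T    # separable 2-D DCT of the prediction
    T = D @ target @ E.T  # and of the ground truth
    gap = np.abs(P - T)
    if weight is not None:
        gap = gap * weight  # per-frequency emphasis (placeholder for ADFL's adaptivity)
    return gap.mean()
```

A weight matrix that grows with frequency index, e.g. `1 + np.add.outer(np.arange(n), np.arange(n)) / n`, would emphasize high-frequency bands, in the spirit of the adaptive weighting the abstract describes.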
arXiv Detail & Related papers (2024-08-25T03:53:17Z)
- Accelerating Diffusion for SAR-to-Optical Image Translation via Adversarial Consistency Distillation [5.234109158596138]
We propose a new training framework for SAR-to-optical image translation.
Our method employs consistency distillation to reduce iterative inference steps and integrates adversarial learning to ensure image clarity and minimize color shifts.
The results demonstrate that our approach significantly improves inference speed by 131 times while maintaining the visual quality of the generated images.
arXiv Detail & Related papers (2024-07-08T16:36:12Z)
- Frequency Consistent Adaptation for Real World Super Resolution [64.91914552787668]
We propose a novel Frequency Consistent Adaptation (FCA) that ensures the frequency domain consistency when applying Super-Resolution (SR) methods to the real scene.
We estimate degradation kernels from unsupervised images and generate the corresponding Low-Resolution (LR) images.
Based on the domain-consistent LR-HR pairs, we train easy-implemented Convolutional Neural Network (CNN) SR models.
arXiv Detail & Related papers (2020-12-18T08:25:39Z)
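FCA's notion of frequency-domain consistency between generated and real LR images can be checked with a radially averaged power-spectrum comparison. The ring binning and log-spectral distance below are illustrative choices, not the paper's actual estimator.

```python
import numpy as np

def radial_power_spectrum(img, nbins=16):
    """Average the 2-D power spectrum over rings of spatial frequency."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)  # distance from DC
    bins = np.minimum((r / (r.max() + 1e-12) * nbins).astype(int), nbins - 1)
    return np.array([power[bins == b].mean() if np.any(bins == b) else 0.0
                     for b in range(nbins)])

def frequency_consistency_gap(lr_generated, lr_real, nbins=16):
    """Mean log-spectral distance between two images' radial power spectra."""
    ps_g = radial_power_spectrum(lr_generated, nbins)
    ps_r = radial_power_spectrum(lr_real, nbins)
    return np.abs(np.log(ps_g + 1e-12) - np.log(ps_r + 1e-12)).mean()
```

A small gap indicates the synthesized LR images share the spectral statistics of real low-resolution captures, which is the property FCA enforces before training the CNN SR models.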
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.