Bridging Fidelity-Reality with Controllable One-Step Diffusion for Image Super-Resolution
- URL: http://arxiv.org/abs/2512.14061v1
- Date: Tue, 16 Dec 2025 03:56:02 GMT
- Title: Bridging Fidelity-Reality with Controllable One-Step Diffusion for Image Super-Resolution
- Authors: Hao Chen, Junyang Chen, Jinshan Pan, Jiangxin Dong
- Abstract summary: CODSR is a controllable one-step diffusion network for image super-resolution. We propose an LQ-guided feature modulation module to provide high-fidelity conditioning for the diffusion process. We develop a region-adaptive generative prior activation method to effectively enhance perceptual richness.
- Score: 59.71803719801537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent diffusion-based one-step methods have shown remarkable progress in the field of image super-resolution, yet they remain constrained by three critical limitations: (1) inferior fidelity performance caused by the information loss from compression encoding of low-quality (LQ) inputs; (2) insufficient region-discriminative activation of generative priors; (3) misalignment between text prompts and their corresponding semantic regions. To address these limitations, we propose CODSR, a controllable one-step diffusion network for image super-resolution. First, we propose an LQ-guided feature modulation module that leverages original uncompressed information from LQ inputs to provide high-fidelity conditioning for the diffusion process. We then develop a region-adaptive generative prior activation method to effectively enhance perceptual richness without sacrificing local structural fidelity. Finally, we employ a text-matching guidance strategy to fully harness the conditioning potential of text prompts. Extensive experiments demonstrate that CODSR achieves superior perceptual quality and competitive fidelity compared with state-of-the-art methods with efficient one-step inference.
Related papers
- Enhancing Text-to-Image Generation via End-Edge Collaborative Hybrid Super-Resolution [6.015475364527398]
  We propose an end-edge collaborative generation-enhancement framework. Experiments show that our system reduces service latency by 33% compared with baselines.
  arXiv Detail & Related papers (2026-01-21T07:55:37Z)
- Beyond Confidence: Adaptive and Coherent Decoding for Diffusion Language Models [64.92045568376705]
  Coherent Contextual Decoding (CCD) is a novel inference framework built upon two core innovations. CCD employs a trajectory rectification mechanism that leverages historical context to enhance sequence coherence. Instead of rigid allocations based on diffusion steps, we introduce an adaptive sampling strategy that dynamically adjusts the unmasking budget for each step.
  arXiv Detail & Related papers (2025-11-26T09:49:48Z)
- SRSR: Enhancing Semantic Accuracy in Real-World Image Super-Resolution with Spatially Re-Focused Text-Conditioning [59.013863248600046]
  We propose a spatially re-focused super-resolution framework that refines text conditioning at inference time. Second, we introduce a Spatially Targeted-Free Guidance mechanism that selectively bypasses text influences on ungrounded pixels to prevent hallucinations.
  arXiv Detail & Related papers (2025-10-26T05:03:55Z)
- Boosting Fidelity for Pre-Trained-Diffusion-Based Low-Light Image Enhancement via Condition Refinement [63.54516423266521]
  Pre-Trained Diffusion-Based (PTDB) methods often sacrifice content fidelity to attain higher perceptual realism. We propose a novel optimization strategy for conditioning in pre-trained diffusion models, enhancing fidelity while preserving realism and aesthetics. Our approach is plug-and-play, seamlessly integrating into existing diffusion networks to provide more effective control.
  arXiv Detail & Related papers (2025-10-20T02:40:06Z)
- Single-Step Latent Consistency Model for Remote Sensing Image Super-Resolution [7.920423405957888]
  We propose a novel single-step diffusion approach designed to enhance both efficiency and visual quality in RSISR tasks. The proposed LCMSR reduces the iterative steps of traditional diffusion models from 50-1000 or more to just a single step. Experimental results demonstrate that LCMSR effectively balances efficiency and performance, achieving inference times comparable to non-diffusion models.
  arXiv Detail & Related papers (2025-03-25T09:56:21Z)
- Improving Consistency in Diffusion Models for Image Super-Resolution [28.945663118445037]
  We observe two kinds of inconsistencies in diffusion-based methods. We introduce ConsisSR to handle both semantic and training-inference consistencies. Our method demonstrates state-of-the-art performance among existing diffusion models.
  arXiv Detail & Related papers (2024-10-17T17:41:52Z)
- Text-guided Explorable Image Super-resolution [14.83045604603449]
  We propose two approaches for zero-shot text-guided super-resolution. We show that the proposed approaches result in diverse solutions that match the semantic meaning provided by the text prompt.
  arXiv Detail & Related papers (2024-03-02T08:10:54Z)
- One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer [53.02254290682613]
  Current solutions for low-resolution text recognition typically rely on a two-stage pipeline. We propose an efficient and effective knowledge distillation framework to achieve multi-level knowledge transfer. Experiments show that the proposed one-stage pipeline significantly outperforms super-resolution based two-stage frameworks.
  arXiv Detail & Related papers (2023-08-05T02:33:45Z)
- Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
  We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution. By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
  arXiv Detail & Related papers (2023-05-11T17:55:25Z)
- Improving Scene Text Image Super-resolution via Dual Prior Modulation Network [20.687100711699788]
  Scene text image super-resolution (STISR) aims to simultaneously increase the resolution and legibility of the text images. Existing approaches neglect the global structure of the text, which bounds the semantic determinism of the scene text. Our work proposes a plug-and-play module dubbed Dual Prior Modulation Network (DPMN), which leverages dual image-level priors to bring performance gain over existing approaches.
  arXiv Detail & Related papers (2023-02-21T02:59:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.