One-Step Diffusion Transformer for Controllable Real-World Image Super-Resolution
- URL: http://arxiv.org/abs/2511.17138v3
- Date: Thu, 27 Nov 2025 03:58:51 GMT
- Title: One-Step Diffusion Transformer for Controllable Real-World Image Super-Resolution
- Authors: Yushun Fang, Yuxiang Chen, Shibo Yin, Qiang Hu, Jiangchao Yao, Ya Zhang, Xiaoyun Zhang, Yanfeng Wang,
- Abstract summary: We present ODTSR, a one-step diffusion transformer that performs Real-ISR considering fidelity and controllability simultaneously.<n>ODTSR employs Fidelity-aware Adversarial Training (FAA) to enhance controllability and achieve one-step inference.<n>Experiments demonstrate that ODTSR achieves state-of-the-art (SOTA) performance on generic Real-ISR.
- Score: 54.485573602613535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in diffusion-based real-world image super-resolution (Real-ISR) have demonstrated remarkable perceptual quality, yet the balance between fidelity and controllability remains a problem: multi-step diffusion-based methods suffer from generative diversity and randomness, resulting in low fidelity, while one-step methods lose control flexibility due to fidelity-specific finetuning. In this paper, we present ODTSR, a one-step diffusion transformer based on Qwen-Image that performs Real-ISR considering fidelity and controllability simultaneously: a newly introduced visual stream receives low-quality images (LQ) with adjustable noise (Control Noise), and the original visual stream receives LQs with consistent noise (Prior Noise), forming the Noise-hybrid Visual Stream (NVS) design. ODTSR further employs Fidelity-aware Adversarial Training (FAA) to enhance controllability and achieve one-step inference. Extensive experiments demonstrate that ODTSR not only achieves state-of-the-art (SOTA) performance on generic Real-ISR, but also enables prompt controllability on challenging scenarios such as real-world scene text image super-resolution (STISR) of Chinese characters without training on specific datasets. Codes are available at https://github.com/RedMediaTech/ODTSR.
Related papers
- Realism Control One-step Diffusion for Real-World Image Super-Resolution [21.13930153613271]
We propose a Realism Controlled One-step Diffusion (RCOD) framework for Real-ISR.<n>RCOD provides explicit control over fidelity-realism trade-offs during the noise prediction phase.<n>Our method achieves superior fidelity and perceptual quality while maintaining computational efficiency.
arXiv Detail & Related papers (2025-09-12T10:32:04Z) - One-Step Diffusion-based Real-World Image Super-Resolution with Visual Perception Distillation [53.24542646616045]
We propose VPD-SR, a novel visual perception diffusion distillation framework specifically designed for image super-resolution (SR) generation.<n>VPD-SR consists of two components: Explicit Semantic-aware Supervision (ESS) and High-frequency Perception (HFP) loss.<n>The proposed VPD-SR achieves superior performance compared to both previous state-of-the-art methods and the teacher model with just one-step sampling.
arXiv Detail & Related papers (2025-06-03T08:28:13Z) - SING: Semantic Image Communications using Null-Space and INN-Guided Diffusion Models [52.40011613324083]
Joint source-channel coding systems (DeepJSCC) have recently demonstrated remarkable performance in wireless image transmission.<n>Existing methods focus on minimizing distortion between the transmitted image and the reconstructed version at the receiver, often overlooking perceptual quality.<n>We propose SING, a novel framework that formulates the recovery of high-quality images from corrupted reconstructions as an inverse problem.
arXiv Detail & Related papers (2025-03-16T12:32:11Z) - One-Step Effective Diffusion Network for Real-World Image Super-Resolution [11.326598938246558]
We propose a one-step effective diffusion network, namely OSEDiff, for the Real-ISR problem.
We finetune the pre-trained diffusion network with trainable layers to adapt it to complex image degradations.
Our OSEDiff model can efficiently and effectively generate HQ images in just one diffusion step.
arXiv Detail & Related papers (2024-06-12T13:10:31Z) - Diffusion-Aided Joint Source Channel Coding For High Realism Wireless Image Transmission [24.372996233209854]
DiffJSCC is a novel framework that produces high-realism images via the conditional diffusion denoising process.<n>It can achieve highly realistic reconstructions for 768x512 pixel Kodak images with only 3072 symbols.
arXiv Detail & Related papers (2024-04-27T00:12:13Z) - ICF-SRSR: Invertible scale-Conditional Function for Self-Supervised
Real-world Single Image Super-Resolution [60.90817228730133]
Single image super-resolution (SISR) is a challenging problem that aims to up-sample a given low-resolution (LR) image to a high-resolution (HR) counterpart.
Recent approaches are trained on simulated LR images degraded by simplified down-sampling operators.
We propose a novel Invertible scale-Conditional Function (ICF) which can scale an input image and then restore the original input with different scale conditions.
arXiv Detail & Related papers (2023-07-24T12:42:45Z) - Blind Super-Resolution for Remote Sensing Images via Conditional
Stochastic Normalizing Flows [14.882417028542855]
We propose a novel blind SR framework based on the normalizing flow (BlindSRSNF) to address the above problems.
BlindSRSNF learns the conditional probability distribution over the high-resolution image space given a low-resolution (LR) image by explicitly optimizing the variational bound on the likelihood.
We show that the proposed algorithm can obtain SR results with excellent visual perception quality on both simulated LR and real-world RSIs.
arXiv Detail & Related papers (2022-10-14T12:37:32Z) - CVF-SID: Cyclic multi-Variate Function for Self-Supervised Image
Denoising by Disentangling Noise from Image [53.76319163746699]
We propose a novel and powerful self-supervised denoising method called CVF-SID.
CVF-SID can disentangle a clean image and noise maps from the input by leveraging various self-supervised loss terms.
It achieves state-of-the-art self-supervised image denoising performance and is comparable to other existing approaches.
arXiv Detail & Related papers (2022-03-24T11:59:28Z) - Frequency Consistent Adaptation for Real World Super Resolution [64.91914552787668]
We propose a novel Frequency Consistent Adaptation (FCA) that ensures the frequency domain consistency when applying Super-Resolution (SR) methods to the real scene.
We estimate degradation kernels from unsupervised images and generate the corresponding Low-Resolution (LR) images.
Based on the domain-consistent LR-HR pairs, we train easy-implemented Convolutional Neural Network (CNN) SR models.
arXiv Detail & Related papers (2020-12-18T08:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.