Related papers: Realism Control One-step Diffusion for Real-World Image Super-Resolution

URL: http://arxiv.org/abs/2509.10122v1
Date: Fri, 12 Sep 2025 10:32:04 GMT
Title: Realism Control One-step Diffusion for Real-World Image Super-Resolution
Authors: Zongliang Wu, Siming Zheng, Peng-Tao Jiang, Xin Yuan
Abstract summary: We propose a Realism Controlled One-step Diffusion (RCOD) framework for Real-ISR. RCOD provides explicit control over fidelity-realism trade-offs during the noise prediction phase. Our method achieves superior fidelity and perceptual quality while maintaining computational efficiency.
Score: 21.13930153613271
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pre-trained diffusion models have shown great potential in real-world image super-resolution (Real-ISR) tasks by enabling high-resolution reconstructions. While one-step diffusion (OSD) methods significantly improve efficiency compared to traditional multi-step approaches, they remain limited in balancing fidelity and realism across diverse scenarios. Since OSD models for SR are usually trained or distilled at a single timestep, they lack flexible control mechanisms to adaptively prioritize these competing objectives, a trade-off that multi-step methods handle naturally by adjusting the number of sampling steps. To address this challenge, we propose a Realism Controlled One-step Diffusion (RCOD) framework for Real-ISR. RCOD provides a latent domain grouping strategy that enables explicit control over fidelity-realism trade-offs during the noise prediction phase, with minimal modifications to the training paradigm and using the original training data. A degradation-aware sampling strategy is also introduced to align distillation regularization with the grouping strategy and strengthen control over the trade-offs. Moreover, a visual prompt injection module replaces conventional text prompts with degradation-aware visual tokens, enhancing both restoration accuracy and semantic consistency. Our method achieves superior fidelity and perceptual quality while maintaining computational efficiency. Extensive experiments demonstrate that RCOD outperforms state-of-the-art OSD methods in both quantitative metrics and visual quality, with flexible realism control at inference time. The code will be released.
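The abstract contrasts OSD models, which are trained or distilled at a single timestep, with multi-step samplers whose step count governs the fidelity-realism balance. As a rough illustration of the general mechanism only (the abstract does not specify RCOD's formulation, so this is not the authors' method), the sketch below uses the standard DDPM identity to recover a clean latent from a single noise prediction, z0_hat = (z_t - sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_bar_t), and adds a hypothetical `realism` knob that selects the timestep. The linear noise schedule, the function names, and the `realism_to_timestep` mapping are all assumptions made for illustration.

```python
import numpy as np


def linear_alpha_bar(num_steps: int = 1000,
                     beta_start: float = 1e-4,
                     beta_end: float = 2e-2) -> np.ndarray:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear
    beta schedule. The schedule is an assumption (DDPM-style defaults)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def one_step_estimate(z_t: np.ndarray, eps_hat: np.ndarray,
                      alpha_bar_t: float) -> np.ndarray:
    """Standard DDPM clean-latent estimate: inverts
    z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps
    using a single noise prediction eps_hat."""
    return (z_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)


def realism_to_timestep(realism: float, t_min: int = 1, t_max: int = 999) -> int:
    """HYPOTHETICAL control mapping (not from the paper): a realism knob in
    [0, 1] selects the timestep the model is conditioned on. Here larger
    values pick larger (noisier) timesteps, i.e. more generative behaviour."""
    if not 0.0 <= realism <= 1.0:
        raise ValueError("realism must lie in [0, 1]")
    return int(round(t_min + realism * (t_max - t_min)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = linear_alpha_bar()
    z0 = rng.standard_normal((4, 8, 8))   # stand-in clean latent
    eps = rng.standard_normal(z0.shape)   # noise actually added
    t = realism_to_timestep(0.5)          # maps to timestep 500
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    # With an oracle noise prediction the one-step estimate is exact;
    # a real OSD network would instead predict eps_hat from the low-quality latent.
    z0_hat = one_step_estimate(z_t, eps, alpha_bar[t])
    print(t, np.allclose(z0_hat, z0))
```

The direction and linearity of `realism_to_timestep` are arbitrary choices made for this sketch. The abstract states only that RCOD's control acts during the noise prediction phase through latent domain grouping; it does not say how a user-facing control value is mapped internally.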

Related papers

Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models [13.32858759983739]
Large Vision-Language Models (LVLMs) often suffer from object hallucination, generating text inconsistent with visual inputs. Existing inference-time interventions to mitigate this issue present a challenging trade-off. We present Residual-Update Directed DEcoding Regulation (RUDDER), a framework that steers LVLMs towards visually grounded generation.
arXiv (2025-11-13T13:29:38Z)
Boosting Fidelity for Pre-Trained-Diffusion-Based Low-Light Image Enhancement via Condition Refinement [63.54516423266521]
Pre-Trained Diffusion-Based (PTDB) methods often sacrifice content fidelity to attain higher perceptual realism. We propose a novel optimization strategy for conditioning in pre-trained diffusion models, enhancing fidelity while preserving realism and aesthetics. Our approach is plug-and-play, integrating seamlessly into existing diffusion networks to provide more effective control.
arXiv (2025-10-20T02:40:06Z)
ScaleWeaver: Weaving Efficient Controllable T2I Generation with Multi-Scale Reference Attention [86.93601565563954]
ScaleWeaver is a framework designed to achieve high-fidelity, controllable generation built on advanced visual autoregressive (VAR) models. The proposed Reference Attention module discards unnecessary image-to-condition attention, reducing computational cost. Experiments show that ScaleWeaver delivers high-quality generation and precise control while attaining superior efficiency over diffusion-based methods.
arXiv (2025-10-16T17:00:59Z)
Efficient Conditional Generation on Scale-based Visual Autoregressive Models [26.81493253536486]
Efficient Control Model (ECM) is a plug-and-play framework featuring a lightweight control module that introduces control signals via a distributed architecture. ECM refines conditional features using real-time generated tokens and includes a shared feed-forward network (FFN) designed to maximize the utilization of its limited capacity. Our method achieves high-fidelity and diverse control over image generation, surpassing existing baselines while significantly improving both training and inference efficiency.
arXiv (2025-10-07T06:27:03Z)
Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference. It performs real-time analysis of intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps. Extensive experiments demonstrate PPAD's significant improvements.
arXiv (2025-05-26T14:42:35Z)
One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full-reference and no-reference metrics.
arXiv (2025-03-09T09:39:57Z)
Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors [75.24313405671433]
Diffusion-based image super-resolution (SR) methods have achieved remarkable success by leveraging large pre-trained text-to-image diffusion models as priors. We introduce a novel one-step SR model that significantly alleviates the efficiency issue of diffusion-based SR methods. Unlike existing fine-tuning strategies, we designed a degradation-guided Low-Rank Adaptation (LoRA) module specifically for SR.
arXiv (2024-09-25T16:15:21Z)
AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement [37.274077278901494]
We propose the Attribute Guidance Diffusion framework (AGLLDiff) for effective real-world low-light image enhancement (LIE). AGLLDiff shifts the paradigm and models the desired attributes, such as the exposure, structure, and color of normal-light images. Our approach outperforms the current leading unsupervised LIE methods across benchmarks in terms of both distortion-based and perceptual metrics.
arXiv (2024-07-20T15:17:48Z)
Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution. By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
arXiv (2023-05-11T17:55:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.