Related papers: Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

URL: http://arxiv.org/abs/2505.18600v2
Date: Tue, 27 May 2025 16:02:29 GMT
Title: Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
Authors: Bryan Sangwoo Kim, Jeongsol Kim, Jong Chul Ye
Abstract summary: Chain-of-Zoom (CoZ) is a framework that factorizes SISR into a chain of intermediate scale-states with multi-scale-aware prompts. Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a vision-language model (VLM). Experiments show that a standard 4x diffusion SR model wrapped in CoZ attains beyond 256x enlargement with high perceptual quality and fidelity.
Score: 51.99765487172328
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but collapse when asked to magnify far beyond that regime. We address this scalability bottleneck with Chain-of-Zoom (CoZ), a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts. CoZ repeatedly re-uses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training. Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a vision-language model (VLM). The prompt extractor itself is fine-tuned using Generalized Reward Policy Optimization (GRPO) with a critic VLM, aligning text guidance towards human preference. Experiments show that a standard 4x diffusion SR model wrapped in CoZ attains beyond 256x enlargement with high perceptual quality and fidelity. Project Page: https://bryanswkim.github.io/chain-of-zoom/ .
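The chaining idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `zoom_steps`, `chain_of_zoom`, `nearest_upsample`, and `describe` are hypothetical names, the nearest-neighbour upsampler merely stands in for a diffusion SR backbone, and the string-returning `describe` stands in for the VLM prompt extractor. It only shows how a fixed 4x step, applied repeatedly, compounds into a large magnification while each step receives a prompt built from the earlier scale-states.

```python
from typing import Callable, List

Image = List[List[int]]  # toy image: rows of pixel values


def zoom_steps(target_scale: int, step_scale: int = 4) -> int:
    """Count how many backbone applications reach at least target_scale."""
    steps, scale = 0, 1
    while scale < target_scale:
        scale *= step_scale
        steps += 1
    return steps


def nearest_upsample(image: Image, prompt: str, factor: int = 4) -> Image:
    """Stand-in for the SR backbone (assumption): nearest-neighbour upsampling.

    The prompt is accepted but ignored; a real backbone would condition on it.
    """
    out: Image = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def describe(states: List[Image]) -> str:
    """Stand-in for the VLM prompt extractor (assumption): sees all earlier states."""
    sizes = ", ".join(f"{len(s)}x{len(s[0])}" for s in states)
    return f"scale-states so far: {sizes}"


def chain_of_zoom(
    x0: Image,
    sr_step: Callable[[Image, str], Image],
    make_prompt: Callable[[List[Image]], str],
    n_steps: int,
) -> List[Image]:
    """Build the chain of scale-states by re-using one SR step autoregressively."""
    states = [x0]
    for _ in range(n_steps):
        prompt = make_prompt(states)               # text guidance from earlier states
        states.append(sr_step(states[-1], prompt))  # next scale-state
    return states


n = zoom_steps(256, 4)  # 4, since 4**4 == 256
states = chain_of_zoom([[1, 2], [3, 4]], nearest_upsample, describe, 2)
```

In this toy setup the image grows at every step; how the actual method handles image size and cropping between steps is not stated in the abstract, so that detail is left out here.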

Related papers

Stroke-based Cyclic Amplifier: Image Super-Resolution at Arbitrary Ultra-Large Scales [10.209274379479586]
Prior Arbitrary-Scale Image Super-Resolution (ASISR) methods often experience a significant performance decline when the upsampling factor exceeds the range covered by the training data. We propose a unified model, Stroke-based Cyclic Amplifier (SbCA), for ultra-large upsampling tasks.
arXiv Detail & Related papers (2025-06-12T14:51:10Z)
Multi-scale Image Super Resolution with a Single Auto-Regressive Model [40.77470215283583]
We tackle Image Super Resolution (ISR) using recent advances in Visual Auto-Regressive (VAR) modeling. To the best of our knowledge, this is the first time a quantizer is trained to force semantically consistent residuals at different scales. Our model can denoise the LR image and super-resolve at half and full target upscale factors in a single forward pass.
arXiv Detail & Related papers (2025-06-05T13:02:23Z)
Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling [50.34513854725803]
Arbitrary-scale super-resolution (ASSR) aims to reconstruct high-resolution (HR) images from low-resolution (LR) inputs with arbitrary upsampling factors. We propose a novel ContinuousSR framework with a Pixel-to-Gaussian paradigm, which explicitly reconstructs 2D continuous HR signals from LR images using Gaussian Splatting.
arXiv Detail & Related papers (2025-03-09T13:43:57Z)
Visual Autoregressive Modeling for Image Super-Resolution [14.935662351654601]
We propose a novel visual autoregressive modeling framework for ISR in the form of next-scale prediction. We collect large-scale data and design a training process to obtain robust generative priors.
arXiv Detail & Related papers (2025-01-31T09:53:47Z)
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution [95.98801201266099]
Diffusion-based image super-resolution (SR) models have shown superior performance at the cost of multiple denoising steps. We propose a novel post-training quantization approach with adaptive scale in one-step diffusion (OSD) image SR, PassionSR. Our PassionSR achieves significant advantages over recent leading low-bit quantization methods for image SR.
arXiv Detail & Related papers (2024-11-26T04:49:42Z)
$\text{S}^{3}$Mamba: Arbitrary-Scale Super-Resolution via Scaleable State Space Model [45.65903826290642]
ASSR aims to super-resolve low-resolution images to high-resolution images at any scale using a single model. We propose a novel arbitrary-scale super-resolution method, called $\text{S}^{3}$Mamba, to construct a scalable continuous representation space.
arXiv Detail & Related papers (2024-11-16T11:13:02Z)
ASSR-NeRF: Arbitrary-Scale Super-Resolution on Voxel Grid for High-Quality Radiance Fields Reconstruction [27.21399221644529]
NeRF-based methods reconstruct 3D scenes by building a radiance field with implicit or explicit representations. We propose Arbitrary-Scale Super-Resolution NeRF (ASSR-NeRF), a novel framework for super-resolution novel view synthesis (SRNVS).
arXiv Detail & Related papers (2024-06-28T17:22:33Z)
Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We make the first attempt to integrate the Vision State Space Model (Mamba) for remote sensing image (RSI) super-resolution. To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR. Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM).
arXiv Detail & Related papers (2024-05-08T11:09:24Z)
Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations [61.448005005426666]
We consider two challenging issues in reference-based super-resolution (RefSR) for smartphones. We propose a novel self-supervised learning approach for real-world RefSR from observations at dual and multiple camera zooms.
arXiv Detail & Related papers (2024-05-03T15:20:30Z)
XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution [14.935662351654601]
Diffusion-based methods, endowed with a formidable generative prior, have received increasing attention in Image Super-Resolution. It is challenging for ISR models to perceive the semantic and degradation information, resulting in restoration images with incorrect content or unrealistic artifacts. We propose a Cross-modal Priors for Super-Resolution (XPSR) framework to acquire precise and comprehensive semantic conditions for the diffusion model.
arXiv Detail & Related papers (2024-03-08T04:52:22Z)
DDet: Dual-path Dynamic Enhancement Network for Real-World Image Super-Resolution [69.2432352477966]
Real image super-resolution (Real-SR) focuses on the relationship between real-world high-resolution (HR) and low-resolution (LR) images. In this article, we propose a Dual-path Dynamic Enhancement Network (DDet) for Real-SR. Unlike conventional methods, which stack up massive convolutional blocks for feature representation, we introduce a content-aware framework to study non-inherently aligned image pairs.
arXiv Detail & Related papers (2020-02-25T18:24:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.