Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
- URL: http://arxiv.org/abs/2505.18600v2
- Date: Tue, 27 May 2025 16:02:29 GMT
- Title: Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
- Authors: Bryan Sangwoo Kim, Jeongsol Kim, Jong Chul Ye,
- Abstract summary: Chain-of-zoom (CoZ) is a framework that factorizes SISR into a chain of intermediate scale-states with multi-scale-aware prompts.<n>Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a vision-language model (VLM)<n>Experiments show that a standard 4x diffusion SR model wrapped in CoZ attains beyond 256x enlargement with high perceptual quality and fidelity.
- Score: 51.99765487172328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but collapse when asked to magnify far beyond that regime. We address this scalability bottleneck with Chain-of-Zoom (CoZ), a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts. CoZ repeatedly re-uses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training. Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a vision-language model (VLM). The prompt extractor itself is fine-tuned using Generalized Reward Policy Optimization (GRPO) with a critic VLM, aligning text guidance towards human preference. Experiments show that a standard 4x diffusion SR model wrapped in CoZ attains beyond 256x enlargement with high perceptual quality and fidelity. Project Page: https://bryanswkim.github.io/chain-of-zoom/ .
Related papers
- Stroke-based Cyclic Amplifier: Image Super-Resolution at Arbitrary Ultra-Large Scales [10.209274379479586]
Prior Arbitrary-Scale Image Super-Resolution (ASISR) methods often experience a significant performance decline when the upsampling factor exceeds the range covered by the training data.<n>We propose a unified model, Stroke-based Cyclic Amplifier (SbCA), for ultra-large upsampling tasks.
arXiv Detail & Related papers (2025-06-12T14:51:10Z) - Multi-scale Image Super Resolution with a Single Auto-Regressive Model [40.77470215283583]
We tackle Image Super Resolution (ISR) using recent advances in Visual Auto-Regressive ( VAR) modeling.<n>To the best of our knowledge, this is the first time a quantizer is trained to force semantically consistent residuals at different scales.<n>Our model can denoise the LR image and super-resolve at half and full target upscale factors in a single forward pass.
arXiv Detail & Related papers (2025-06-05T13:02:23Z) - Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling [50.34513854725803]
Arbitrary-scale super-resolution (ASSR) aims to reconstruct high-resolution (HR) images from low-resolution (LR) inputs with arbitrary upsampling factors.<n>We propose a novel ContinuousSR framework with a Pixel-to-Gaussian paradigm, which explicitly reconstructs 2D continuous HR signals from LR images using Gaussian Splatting.
arXiv Detail & Related papers (2025-03-09T13:43:57Z) - Visual Autoregressive Modeling for Image Super-Resolution [14.935662351654601]
We propose a novel visual autoregressive modeling for ISR framework with the form of next-scale prediction.<n>We collect large-scale data and design a training process to obtain robust generative priors.
arXiv Detail & Related papers (2025-01-31T09:53:47Z) - PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution [95.98801201266099]
Diffusion-based image super-resolution (SR) models have shown superior performance at the cost of multiple denoising steps.<n>We propose a novel post-training quantization approach with adaptive scale in one-step diffusion (OSD) image SR, PassionSR.<n>Our PassionSR achieves significant advantages over recent leading low-bit quantization methods for image SR.
arXiv Detail & Related papers (2024-11-26T04:49:42Z) - $\text{S}^{3}$Mamba: Arbitrary-Scale Super-Resolution via Scaleable State Space Model [45.65903826290642]
ASSR aims to super-resolve low-resolution images to high-resolution images at any scale using a single model.
We propose a novel arbitrary-scale super-resolution method, called $textS3$Mamba, to construct a scalable continuous representation space.
arXiv Detail & Related papers (2024-11-16T11:13:02Z) - ASSR-NeRF: Arbitrary-Scale Super-Resolution on Voxel Grid for High-Quality Radiance Fields Reconstruction [27.21399221644529]
NeRF-based methods reconstruct 3D scenes by building a radiance field with implicit or explicit representations.
We propose Arbitrary-Scale Super-Resolution NeRF (ASSR-NeRF), a novel framework for super-resolution novel view synthesis (SRNVS)
arXiv Detail & Related papers (2024-06-28T17:22:33Z) - Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We develop the first attempt to integrate the Vision State Space Model (Mamba) for remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM)
arXiv Detail & Related papers (2024-05-08T11:09:24Z) - Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations [61.448005005426666]
We consider two challenging issues in reference-based super-resolution (RefSR) for smartphone.
We propose a novel self-supervised learning approach for real-world RefSR from observations at dual and multiple camera zooms.
arXiv Detail & Related papers (2024-05-03T15:20:30Z) - XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution [14.935662351654601]
Diffusion-based methods, endowed with a formidable generative prior, have received increasing attention in Image Super-Resolution.
It is challenging for ISR models to perceive the semantic and degradation information, resulting in restoration images with incorrect content or unrealistic artifacts.
We propose a textitCross-modal Priors for Super-Resolution (XPSR) framework to acquire precise and comprehensive semantic conditions for the diffusion model.
arXiv Detail & Related papers (2024-03-08T04:52:22Z) - DDet: Dual-path Dynamic Enhancement Network for Real-World Image
Super-Resolution [69.2432352477966]
Real image super-resolution(Real-SR) focus on the relationship between real-world high-resolution(HR) and low-resolution(LR) image.
In this article, we propose a Dual-path Dynamic Enhancement Network(DDet) for Real-SR.
Unlike conventional methods which stack up massive convolutional blocks for feature representation, we introduce a content-aware framework to study non-inherently aligned image pair.
arXiv Detail & Related papers (2020-02-25T18:24:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.