TurboVSR: Fantastic Video Upscalers and Where to Find Them
- URL: http://arxiv.org/abs/2506.23618v1
- Date: Mon, 30 Jun 2025 08:24:13 GMT
- Title: TurboVSR: Fantastic Video Upscalers and Where to Find Them
- Authors: Zhongdao Wang, Guodongfang Zhao, Jingjing Ren, Bailan Feng, Shifeng Zhang, Wenbo Li,
- Abstract summary: Diffusion-based generative models have demonstrated exceptional promise in the video super-resolution (VSR) task.<n>We present TurboVSR, an ultra-efficient diffusion-based video super-resolution model.<n>TurboVSR performs on par with state-of-the-art VSR methods, while being 100+ times faster, taking only 7 seconds to process a 2-second long 1080p video.
- Score: 33.83721799307721
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion-based generative models have demonstrated exceptional promise in the video super-resolution (VSR) task, achieving a substantial advancement in detail generation relative to prior methods. However, these approaches face significant computational efficiency challenges. For instance, current techniques may require tens of minutes to super-resolve a mere 2-second, 1080p video. In this paper, we present TurboVSR, an ultra-efficient diffusion-based video super-resolution model. Our core design comprises three key aspects: (1) We employ an autoencoder with a high compression ratio of 32$\times$32$\times$8 to reduce the number of tokens. (2) Highly compressed latents pose substantial challenges for training. We introduce factorized conditioning to mitigate the learning complexity: we first learn to super-resolve the initial frame; subsequently, we condition the super-resolution of the remaining frames on the high-resolution initial frame and the low-resolution subsequent frames. (3) We convert the pre-trained diffusion model to a shortcut model to enable fewer sampling steps, further accelerating inference. As a result, TurboVSR performs on par with state-of-the-art VSR methods, while being 100+ times faster, taking only 7 seconds to process a 2-second long 1080p video. TurboVSR also supports image resolution by considering image as a one-frame video. Our efficient design makes SR beyond 1080p possible, results on 4K (3648$\times$2048) image SR show surprising fine details.
Related papers
- FCA2: Frame Compression-Aware Autoencoder for Modular and Fast Compressed Video Super-Resolution [68.77813885751308]
State-of-the-art (SOTA) compressed video super-resolution (CVSR) models face persistent challenges, including prolonged inference time, complex training pipelines, and reliance on auxiliary information.<n>We propose an efficient and scalable solution inspired by the structural and statistical similarities between hyperspectral images (HSI) and video data.<n>Our approach introduces a compression-driven dimensionality reduction strategy that reduces computational complexity, accelerates inference, and enhances the extraction of temporal information across frames.
arXiv Detail & Related papers (2025-06-13T07:59:52Z) - DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution [43.83739935393097]
We propose DOVE, an efficient one-step diffusion model for real-world video super-resolution.<n>DOVE is obtained by fine-tuning a pretrained video diffusion model (*i.e.*, CogVideoX)<n>Experiments show that DOVE exhibits comparable or superior performance to multi-step diffusion-based VSR methods.
arXiv Detail & Related papers (2025-05-22T05:16:45Z) - Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis [50.77548592888096]
Demand for 2K video synthesis is rising with increasing consumer expectations for ultra-clear visuals.<n>Turbo2K is an efficient framework for generating detail-rich 2K videos.
arXiv Detail & Related papers (2025-04-20T03:30:59Z) - Magic 1-For-1: Generating One Minute Video Clips within One Minute [53.07214657235465]
We present Magic 1-For-1 (Magic141), an efficient video generation model with optimized memory consumption and inference latency.<n>By applying a test time sliding window, we are able to generate a minute-long video within one minute with significantly improved visual quality and motion dynamics.
arXiv Detail & Related papers (2025-02-11T16:58:15Z) - REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents [110.41795676048835]
One crucial obstacle for large-scale applications is the expensive training and inference cost.
In this paper, we argue that videos contain much more redundant information than images, thus can be encoded by very few motion latents.
We train Reducio-DiT in around 3.2K training hours in total and generate a 16-frame 1024*1024 video clip within 15.5 seconds on a single A100 GPU.
arXiv Detail & Related papers (2024-11-20T18:59:52Z) - RTSR: A Real-Time Super-Resolution Model for AV1 Compressed Content [10.569678424799616]
Super-resolution (SR) is a key technique for improving the visual quality of video content.
To support real-time playback, it is important to implement fast SR models while preserving reconstruction quality.
This paper proposes a low-complexity SR method, RTSR, designed to enhance the visual quality of compressed video content.
arXiv Detail & Related papers (2024-11-20T14:36:06Z) - Hierarchical Patch Diffusion Models for High-Resolution Video Generation [50.42746357450949]
We develop deep context fusion, which propagates context information from low-scale to high-scale patches in a hierarchical manner.
We also propose adaptive computation, which allocates more network capacity and computation towards coarse image details.
The resulting model sets a new state-of-the-art FVD score of 66.32 and Inception Score of 87.68 in class-conditional video generation.
arXiv Detail & Related papers (2024-06-12T01:12:53Z) - Learning Trajectory-Aware Transformer for Video Super-Resolution [50.49396123016185]
Video super-resolution aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts.
Existing approaches usually align and aggregate video frames from limited adjacent frames.
We propose a novel Transformer for Video Super-Resolution (TTVSR)
arXiv Detail & Related papers (2022-04-08T03:37:39Z) - AIM 2020 Challenge on Video Extreme Super-Resolution: Methods and
Results [96.74919503142014]
This paper reviews the video extreme super-resolution challenge associated with the AIM 2020 workshop at ECCV 2020.
Track 1 is set up to gauge the state-of-the-art for such a demanding task, where fidelity to the ground truth is measured by PSNR and SSIM.
Track 2 therefore aims at generating visually pleasing results, which are ranked according to human perception, evaluated by a user study.
arXiv Detail & Related papers (2020-09-14T09:36:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.