FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
- URL: http://arxiv.org/abs/2510.12747v1
- Date: Tue, 14 Oct 2025 17:25:54 GMT
- Title: FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
- Authors: Junhao Zhuang, Shi Guo, Xin Cai, Xiaohui Li, Yihao Liu, Chun Yuan, Tianfan Xue
- Abstract summary: FlashVSR is the first diffusion-based one-step streaming framework towards real-time VSR. It runs at approximately 17 FPS for 768x1408 videos on a single A100 GPU, scales reliably to ultra-high resolutions, and achieves state-of-the-art performance with up to 12x speedup over prior one-step diffusion VSR models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have recently advanced video restoration, but applying them to real-world video super-resolution (VSR) remains challenging due to high latency, prohibitive computation, and poor generalization to ultra-high resolutions. Our goal in this work is to make diffusion-based VSR practical by achieving efficiency, scalability, and real-time performance. To this end, we propose FlashVSR, the first diffusion-based one-step streaming framework towards real-time VSR. FlashVSR runs at approximately 17 FPS for 768x1408 videos on a single A100 GPU by combining three complementary innovations: (i) a train-friendly three-stage distillation pipeline that enables streaming super-resolution, (ii) locality-constrained sparse attention that cuts redundant computation while bridging the train-test resolution gap, and (iii) a tiny conditional decoder that accelerates reconstruction without sacrificing quality. To support large-scale training, we also construct VSR-120K, a new dataset with 120k videos and 180k images. Extensive experiments show that FlashVSR scales reliably to ultra-high resolutions and achieves state-of-the-art performance with up to 12x speedup over prior one-step diffusion VSR models. We will release the code, pretrained models, and dataset to foster future research in efficient diffusion-based VSR.
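The locality-constrained sparse attention in (ii) is described only at a high level here. A minimal sketch of one plausible interpretation, where each query token on the latent grid attends only to key tokens within a fixed spatial window so attention cost grows linearly rather than quadratically with resolution; the window size, flattening order, and function name are assumptions, not FlashVSR's actual configuration:

```python
import numpy as np

def local_attention_mask(h_tokens, w_tokens, window=8):
    """Boolean mask: entry (i, j) is True if query token i may attend
    to key token j.

    Tokens live on an (h_tokens x w_tokens) latent grid; a pair is
    allowed only when the tokens are within `window` rows AND `window`
    cols of each other. All names and the window size are illustrative.
    """
    ys, xs = np.meshgrid(np.arange(h_tokens), np.arange(w_tokens), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)          # (N, 2)
    diff = np.abs(coords[:, None, :] - coords[None, :, :])        # (N, N, 2)
    return (diff <= window).all(axis=-1)                          # (N, N)

mask = local_attention_mask(4, 4, window=1)
# Token (0,0) sees its 3x3 neighborhood clipped to the grid: 4 tokens.
print(mask[0].sum())
```

Such a mask can be passed to a standard attention implementation (e.g. as `attn_mask`), and because the allowed window is fixed in token units, the same mask construction applies at test-time resolutions larger than those seen in training, which is one way to read the claimed train-test resolution-gap bridging.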
Related papers
- Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion [10.847237180991948]
Stream-DiffVSR is a causally conditioned diffusion framework for efficient online VSR. It processes 720p frames in 0.328 seconds on a GTX 4090 GPU, boosting perceptual quality (LPIPS +0.095) while reducing latency by over 130x.
arXiv Detail & Related papers (2025-12-29T18:59:57Z) - InfVSR: Breaking Length Limits of Generic Video Super-Resolution [40.30527504651693]
InfVSR is an autoregressive one-step-diffusion paradigm for long sequences. It distills the diffusion process into a single step efficiently, with patch-wise pixel supervision and cross-chunk distribution matching. The method pushes the frontier of long-form VSR, achieving state-of-the-art quality with enhanced semantic consistency and up to 58x speed-up over existing methods.
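The chunk-wise autoregressive processing described above can be sketched as: split the frame sequence into fixed-size chunks and condition each one-step upscaling call on the previous chunk's output. The chunk size, the `upscale_one_step` interface, and the carry-forward conditioning are all assumptions for illustration, not InfVSR's actual design:

```python
import numpy as np

def stream_vsr(frames, upscale_one_step, chunk=8):
    """Process a long low-res sequence chunk by chunk.

    `upscale_one_step(chunk_frames, context)` stands in for a one-step
    diffusion upscaler conditioned on the previous chunk's result, so
    memory stays bounded regardless of sequence length. Both the name
    and the chunk size are illustrative assumptions.
    """
    outputs, context = [], None
    for start in range(0, len(frames), chunk):
        hr = upscale_one_step(frames[start:start + chunk], context)
        outputs.append(hr)
        context = hr  # carry the last result forward as conditioning
    return np.concatenate(outputs)

# Dummy stand-in upscaler: nearest-neighbour 2x upscale, ignoring context.
fake = lambda lr, ctx: lr.repeat(2, axis=1).repeat(2, axis=2)
out = stream_vsr(np.zeros((20, 4, 4)), fake, chunk=8)
print(out.shape)  # (20, 8, 8)
```

The point of the sketch is the control flow: each chunk is denoised in a single step and only the previous output is kept, which is what lets such methods run on unbounded-length video.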
arXiv Detail & Related papers (2025-10-01T14:21:45Z) - Asymmetric VAE for One-Step Video Super-Resolution Acceleration [63.419142632861345]
We propose FastVSR, which substantially reduces computational cost by using a high-compression VAE. FastVSR achieves speedups of 111.9x over multi-step models and 3.92x over existing one-step models.
arXiv Detail & Related papers (2025-09-29T00:36:14Z) - OS-DiffVSR: Towards One-step Latent Diffusion Model for High-detailed Real-world Video Super-Resolution [11.859297492802456]
We propose a One-Step Diffusion model for real-world Video Super-Resolution, namely OS-DiffVSR. Specifically, we devise a novel adjacent-frame adversarial training paradigm, which significantly improves the quality of synthetic videos.
arXiv Detail & Related papers (2025-09-20T03:04:41Z) - TurboVSR: Fantastic Video Upscalers and Where to Find Them [33.83721799307721]
Diffusion-based generative models have demonstrated exceptional promise in the video super-resolution (VSR) task. We present TurboVSR, an ultra-efficient diffusion-based video super-resolution model. TurboVSR performs on par with state-of-the-art VSR methods while being 100+ times faster, taking only 7 seconds to process a 2-second 1080p video.
arXiv Detail & Related papers (2025-06-30T08:24:13Z) - FCA2: Frame Compression-Aware Autoencoder for Modular and Fast Compressed Video Super-Resolution [68.77813885751308]
State-of-the-art (SOTA) compressed video super-resolution (CVSR) models face persistent challenges, including prolonged inference time, complex training pipelines, and reliance on auxiliary information. We propose an efficient and scalable solution inspired by the structural and statistical similarities between hyperspectral images (HSI) and video data. Our approach introduces a compression-driven dimensionality-reduction strategy that reduces computational complexity, accelerates inference, and enhances the extraction of temporal information across frames.
arXiv Detail & Related papers (2025-06-13T07:59:52Z) - DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution [43.83739935393097]
We propose DOVE, an efficient one-step diffusion model for real-world video super-resolution. DOVE is obtained by fine-tuning a pretrained video diffusion model (i.e., CogVideoX). Experiments show that DOVE exhibits comparable or superior performance to multi-step diffusion-based VSR methods.
arXiv Detail & Related papers (2025-05-22T05:16:45Z) - Rethinking Video Tokenization: A Conditioned Diffusion-based Approach [58.164354605550194]
The new tokenizer, a Conditioned Diffusion-based Tokenizer (CDT), replaces the GAN-based decoder with a conditional diffusion model. It is trained from scratch using only a basic MSE diffusion loss for reconstruction, along with a KL term and an LPIPS perceptual loss. Even a scaled-down version of CDT (3x inference speedup) still performs comparably with top baselines.
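The training objective described above (MSE diffusion loss plus a KL term and an LPIPS perceptual loss) can be sketched as a weighted sum. The weights, the diagonal-Gaussian KL form, and the `lpips_fn` interface are placeholder assumptions, not the paper's actual values:

```python
import numpy as np

def tokenizer_loss(x0_pred, x0, mu, logvar, lpips_fn,
                   w_kl=1e-6, w_lpips=0.1):
    """Toy version of a tokenizer objective in the style described:
    plain MSE reconstruction loss + KL regularizer on a diagonal
    Gaussian latent + perceptual (LPIPS-like) term. Weights are
    illustrative guesses."""
    mse = np.mean((x0_pred - x0) ** 2)
    # KL( N(mu, exp(logvar)) || N(0, 1) ), averaged over latent elements
    kl = -0.5 * np.mean(1.0 + logvar - mu**2 - np.exp(logvar))
    perceptual = float(lpips_fn(x0_pred, x0))
    return mse + w_kl * kl + w_lpips * perceptual

x = np.zeros((1, 3, 8, 8))
# Perfect reconstruction + latent matching the prior: every term is zero.
loss = tokenizer_loss(x, x, np.zeros(4), np.zeros(4), lambda a, b: 0.0)
print(loss)  # 0.0
```

In practice `lpips_fn` would be a pretrained perceptual network (e.g. the `lpips` package); a constant stand-in is used here so the sketch stays self-contained.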
arXiv Detail & Related papers (2025-03-05T17:59:19Z) - Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate the first tradeoff, we propose a degradation scheme that reduces up to 40% of training time without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z) - Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution [95.26202278535543]
A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). However, temporal synthesis and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.