Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion
- URL: http://arxiv.org/abs/2512.23709v1
- Date: Mon, 29 Dec 2025 18:59:57 GMT
- Title: Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion
- Authors: Hau-Shiang Shiu, Chin-Yang Lin, Zhixiang Wang, Chi-Wei Hsiao, Po-Fan Yu, Yu-Chih Chen, Yu-Lun Liu
- Abstract summary: Stream-DiffVSR is a causally conditioned diffusion framework for efficient online VSR. It processes 720p frames in 0.328 seconds on an RTX4090 GPU. It boosts perceptual quality (LPIPS +0.095) while reducing latency by over 130x.
- Score: 10.847237180991948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion-based video super-resolution (VSR) methods achieve strong perceptual quality but remain impractical for latency-sensitive settings due to reliance on future frames and expensive multi-step denoising. We propose Stream-DiffVSR, a causally conditioned diffusion framework for efficient online VSR. Operating strictly on past frames, it combines a four-step distilled denoiser for fast inference, an Auto-regressive Temporal Guidance (ARTG) module that injects motion-aligned cues during latent denoising, and a lightweight temporal-aware decoder with a Temporal Processor Module (TPM) that enhances detail and temporal coherence. Stream-DiffVSR processes 720p frames in 0.328 seconds on an RTX4090 GPU and significantly outperforms prior diffusion-based methods. Compared with the online SOTA TMP, it boosts perceptual quality (LPIPS +0.095) while reducing latency by over 130x. Stream-DiffVSR achieves the lowest latency reported for diffusion-based VSR, reducing initial delay from over 4600 seconds to 0.328 seconds, thereby making it the first diffusion VSR method suitable for low-latency online deployment. Project page: https://jamichss.github.io/stream-diffvsr-project-page/
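The abstract describes the inference pipeline only at a high level. As a rough illustration, a causally conditioned auto-regressive loop of this shape could look like the minimal sketch below; the module names (`denoiser`, `artg`, `decoder`) and their interfaces are hypothetical stand-ins for the paper's four-step distilled denoiser, ARTG module, and TPM-equipped temporal-aware decoder, not the released implementation.

```python
# Hypothetical sketch of a causal, auto-regressive diffusion VSR loop.
# The denoiser/artg/decoder interfaces are assumptions, not the paper's API.
import torch

@torch.no_grad()
def stream_diffvsr_step(lr_frame, prev_hr_latent, denoiser, artg, decoder,
                        num_steps=4):
    """Super-resolve one incoming low-res frame using only past information."""
    # Start each frame from noise in the latent space of the target resolution.
    latent = torch.randn_like(prev_hr_latent)
    for t in reversed(range(num_steps)):  # four-step distilled sampler
        # ARTG-style guidance: motion-aligned cues from the previous HR latent.
        guidance = artg(prev_hr_latent, lr_frame)
        latent = denoiser(latent, t, cond=lr_frame, guidance=guidance)
    # Temporal-aware decoder (with a TPM-like module) refines detail/coherence.
    hr_frame = decoder(latent, prev_hr_latent)
    return hr_frame, latent

# Streaming usage: frames are processed strictly in arrival order, with no
# access to future frames, so the initial delay is a single forward pass.
# for lr_frame in video_stream:
#     hr_frame, prev_hr_latent = stream_diffvsr_step(
#         lr_frame, prev_hr_latent, denoiser, artg, decoder)
```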
Related papers
- Rethinking Diffusion Model-Based Video Super-Resolution: Leveraging Dense Guidance from Aligned Features [51.5076190312734]
Video Super-Resolution approaches suffer from error accumulation, spatial artifacts, and a trade-off between perceptual quality and fidelity. We propose a Densely Guided diffusion model with Aligned Features for Video Super-Resolution (DGAF-VSR). Experiments on synthetic and real-world datasets demonstrate that DGAF-VSR surpasses state-of-the-art methods in key aspects of VSR.
arXiv Detail & Related papers (2025-11-21T03:40:45Z) - StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation [65.90400162290057]
Generative models are reshaping the live-streaming industry by redefining how content is created, styled, and delivered. Recent advances in video diffusion have markedly improved temporal consistency and sampling efficiency for offline generation. Live online streaming operates under strict service-level objectives (SLOs): time-to-first-frame must be minimal, and every frame must meet a per-frame deadline with low jitter.
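To make the service-level objectives concrete, the small harness below times the two constraints the summary names: time-to-first-frame and a per-frame deadline. The budget values and the `generate` callable are illustrative assumptions, not part of StreamDiffusionV2.

```python
# Illustrative SLO harness; budgets and generate() are assumptions.
import time

def stream_with_slos(frames, generate, ttff_budget=0.5, frame_budget=0.05):
    """Yield generated frames while checking time-to-first-frame and
    per-frame deadlines (violations are only logged here)."""
    start = time.perf_counter()
    for i, frame in enumerate(frames):
        t0 = time.perf_counter()
        out = generate(frame)              # produce one streamed output frame
        t1 = time.perf_counter()
        if i == 0 and t1 - start > ttff_budget:
            print(f"TTFF SLO missed: {t1 - start:.3f}s")
        if t1 - t0 > frame_budget:         # per-frame deadline (low jitter)
            print(f"frame {i} deadline missed: {t1 - t0:.3f}s")
        yield out
```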
arXiv Detail & Related papers (2025-11-10T18:51:28Z) - Diffusion Buffer for Online Generative Speech Enhancement [32.98694610706198]
Diffusion Buffer is a generative diffusion-based Speech Enhancement model. It requires only one neural network call per incoming signal frame from a stream of data. It performs enhancement in an online fashion on a consumer-grade GPU.
arXiv Detail & Related papers (2025-10-21T15:52:33Z) - FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution [61.284842030283464]
FlashVSR is the first diffusion-based one-step streaming framework towards real-time VSR. It runs at approximately 17 FPS for 768x1408 videos on a single A100 GPU. It scales reliably to ultra-high resolutions and achieves state-of-the-art performance with up to 12x speedup over prior one-step diffusion VSR models.
arXiv Detail & Related papers (2025-10-14T17:25:54Z) - Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency [29.58683554898725]
We adapt a sliding window diffusion framework to the speech enhancement task. Our approach corrupts speech signals through time, assigning more noise to frames close to the present in a buffer. This marks the first practical diffusion-based solution for online speech enhancement.
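The buffer mechanic the summary describes can be sketched as follows: frames in a sliding window carry noise levels that increase toward the present, each incoming frame triggers exactly one network call that pushes every buffered frame one level down, and the oldest frame leaves the buffer fully enhanced. The linear schedule, buffer length, and `denoise_one_step` callable are illustrative assumptions, not the paper's actual design.

```python
# Sketch of a sliding-window diffusion buffer; schedule and denoiser
# interface are assumptions for illustration.
import numpy as np

BUFFER_LEN = 8
# Noise level per buffer slot: slot 0 (oldest) is cleanest and the newest
# frame is noisiest, matching "more noise to frames close to the present".
noise_levels = np.linspace(0.0, 1.0, BUFFER_LEN)

def buffer_step(buffer, new_frame, denoise_one_step):
    """One network call per incoming frame; returns (buffer, enhanced frame)."""
    noisy = new_frame + noise_levels[-1] * np.random.randn(*new_frame.shape)
    buffer = (buffer + [noisy])[-BUFFER_LEN:]          # slide the window
    # A single call jointly moves every buffered frame one noise level down.
    buffer = denoise_one_step(buffer, noise_levels[:len(buffer)])
    return buffer, buffer[0]   # oldest frame has reached noise level ~0
```

The buffer length sets the latency/quality trade-off: a longer buffer gives each frame more denoising steps before it is emitted, at the cost of a larger algorithmic delay.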
arXiv Detail & Related papers (2025-06-03T14:14:28Z) - StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation [52.56469577812338]
We introduce StreamDiffusion, a real-time diffusion pipeline for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction. We present a novel approach that transforms the original sequential denoising into a batching denoising process.
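The batching denoising idea can be sketched as a staggered pipeline: latents at different denoising stages are stacked into one batch, each forward pass advances all of them by one step, and one fully denoised latent exits per call after warm-up. `unet_step` and the four-step schedule below are assumptions for illustration, not StreamDiffusion's actual API.

```python
# Sketch of staggered batch denoising; unet_step is an assumed interface.
import torch

NUM_STEPS = 4
pipeline = []  # in-flight latents, newest (noisiest) first

@torch.no_grad()
def stream_batch_step(new_latent, unet_step):
    """Admit one noisy latent; return a denoised one (None during warm-up)."""
    pipeline.insert(0, new_latent)
    latents = torch.stack(pipeline)       # all stages in a single batch
    steps = torch.arange(len(pipeline))   # denoising step index per stage
    denoised = unet_step(latents, steps)  # one forward pass, every stage moves
    pipeline[:] = list(denoised)
    if len(pipeline) == NUM_STEPS:        # oldest latent has had all its steps
        return pipeline.pop()
    return None
```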
arXiv Detail & Related papers (2023-12-19T18:18:33Z) - Real-time Streaming Video Denoising with Bidirectional Buffers [48.57108807146537]
Real-time denoising algorithms are typically adopted on the user device to remove the noise involved during the shooting and transmission of video streams.
Recent multi-output inference works propagate bidirectional temporal features with a parallel or recurrent framework.
We propose a Bidirectional Streaming Video Denoising framework to achieve high-fidelity real-time denoising for streaming videos with both past and future temporal receptive fields.
arXiv Detail & Related papers (2022-07-14T14:01:03Z) - Optical-Flow-Reuse-Based Bidirectional Recurrent Network for Space-Time Video Super-Resolution [52.899234731501075]
Space-time video super-resolution (ST-VSR) simultaneously increases the spatial resolution and frame rate for a given video.
Existing methods typically struggle to efficiently leverage information from a large range of neighboring frames.
We propose a coarse-to-fine bidirectional recurrent neural network instead of using ConvLSTM to leverage knowledge between adjacent frames.
arXiv Detail & Related papers (2021-10-13T15:21:30Z)