SkipSR: Faster Super Resolution with Token Skipping
- URL: http://arxiv.org/abs/2510.08799v1
- Date: Thu, 09 Oct 2025 20:27:11 GMT
- Title: SkipSR: Faster Super Resolution with Token Skipping
- Authors: Rohan Choudhury, Shanchuan Lin, Jianyi Wang, Hao Chen, Qi Zhao, Feng Cheng, Lu Jiang, Kris Kitani, Laszlo A. Jeni
- Abstract summary: Diffusion-based super-resolution (SR) is a key component in video generation and video restoration, but is slow and expensive. We propose SkipSR, a framework for accelerating video SR by identifying low-detail regions directly from low-resolution input. In standard SR benchmarks, our method achieves up to 60% faster end-to-end latency than prior models on 720p videos with no perceptible loss in quality.
- Score: 46.407256877675565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion-based super-resolution (SR) is a key component in video generation and video restoration, but is slow and expensive, limiting scalability to higher resolutions and longer videos. Our key insight is that many regions in video are inherently low-detail and gain little from refinement, yet current methods process all pixels uniformly. To take advantage of this, we propose SkipSR, a simple framework for accelerating video SR by identifying low-detail regions directly from low-resolution input, then skipping computation on them entirely, only super-resolving the areas that require refinement. This simple yet effective strategy preserves perceptual quality in both standard and one-step diffusion SR models while significantly reducing computation. In standard SR benchmarks, our method achieves up to 60% faster end-to-end latency than prior models on 720p videos with no perceptible loss in quality. Video demos are available at https://rccchoudhury.github.io/skipsr/
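The core idea of the abstract — score regions of the low-resolution input for detail, then super-resolve only the regions that need it — can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes per-patch variance as a stand-in detail measure, a hypothetical `sr_model` callable as the expensive refiner, and nearest-neighbor upsampling as the cheap path for skipped patches.

```python
import numpy as np

def variance_detail_mask(lr: np.ndarray, patch: int = 8,
                         thresh: float = 1e-3) -> np.ndarray:
    """Mark patches whose local variance exceeds `thresh` as 'detailed'.
    lr: (H, W) grayscale frame in [0, 1]; H and W divisible by `patch`."""
    h, w = lr.shape
    tiles = lr.reshape(h // patch, patch, w // patch, patch)
    var = tiles.var(axis=(1, 3))        # per-patch variance, shape (H/p, W/p)
    return var > thresh                 # boolean mask of patches to refine

def skip_sr(lr: np.ndarray, sr_model, scale: int = 2, patch: int = 8,
            thresh: float = 1e-3) -> np.ndarray:
    """Cheap upsample everywhere; run the expensive model only where detailed."""
    mask = variance_detail_mask(lr, patch, thresh)
    out = np.kron(lr, np.ones((scale, scale)))   # nearest-neighbor upsample
    for i, j in zip(*np.nonzero(mask)):          # refine detailed patches only
        tile = lr[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
        out[i*patch*scale:(i+1)*patch*scale,
            j*patch*scale:(j+1)*patch*scale] = sr_model(tile, scale)
    return out
```

In the paper the skipping happens at the token level inside a diffusion transformer rather than by patch dispatch in pixel space, but the compute savings come from the same place: flat regions never enter the expensive model.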
Related papers
- PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution [65.09345929328586]
We propose an innovative approach, called PatchVSR, which integrates a dual-stream adapter for conditional guidance. Experiments demonstrate that our method can synthesize high-fidelity, high-resolution details at the patch level. We can achieve highly competitive 4K VSR based on a 512x512 resolution base model, with extremely high efficiency.
arXiv Detail & Related papers (2025-09-30T09:55:14Z)
- Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching [57.7533917467934]
EasyCache is a training-free acceleration framework for video diffusion models. We conduct comprehensive studies on various large-scale video generation models, including OpenSora, Wan2.1, and HunyuanVideo. Our method achieves leading acceleration performance, reducing inference time by up to 2.1-3.3x compared to the original baselines.
arXiv Detail & Related papers (2025-07-03T17:59:54Z)
- TurboVSR: Fantastic Video Upscalers and Where to Find Them [33.83721799307721]
Diffusion-based generative models have demonstrated exceptional promise in the video super-resolution (VSR) task. We present TurboVSR, an ultra-efficient diffusion-based video super-resolution model. TurboVSR performs on par with state-of-the-art VSR methods, while being 100+ times faster, taking only 7 seconds to process a 2-second long 1080p video.
arXiv Detail & Related papers (2025-06-30T08:24:13Z)
- RTSR: A Real-Time Super-Resolution Model for AV1 Compressed Content [10.569678424799616]
Super-resolution (SR) is a key technique for improving the visual quality of video content.
To support real-time playback, it is important to implement fast SR models while preserving reconstruction quality.
This paper proposes a low-complexity SR method, RTSR, designed to enhance the visual quality of compressed video content.
arXiv Detail & Related papers (2024-11-20T14:36:06Z)
- SR+Codec: a Benchmark of Super-Resolution for Video Compression Bitrate Reduction [0.0]
We developed a benchmark to analyze the capacity of super-resolution to upscale compressed videos. Our dataset employed video codecs based on five widely-used compression standards. We found that some SR models, combined with compression, allow reducing the video bitrate without significant loss of quality.
arXiv Detail & Related papers (2023-05-08T16:42:55Z)
- SuperTran: Reference Based Video Transformer for Enhancing Low Bitrate Streams in Real Time [0.6308539010172309]
This work focuses on low-bitrate video streaming scenarios (e.g. 50 - 200Kbps) where the video quality is severely compromised.
We present a family of novel deep generative models for enhancing perceptual video quality of such streams by performing super-resolution while also removing compression artifacts.
Our model, which we call SuperTran, consumes as input a single high-quality, high-resolution reference image in addition to the low-quality, low-resolution video stream.
arXiv Detail & Related papers (2022-11-22T22:03:11Z)
- Deep Parametric 3D Filters for Joint Video Denoising and Illumination Enhancement in Video Super Resolution [96.89588203312451]
This paper presents a new parametric representation called Deep Parametric 3D Filters (DP3DF).
DP3DF incorporates local information to enable simultaneous denoising, illumination enhancement, and SR efficiently in a single encoder-and-decoder network.
Also, a dynamic residual frame is jointly learned with the DP3DF via a shared backbone to further boost the SR quality.
arXiv Detail & Related papers (2022-07-05T03:57:25Z)
- VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution [75.79379734567604]
We show that Video Implicit Neural Representation (VideoINR) can be decoded to videos of arbitrary spatial resolution and frame rate.
We show that VideoINR achieves competitive performance with state-of-the-art STVSR methods on common up-sampling scales.
arXiv Detail & Related papers (2022-06-09T17:45:49Z)
- Learning Trajectory-Aware Transformer for Video Super-Resolution [50.49396123016185]
Video super-resolution aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts.
Existing approaches usually align and aggregate video frames from limited adjacent frames.
We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR).
arXiv Detail & Related papers (2022-04-08T03:37:39Z)
- Memory-Augmented Non-Local Attention for Video Super-Resolution [61.55700315062226]
We propose a novel video super-resolution method that aims at generating high-fidelity high-resolution (HR) videos from low-resolution (LR) ones.
Previous methods predominantly leverage temporal neighbor frames to assist the super-resolution of the current frame.
In contrast, we devise a cross-frame non-local attention mechanism that allows video super-resolution without frame alignment.
arXiv Detail & Related papers (2021-08-25T05:12:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.