Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content
- URL: http://arxiv.org/abs/2602.11339v2
- Date: Sat, 14 Feb 2026 13:58:35 GMT
- Title: Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content
- Authors: Evgeney Bogatyrev, Khaled Abud, Ivan Molodetskikh, Nikita Alutis, Dmitriy Vatolin,
- Abstract summary: We introduce a comprehensive dataset - StreamSR - sourced from YouTube.<n>We benchmark 11 state-of-the-art real-time super-resolution models to evaluate their performance for the streaming use-case.<n>We propose EfRLFN, an efficient real-time model that integrates Efficient Channel Attention and a hyperbolic tangent activation function.
- Score: 1.1926989512978123
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in real-time super-resolution have enabled higher-quality video streaming, yet existing methods struggle with the unique challenges of compressed video content. Commonly used datasets do not accurately reflect the characteristics of streaming media, limiting the relevance of current benchmarks. To address this gap, we introduce a comprehensive dataset - StreamSR - sourced from YouTube, covering a wide range of video genres and resolutions representative of real-world streaming scenarios. We benchmark 11 state-of-the-art real-time super-resolution models to evaluate their performance for the streaming use-case. Furthermore, we propose EfRLFN, an efficient real-time model that integrates Efficient Channel Attention and a hyperbolic tangent activation function - a novel design choice in the context of real-time super-resolution. We extensively optimized the architecture to maximize efficiency and designed a composite loss function that improves training convergence. EfRLFN combines the strengths of existing architectures while improving both visual quality and runtime performance. Finally, we show that fine-tuning other models on our dataset results in significant performance gains that generalize well across various standard benchmarks. We made the dataset, the code, and the benchmark available at https://github.com/EvgeneyBogatyrev/EfRLFN.
Related papers
- SCENE: Semantic-aware Codec Enhancement with Neural Embeddings [9.6489283335586]
We propose a semantic-aware pre-processing framework that enhances perceptual fidelity.<n>Our method integrates semantic embeddings from a vision-language model into an efficient convolutional architecture.<n>Our results show that semantic-aware pre-processing is an effective approach for enhancing compressed video streams.
arXiv Detail & Related papers (2026-01-29T05:41:28Z) - Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs [54.502280390499756]
We propose TimeWarp to create a targeted synthetic temporal dataset to fine-tune the model's responses to encourage it to focus on the given input video.<n>We demonstrate that when our method is applied to existing models, it significantly improves performance on temporal understanding benchmarks.
arXiv Detail & Related papers (2025-10-04T21:48:40Z) - Towards Robust and Generalizable Continuous Space-Time Video Super-Resolution with Events [71.2439653098351]
Continuous space-time video super-STVSR has garnered increasing interest for its capability to reconstruct high-resolution and high-frame-rate videos at arbitrary temporal scales.<n>We present EvEnhancer, a novel approach that marries unique properties of high temporal and high dynamic range encapsulated in event streams.<n>Our method achieves state-of-the-art performance on both synthetic and real-world datasets, while maintaining generalizability at OOD scales.
arXiv Detail & Related papers (2025-10-04T15:23:07Z) - FCA2: Frame Compression-Aware Autoencoder for Modular and Fast Compressed Video Super-Resolution [68.77813885751308]
State-of-the-art (SOTA) compressed video super-resolution (CVSR) models face persistent challenges, including prolonged inference time, complex training pipelines, and reliance on auxiliary information.<n>We propose an efficient and scalable solution inspired by the structural and statistical similarities between hyperspectral images (HSI) and video data.<n>Our approach introduces a compression-driven dimensionality reduction strategy that reduces computational complexity, accelerates inference, and enhances the extraction of temporal information across frames.
arXiv Detail & Related papers (2025-06-13T07:59:52Z) - Video Dataset Condensation with Diffusion Models [7.44997213284633]
Video dataset distillation is a promising solution to generate a compact synthetic dataset that retains the essential information from a large real dataset.<n>In this paper, we focus on video dataset distillation by employing a video diffusion model to generate high-quality synthetic videos.<n>To enhance representativeness, we introduce Video Spatio-Temporal U-Net (VST-UNet), a model designed to select a diverse and informative subset of videos.<n>We validate the effectiveness of our approach through extensive experiments on four benchmark datasets, demonstrating performance improvements of up to (10.61%) over the state-of-the
arXiv Detail & Related papers (2025-05-10T15:12:19Z) - Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams [78.72965584414368]
We present Flash-VStream, a video-language model that simulates the memory mechanism of human.
Compared to existing models, Flash-VStream achieves significant reductions in latency inference and VRAM consumption.
We propose VStream-QA, a novel question answering benchmark specifically designed for online video streaming understanding.
arXiv Detail & Related papers (2024-06-12T11:07:55Z) - Inflation with Diffusion: Efficient Temporal Adaptation for
Text-to-Video Super-Resolution [19.748048455806305]
We propose an efficient diffusion-based text-to-video super-resolution (SR) tuning approach.
We investigate different tuning approaches based on our inflated architecture and report trade-offs between computational costs and super-resolution quality.
arXiv Detail & Related papers (2024-01-18T22:25:16Z) - Differentiable Resolution Compression and Alignment for Efficient Video
Classification and Retrieval [16.497758750494537]
We propose an efficient video representation network with Differentiable Resolution Compression and Alignment mechanism.
We leverage a Differentiable Context-aware Compression Module to encode the saliency and non-saliency frame features.
We introduce a new Resolution-Align Transformer Layer to capture global temporal correlations among frame features with different resolutions.
arXiv Detail & Related papers (2023-09-15T05:31:53Z) - Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate the first tradeoff, we propose a degradation scheme that reduces up to 40% of training time without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z) - Coherent Loss: A Generic Framework for Stable Video Segmentation [103.78087255807482]
We investigate how a jittering artifact degrades the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts.
arXiv Detail & Related papers (2020-10-25T10:48:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.