Related papers: FCA2: Frame Compression-Aware Autoencoder for Modular and Fast Compressed Video Super-Resolution

URL: http://arxiv.org/abs/2506.11545v1
Date: Fri, 13 Jun 2025 07:59:52 GMT
Title: FCA2: Frame Compression-Aware Autoencoder for Modular and Fast Compressed Video Super-Resolution
Authors: Zhaoyang Wang, Jie Li, Wen Lu, Lihuo He, Maoguo Gong, Xinbo Gao
Abstract summary: State-of-the-art (SOTA) compressed video super-resolution (CVSR) models face persistent challenges, including prolonged inference time, complex training pipelines, and reliance on auxiliary information. We propose an efficient and scalable solution inspired by the structural and statistical similarities between hyperspectral images (HSI) and video data. Our approach introduces a compression-driven dimensionality reduction strategy that reduces computational complexity, accelerates inference, and enhances the extraction of temporal information across frames.
Score: 68.77813885751308
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: State-of-the-art (SOTA) compressed video super-resolution (CVSR) models face persistent challenges, including prolonged inference time, complex training pipelines, and reliance on auxiliary information. As video frame rates continue to increase, the diminishing inter-frame differences further expose the limitations of traditional frame-to-frame information exploitation methods, which are inadequate for addressing current video super-resolution (VSR) demands. To overcome these challenges, we propose an efficient and scalable solution inspired by the structural and statistical similarities between hyperspectral images (HSI) and video data. Our approach introduces a compression-driven dimensionality reduction strategy that reduces computational complexity, accelerates inference, and enhances the extraction of temporal information across frames. The proposed modular architecture is designed for seamless integration with existing VSR frameworks, ensuring strong adaptability and transferability across diverse applications. Experimental results demonstrate that our method achieves performance on par with, or surpassing, the current SOTA models, while significantly reducing inference time. By addressing key bottlenecks in CVSR, our work offers a practical and efficient pathway for advancing VSR technology. Our code will be publicly available at https://github.com/handsomewzy/FCA2.

Related papers

VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing [0.13381749415517016]
We introduce VRWKV-Editor, a novel video editing model that integrates a linear-temporal aggregation module into video-based diffusion models. VRWKV-Editor achieves up to 3.7x speedup and 60% lower memory usage compared to state-of-the-art diffusion-based video editing methods.
arXiv Detail & Related papers (2025-09-30T09:30:23Z)
DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer [56.98400572837792]
DiVE produces high-fidelity, temporally coherent, and cross-view consistent multi-view videos. These innovations collectively achieve a 2.62x speedup with minimal quality degradation.
arXiv Detail & Related papers (2025-04-28T09:20:50Z)
DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations [25.756755602342942]
We present DiffVSR, featuring a Progressive Learning Strategy (PLS) that systematically decomposes this learning burden through staged training. Our framework additionally incorporates an Interweaved Latent Transition (ILT) technique that maintains competitive temporal consistency without additional training overhead.
arXiv Detail & Related papers (2025-01-17T10:53:03Z)
Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval [16.497758750494537]
We propose an efficient video representation network with Differentiable Resolution Compression and Alignment mechanism. We leverage a Differentiable Context-aware Compression Module to encode the saliency and non-saliency frame features. We introduce a new Resolution-Align Transformer Layer to capture global temporal correlations among frame features with different resolutions.
arXiv Detail & Related papers (2023-09-15T05:31:53Z)
A Codec Information Assisted Framework for Efficient Compressed Video Super-Resolution [15.690562510147766]
Video Super-Resolution (VSR) using recurrent neural network architecture is a promising solution due to its efficient modeling of long-range temporal dependencies. We propose a Codec Information Assisted Framework (CIAF) to boost and accelerate recurrent VSR models for compressed videos.
arXiv Detail & Related papers (2022-10-15T08:48:29Z)
Learned Video Compression via Heterogeneous Deformable Compensation Network [78.72508633457392]
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance. More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-Neighborhood heterogeneous deformable (HetDeform) kernel offsets. Experimental results indicate that HDCVC achieves performance superior to the recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z)
STDAN: Deformable Attention Network for Space-Time Video Super-Resolution [39.18399652834573]
We propose a deformable attention network called STDAN for STVSR. First, we devise a long-short term feature interpolation (LSTFI) module, which is capable of excavating abundant content from more neighboring input frames. Second, we put forward a spatial-temporal deformable feature aggregation (STDFA) module, in which spatial and temporal contexts are adaptively captured and aggregated.
arXiv Detail & Related papers (2022-03-14T03:40:35Z)
Fast Online Video Super-Resolution with Deformable Attention Pyramid [172.16491820970646]
Video super-resolution (VSR) has many applications that pose strict causal, real-time, and latency constraints, including video streaming and TV. We propose a recurrent VSR architecture based on a deformable attention pyramid (DAP).
arXiv Detail & Related papers (2022-02-03T17:49:04Z)
Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution [95.26202278535543]
A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). However, temporal synthesis and spatial super-resolution are intra-related in this task. We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)
Video Face Super-Resolution with Motion-Adaptive Feedback Cell [90.73821618795512]
Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNN). In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way.
arXiv Detail & Related papers (2020-02-15T13:14:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.