BF-STVSR: B-Splines and Fourier-Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution
- URL: http://arxiv.org/abs/2501.11043v2
- Date: Tue, 25 Mar 2025 07:05:39 GMT
- Title: BF-STVSR: B-Splines and Fourier-Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution
- Authors: Eunjin Kim, Hyeonjin Kim, Kyong Hwan Jin, Jaejun Yoo
- Abstract summary: We propose BF-STVSR, a C-STVSR framework with two key modules tailored to better represent spatial and temporal characteristics of video. Our approach achieves state-of-the-art results in various metrics, including PSNR and SSIM, showing enhanced spatial details and natural temporal consistency.
- Score: 14.082598088990352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While prior methods in Continuous Spatial-Temporal Video Super-Resolution (C-STVSR) employ Implicit Neural Representation (INR) for continuous encoding, they often struggle to capture the complexity of video data, relying on simple coordinate concatenation and pre-trained optical flow networks for motion representation. Interestingly, we find that adding position encoding, contrary to common observations, does not improve--and even degrades--performance. This issue becomes particularly pronounced when combined with pre-trained optical flow networks, which can limit the model's flexibility. To address these issues, we propose BF-STVSR, a C-STVSR framework with two key modules tailored to better represent spatial and temporal characteristics of video: 1) B-spline Mapper for smooth temporal interpolation, and 2) Fourier Mapper for capturing dominant spatial frequencies. Our approach achieves state-of-the-art results in various metrics, including PSNR and SSIM, showing enhanced spatial details and natural temporal consistency. Our code is available at https://github.com/Eunjnnn/bfstvsr.
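The repository linked above contains the authors' implementation. As a rough illustration of the two ideas the abstract names, the sketch below shows (a) a Fourier feature mapping of spatial coordinates and (b) uniform cubic B-spline blending weights over a continuous time coordinate, the generic building blocks behind a "Fourier Mapper" and a "B-spline Mapper". All function names, shapes, and frequencies here are assumptions for illustration, not the BF-STVSR architecture itself.

```python
# Illustrative sketch only -- generic Fourier features and cubic B-spline
# temporal blending; NOT the authors' BF-STVSR implementation.
import numpy as np

def fourier_features(coords, freqs):
    """Map (N, 2) spatial coordinates in [0, 1] to sin/cos features."""
    proj = 2.0 * np.pi * coords[:, None, :] * freqs[None, :, None]   # (N, F, 2)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1).reshape(len(coords), -1)

def cubic_bspline_basis(t):
    """Uniform cubic B-spline blending weights for fractional time t in [0, 1)."""
    return np.array([
        (1 - t) ** 3 / 6.0,
        (3 * t ** 3 - 6 * t ** 2 + 4) / 6.0,
        (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6.0,
        t ** 3 / 6.0,
    ])  # weights over 4 neighboring frames; they sum to 1

# Toy usage: encode a 4x4 grid of pixel coordinates and blend 4 frame features in time.
coords = np.stack(np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4)), -1).reshape(-1, 2)
spatial_feats = fourier_features(coords, freqs=2.0 ** np.arange(4))     # (16, 16)
frame_feats = np.random.randn(4, 8)                                      # placeholder per-frame latents
interp = (cubic_bspline_basis(0.3)[:, None] * frame_feats).sum(axis=0)   # feature at time t = 0.3
```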
Related papers
- Token-Efficient Long Video Understanding for Multimodal LLMs [101.70681093383365]
STORM is a novel architecture incorporating a dedicated temporal encoder between the image encoder and the Video-LLM.
We show that STORM achieves state-of-the-art results across various long video understanding benchmarks.
arXiv Detail & Related papers (2025-03-06T06:17:38Z) - SNeRV: Spectra-preserving Neural Representation for Video [8.978061470104532]
We propose spectra-preserving NeRV (SNeRV) as a novel approach to enhance implicit video representations.
In this paper, we use a 2D discrete wavelet transform (DWT) to decompose video into low-frequency (LF) and high-frequency (HF) features; a minimal DWT sketch appears after this list.
We demonstrate that SNeRV outperforms existing NeRV models in capturing fine details and achieves enhanced reconstruction.
arXiv Detail & Related papers (2025-01-03T07:57:38Z) - HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera [22.208120663778043]
Continuous space-time super-resolution (C-STVSR) aims to simultaneously enhance resolution and frame rate at an arbitrary scale.
We propose a novel C-STVSR framework, called HR-INR, which captures both holistic dependencies and regional motions based on implicit neural representation (INR)
We then propose a novel INR-based decoder with temporal embeddings to capture long-term dependencies with a larger temporal perception field.
arXiv Detail & Related papers (2024-05-22T06:51:32Z) - Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution [151.1255837803585]
We propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo) for video super-resolution.
SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction.
Experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-03-25T17:59:26Z) - Boosting Neural Representations for Videos with a Conditional Decoder [28.073607937396552]
Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing.
This paper introduces a universal boosting framework for current implicit video representation approaches.
arXiv Detail & Related papers (2024-02-28T08:32:19Z) - Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos [69.22032459870242]
We present a novel technique, Residual Radiance Field or ReRF, as a highly compact neural representation to achieve real-time free-view rendering on long-duration dynamic scenes.
We show such a strategy can handle large motions without sacrificing quality.
Based on ReRF, we design a special FVV codec that achieves a compression rate of three orders of magnitude and provides a companion ReRF player to support online streaming of long-duration FVVs of dynamic scenes.
arXiv Detail & Related papers (2023-04-10T08:36:00Z) - You Can Ground Earlier than See: An Effective and Efficient Pipeline for
Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous respectable works have made decent success, but they only focus on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z) - Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, improving PSNR from 34.07 to 34.57.
arXiv Detail & Related papers (2022-10-13T08:15:08Z) - Towards Interpretable Video Super-Resolution via Alternating
Optimization [115.85296325037565]
We study a practical space-time video super-resolution (STVSR) problem which aims at generating a high-framerate high-resolution sharp video from a low-framerate blurry video.
We propose an interpretable STVSR framework by leveraging both model-based and learning-based methods.
arXiv Detail & Related papers (2022-07-21T21:34:05Z) - Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis [40.249030338644225]
Video-to-Video synthesis (Vid2Vid) has achieved remarkable results in generating a photo-realistic video from a sequence of semantic maps.
Fast-Vid2Vid achieves near real-time performance of around 20 FPS and reduces computational cost by around 8x on a single V100 GPU.
arXiv Detail & Related papers (2022-07-11T17:57:57Z) - VideoINR: Learning Video Implicit Neural Representation for Continuous
Space-Time Super-Resolution [75.79379734567604]
We show that Video Implicit Neural Representation (VideoINR) can be decoded to videos of arbitrary spatial resolution and frame rate.
We show that VideoINR achieves competitive performances with state-of-the-art STVSR methods on common up-sampling scales.
arXiv Detail & Related papers (2022-06-09T17:45:49Z) - STDAN: Deformable Attention Network for Space-Time Video
Super-Resolution [39.18399652834573]
We propose a deformable attention network called STDAN for STVSR.
First, we devise a long-short term feature interpolation (LSTFI) module, which is capable of excavating abundant content from more neighboring input frames.
Second, we put forward a spatial-temporal deformable feature aggregation (STDFA) module, in which spatial and temporal contexts are adaptively captured and aggregated.
arXiv Detail & Related papers (2022-03-14T03:40:35Z) - Optical-Flow-Reuse-Based Bidirectional Recurrent Network for Space-Time
Video Super-Resolution [52.899234731501075]
Space-time video super-resolution (ST-VSR) simultaneously increases the spatial resolution and frame rate for a given video.
Existing methods typically struggle to efficiently leverage information from a large range of neighboring frames.
We propose a coarse-to-fine bidirectional recurrent neural network instead of using ConvLSTM to leverage knowledge between adjacent frames.
arXiv Detail & Related papers (2021-10-13T15:21:30Z) - C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal
Consistent Motion Transfer [5.220611885921671]
We propose Coarse-to-Fine Flow Warping Network (C2F-FWN) for spatially and temporally consistent human video motion transfer (HVMT).
C2F-FWN employs Flow Temporal Consistency (FTC) Loss to enhance temporal consistency.
Our approach outperforms state-of-the-art HVMT methods in terms of both spatial and temporal consistency.
arXiv Detail & Related papers (2020-12-16T14:11:13Z) - Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video
Super-Resolution [95.26202278535543]
A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR).
However, temporal synthesis and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)
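As referenced in the SNeRV entry above, several of these works decompose frames into low- and high-frequency subbands with a 2D discrete wavelet transform. The snippet below is a minimal, hedged example of that generic step using PyWavelets on a single placeholder frame; it is not SNeRV's pipeline, and the 'haar' wavelet and frame size are arbitrary choices.

```python
# Minimal one-level 2D DWT of a grayscale frame into LF / HF subbands.
# Generic PyWavelets illustration; NOT the SNeRV implementation.
import numpy as np
import pywt

frame = np.random.rand(64, 64).astype(np.float32)    # placeholder frame

# cA: low-frequency approximation; cH / cV / cD: horizontal, vertical,
# and diagonal high-frequency details.
cA, (cH, cV, cD) = pywt.dwt2(frame, 'haar')

# The subbands losslessly reconstruct the frame (up to float precision).
recon = pywt.idwt2((cA, (cH, cV, cD)), 'haar')
assert np.allclose(recon, frame, atol=1e-5)
```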
This list is automatically generated from the titles and abstracts of the papers on this site.