Continuous Space-Time Video Super-Resolution with 3D Fourier Fields
- URL: http://arxiv.org/abs/2509.26325v1
- Date: Tue, 30 Sep 2025 14:34:02 GMT
- Title: Continuous Space-Time Video Super-Resolution with 3D Fourier Fields
- Authors: Alexander Becker, Julius Erbach, Dominik Narnhofer, Konrad Schindler
- Abstract summary: We introduce a novel formulation for continuous space-time video super-resolution. We show that our joint modeling substantially improves both spatial and temporal super-resolution.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel formulation for continuous space-time video super-resolution. Instead of decoupling the representation of a video sequence into separate spatial and temporal components and relying on brittle, explicit frame warping for motion compensation, we encode video as a continuous, spatio-temporally coherent 3D Video Fourier Field (VFF). That representation offers three key advantages: (1) it enables cheap, flexible sampling at arbitrary locations in space and time; (2) it is able to simultaneously capture fine spatial detail and smooth temporal dynamics; and (3) it offers the possibility to include an analytical, Gaussian point spread function in the sampling to ensure aliasing-free reconstruction at arbitrary scale. The coefficients of the proposed, Fourier-like sinusoidal basis are predicted with a neural encoder with a large spatio-temporal receptive field, conditioned on the low-resolution input video. Through extensive experiments, we show that our joint modeling substantially improves both spatial and temporal super-resolution and sets a new state of the art for multiple benchmarks: across a wide range of upscaling factors, it delivers sharper and temporally more consistent reconstructions than existing baselines, while being computationally more efficient. Project page: https://v3vsr.github.io.
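The abstract describes two concrete mechanisms: evaluating a sinusoidal (Fourier-like) basis at arbitrary continuous space-time coordinates, and folding an analytical Gaussian point spread function into that evaluation, since blurring with a spatial Gaussian is equivalent to attenuating each basis function by a Gaussian in frequency space. The following is a minimal NumPy sketch of that idea; the coefficient and frequency layout is an assumption for illustration (in the paper the coefficients are predicted per location by a neural encoder), not the authors' implementation.

```python
import numpy as np

def sample_fourier_field(coeffs_a, coeffs_b, freqs, x, y, t, sigma=0.0):
    """Evaluate a 3D Fourier field at continuous coordinates (x, y, t).

    coeffs_a, coeffs_b : (K,) cosine / sine coefficients. Here they are
                         fixed globals; in the paper they are predicted
                         by an encoder conditioned on the input video.
    freqs              : (K, 3) spatio-temporal frequencies (wx, wy, wt).
    sigma              : std. dev. of an isotropic spatial Gaussian PSF.
                         Spatial Gaussian blur corresponds to multiplying
                         each coefficient by exp(-sigma^2 |w_xy|^2 / 2),
                         so anti-aliased downsampling needs no explicit
                         convolution over pixels.
    """
    # Phase of each sinusoidal basis function at the query point.
    phase = freqs[:, 0] * x + freqs[:, 1] * y + freqs[:, 2] * t
    # Analytical Gaussian PSF: attenuate spatial frequencies only
    # (temporal frequency wt is left untouched).
    atten = np.exp(-0.5 * sigma**2 * (freqs[:, 0]**2 + freqs[:, 1]**2))
    return float(np.sum(atten * (coeffs_a * np.cos(phase)
                                 + coeffs_b * np.sin(phase))))
```

Because the field is an analytic function of (x, y, t), it can be queried at any fractional frame index or sub-pixel location at the same cost, which is what enables arbitrary-scale space-time upsampling; increasing `sigma` suppresses high spatial frequencies, so reconstructions at coarser scales stay alias-free.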
Related papers
- FreeTimeGS: Free Gaussian Primitives at Anytime and Anywhere for Dynamic Scene Reconstruction [64.30050475414947]
FreeTimeGS is a novel 4D representation that allows Gaussian primitives to appear at arbitrary times and locations. The representation is highly flexible, improving the ability to model dynamic 3D scenes. Experimental results on several datasets show that the rendering quality of our method outperforms recent methods by a large margin.
arXiv Detail & Related papers (2025-06-05T17:59:57Z) - Seeing World Dynamics in a Nutshell [132.79736435144403]
NutWorld is a framework that transforms monocular videos into dynamic 3D representations in a single forward pass. We demonstrate that NutWorld achieves high-fidelity video reconstruction quality while enabling real-time downstream applications.
arXiv Detail & Related papers (2025-02-05T18:59:52Z) - BF-STVSR: B-Splines and Fourier-Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution [14.082598088990352]
We propose BF-STVSR, a C-STVSR framework with two key modules tailored to better represent the spatial and temporal characteristics of video. Our approach achieves state-of-the-art results on various metrics, including PSNR and SSIM, showing enhanced spatial details and natural temporal consistency.
arXiv Detail & Related papers (2025-01-19T13:29:41Z) - Fast Fourier Inception Networks for Occluded Video Prediction [16.99757795577547]
Video prediction is a pixel-level task that generates future frames by employing the historical frames.
We develop fully convolutional Fast Fourier Inception Networks for video prediction, termed FFINet, which include two primary components, i.e., the occlusion inpainter and the translator.
arXiv Detail & Related papers (2023-06-17T13:27:29Z) - You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous respectable works have made decent success, but they only focus on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z) - Towards Interpretable Video Super-Resolution via Alternating Optimization [115.85296325037565]
We study a practical space-time video super-resolution (STVSR) problem which aims at generating a high-framerate high-resolution sharp video from a low-framerate blurry video.
We propose an interpretable STVSR framework by leveraging both model-based and learning-based methods.
arXiv Detail & Related papers (2022-07-21T21:34:05Z) - Spatio-Temporal Self-Attention Network for Video Saliency Prediction [13.873682190242365]
3D convolutional neural networks have achieved promising results for video tasks in computer vision.
We propose a novel Spatio-Temporal Self-Attention 3D Network (STSANet) for video saliency prediction.
arXiv Detail & Related papers (2021-08-24T12:52:47Z) - Temporal-Spatial Feature Pyramid for Video Saliency Detection [2.578242050187029]
We propose a 3D fully convolutional encoder-decoder architecture for video saliency detection.
Our model is simple yet effective, and can run in real time.
arXiv Detail & Related papers (2021-05-10T09:14:14Z) - Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block capable of extracting features at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.