Space-Time Video Super-resolution with Neural Operator
- URL: http://arxiv.org/abs/2404.06036v1
- Date: Tue, 9 Apr 2024 05:49:04 GMT
- Title: Space-Time Video Super-resolution with Neural Operator
- Authors: Yuantong Zhang, Hanyou Zheng, Daiqin Yang, Zhenzhong Chen, Haichuan Ma, Wenpeng Ding
- Abstract summary: This paper addresses the task of space-time video super-resolution (ST-VSR).
Inspired by recent progress in physics-informed neural networks, we model the challenges of MEMC in ST-VSR as a mapping between two continuous function spaces.
Our approach transforms independent low-resolution representations in the coarse-grained continuous function space into refined representations with enriched spatiotemporal details in the fine-grained continuous function space.
- Score: 36.715371608285025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the task of space-time video super-resolution (ST-VSR). Existing methods generally suffer from inaccurate motion estimation and motion compensation (MEMC) problems for large motions. Inspired by recent progress in physics-informed neural networks, we model the challenges of MEMC in ST-VSR as a mapping between two continuous function spaces. Specifically, our approach transforms independent low-resolution representations in the coarse-grained continuous function space into refined representations with enriched spatiotemporal details in the fine-grained continuous function space. To achieve efficient and accurate MEMC, we design a Galerkin-type attention function to perform frame alignment and temporal interpolation. Due to the linear complexity of the Galerkin-type attention mechanism, our model avoids patch partitioning and offers global receptive fields, enabling precise estimation of large motions. The experimental results show that the proposed method surpasses state-of-the-art techniques in both fixed-size and continuous space-time video super-resolution tasks.
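A brief sketch may help ground the linear-complexity claim. In Galerkin-type (softmax-free) attention, layer normalization is applied to the keys and values, and the small d x d product K^T V is formed before multiplying by the queries, so the cost grows linearly with the number of tokens instead of quadratically. The PyTorch module below illustrates that general recipe; it is a minimal sketch under assumed names and shapes, not the paper's implementation.
```python
import torch
import torch.nn as nn

class GalerkinAttention(nn.Module):
    """Softmax-free, Galerkin-style attention: out = Q @ (K^T V / n).

    Forming the (dk x dk) matrix K^T V before touching Q makes the
    cost O(n * dk^2), linear in the token count n, so a whole frame's
    pixels can attend globally without patch partitioning.
    """

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.dk = heads, dim // heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        # Galerkin-type attention normalizes K and V (not Q) in place
        # of the usual softmax over attention weights.
        self.norm_k = nn.LayerNorm(self.dk)
        self.norm_v = nn.LayerNorm(self.dk)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape  # n = H * W tokens from a flattened feature map
        qkv = self.to_qkv(x).view(b, n, 3, self.heads, self.dk)
        q, k, v = qkv.unbind(dim=2)                    # each (b, n, h, dk)
        k, v = self.norm_k(k), self.norm_v(v)
        q, k, v = (t.permute(0, 2, 1, 3) for t in (q, k, v))  # (b, h, n, dk)
        context = k.transpose(-2, -1) @ v / n          # (b, h, dk, dk)
        out = q @ context                              # (b, h, n, dk)
        return self.proj(out.permute(0, 2, 1, 3).reshape(b, n, d))
```
Because only dk x dk context matrices are ever materialized, all H x W positions of a feature map can attend to one another at once, which is what permits a global receptive field without patch partitioning.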
Related papers
- Event-based Visual Deformation Measurement [76.25283405575108]
Visual Deformation Measurement aims to recover dense deformation fields by tracking surface motion from camera observations.
Traditional image-based methods rely on minimal inter-frame motion to constrain the correspondence search space.
We propose an event-frame fusion framework that exploits events for temporally dense motion cues and frames for spatially dense, precise estimation.
arXiv Detail & Related papers (2026-02-16T01:04:48Z) - FunPhase: A Periodic Functional Autoencoder for Motion Generation via Phase Manifolds [2.6041136107390037]
We introduce FunPhase, a functional periodic autoencoder that learns a phase manifold for motion and replaces discrete temporal decoding with a function-space formulation.
FunPhase supports downstream tasks such as super-resolution and partial-body motion completion, generalizes across skeletons and datasets, and unifies motion prediction and generation within a single interpretable manifold.
arXiv Detail & Related papers (2025-12-10T08:46:53Z) - Towards Robust and Generalizable Continuous Space-Time Video Super-Resolution with Events [71.2439653098351]
Continuous space-time video super-resolution (C-STVSR) has garnered increasing interest for its capability to reconstruct high-resolution and high-frame-rate videos at arbitrary temporal scales.
We present EvEnhancer, a novel approach that marries the unique properties of high temporal resolution and high dynamic range encapsulated in event streams.
Our method achieves state-of-the-art performance on both synthetic and real-world datasets, while maintaining generalizability at out-of-distribution (OOD) scales.
arXiv Detail & Related papers (2025-10-04T15:23:07Z) - SAMPO: Scale-wise Autoregression with Motion PrOmpt for generative world models [42.814012901180774]
SAMPO is a hybrid framework that combines visual autoregressive modeling for intra-frame generation with causal modeling for next-frame generation.
We show that SAMPO achieves competitive performance in action-conditioned video prediction and model-based control.
We also evaluate SAMPO's zero-shot generalization and scaling behavior, demonstrating its ability to generalize to unseen tasks.
arXiv Detail & Related papers (2025-09-19T02:41:37Z) - Holistic Surgical Phase Recognition with Hierarchical Input Dependent State Space Models [56.2236083600999]
We propose a novel hierarchical input-dependent state space model for surgical video analysis.
Our framework incorporates a temporally consistent visual feature extractor, which appends a state space model head to the feature extractor to propagate temporal information.
Experiments show that our method outperforms the current state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2025-06-26T14:43:57Z) - EMoTive: Event-guided Trajectory Modeling for 3D Motion Estimation [59.33052312107478]
Event cameras offer possibilities for 3D motion estimation through continuous adaptive pixel-level responses to scene changes.
This paper presents EMoTive, a novel event-based framework that models non-uniform trajectories via event-guided parametric curves.
For motion representation, we introduce a density-aware adaptation mechanism to fuse spatial and temporal features under event guidance.
The final 3D motion estimation is achieved through multi-temporal sampling of parametric trajectories, flows and depth motion fields.
arXiv Detail & Related papers (2025-03-14T13:15:54Z) - Lagrangian Motion Fields for Long-term Motion Generation [51.02126882968116]
We introduce the concept of Lagrangian Motion Fields, specifically designed for long-term motion generation.
By treating each joint as a Lagrangian particle with uniform velocity over short intervals, our approach condenses motion representations into a series of "supermotions".
Our solution is versatile and lightweight, eliminating the need for neural network preprocessing.
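Taken at face value, a "supermotion" can be pictured as a short trajectory segment encoded by a start position plus one constant velocity. The NumPy sketch below illustrates that reading for a single joint; the interval length, the 6-D encoding, and the function names are illustrative assumptions rather than the paper's actual representation.
```python
import numpy as np

def to_supermotions(traj: np.ndarray, interval: int = 8) -> np.ndarray:
    """Condense one joint's trajectory into piecewise-constant-velocity
    segments ("supermotions"): each row is [start_xyz, velocity_xyz].

    traj: (T, 3) positions over T frames; T - 1 should be a multiple
    of `interval` for the reconstruction below to invert exactly.
    """
    segments = []
    for t0 in range(0, traj.shape[0] - 1, interval):
        t1 = min(t0 + interval, traj.shape[0] - 1)
        vel = (traj[t1] - traj[t0]) / (t1 - t0)  # uniform-velocity assumption
        segments.append(np.concatenate([traj[t0], vel]))
    return np.stack(segments)

def from_supermotions(segments: np.ndarray, interval: int = 8) -> np.ndarray:
    """Roll each segment forward at its constant velocity to recover
    a dense trajectory."""
    frames = [seg[:3] + k * seg[3:] for seg in segments for k in range(interval)]
    return np.stack(frames)
```
The appeal of such a representation is that a long sequence shrinks to a handful of position-velocity pairs with no learned preprocessing, matching the "versatile and lightweight" claim above.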
arXiv Detail & Related papers (2024-09-03T01:38:06Z) - Surgformer: Surgical Transformer with Hierarchical Temporal Attention for Surgical Phase Recognition [7.682613953680041]
We propose the Surgical Transformer (Surgformer) to address the issues of spatial-temporal modeling and redundancy in an end-to-end manner.
We show that our proposed Surgformer performs favorably against the state-of-the-art methods.
arXiv Detail & Related papers (2024-08-07T16:16:31Z) - Multiscale Motion-Aware and Spatial-Temporal-Channel Contextual Coding Network for Learned Video Compression [24.228981098990726]
We propose a motion-aware and spatial-temporal-channel contextual coding based video compression network (MASTC-VC).
Our proposed MASTC-VC is superior to previous state-of-the-art (SOTA) methods on three public benchmark datasets.
Our method brings an average 10.15% BD-rate saving against H.265/HEVC (HM-16.20) in the PSNR metric and an average 23.93% BD-rate saving against H.266/VVC (VTM-13.2) in the MS-SSIM metric.
arXiv Detail & Related papers (2023-10-19T13:32:38Z) - Local-Global Temporal Difference Learning for Satellite Video Super-Resolution [55.69322525367221]
We propose to exploit the well-defined temporal difference for efficient and effective temporal compensation.
To fully utilize the local and global temporal information within frames, we systematically model the short-term and long-term temporal discrepancies.
Rigorous objective and subjective evaluations conducted across five mainstream video satellites demonstrate that our method performs favorably against state-of-the-art approaches.
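The summary leaves "short-term" and "long-term temporal discrepancies" undefined, so the sketch below is one plausible reading: adjacent-frame differences as the local cue and each frame's deviation from the sequence mean as the global cue. The function name, tensor layout, and mean-anchor choice are all assumptions.
```python
import torch

def temporal_differences(frames: torch.Tensor):
    """frames: (B, T, C, H, W) low-resolution video sequence.

    Short-term cue: differences between adjacent frames.
    Long-term cue: each frame's deviation from the sequence mean
    (an assumed stand-in for the paper's long-term discrepancy).
    """
    short_term = frames[:, 1:] - frames[:, :-1]    # (B, T-1, C, H, W)
    anchor = frames.mean(dim=1, keepdim=True)      # (B, 1, C, H, W)
    long_term = frames - anchor                    # (B, T, C, H, W)
    return short_term, long_term
```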
arXiv Detail & Related papers (2023-04-10T07:04:40Z) - Enhancing Space-time Video Super-resolution via Spatial-temporal Feature Interaction [9.456643513690633]
The aim of space-time video super-resolution (STVSR) is to increase both the frame rate and the spatial resolution of a video.
Recent approaches solve STVSR using end-to-end deep neural networks.
We propose a spatial-temporal feature interaction network to enhance STVSR by exploiting both spatial and temporal correlations.
arXiv Detail & Related papers (2022-07-18T22:10:57Z) - Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling [105.69197687940505]
We propose to explore the role of explicit temporal difference modeling in both LR and HR space.
To further enhance the super-resolution result, we not only extract spatial residual features but also compute the difference between consecutive frames in the high-frequency domain.
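As a concrete reading of the high-frequency temporal difference, the sketch below separates each frame into low- and high-frequency bands with a simple box blur and differences each band between consecutive frames. The paper presumably uses learned or more careful filters; the blur choice, function name, and tensor layout here are assumptions.
```python
import torch
import torch.nn.functional as F

def band_differences(prev: torch.Tensor, curr: torch.Tensor):
    """Split two consecutive (B, C, H, W) frames into low/high-frequency
    bands with a box blur, then difference each band across time."""
    def split(x):
        low = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)  # box blur
        return low, x - low                                        # residual = HF
    prev_low, prev_high = split(prev)
    curr_low, curr_high = split(curr)
    return curr_low - prev_low, curr_high - prev_high
```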
arXiv Detail & Related papers (2022-04-14T17:07:33Z) - Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video interpolation framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations.
To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video interpolation.
In addition, we develop a multi-scale frame synthesis scheme to fully realize the potential of Transformers.
arXiv Detail & Related papers (2021-11-27T05:35:10Z) - MEGAN: Memory Enhanced Graph Attention Network for Space-Time Video Super-Resolution [8.111645835455658]
Space-time video super-resolution (STVSR) aims to construct a high space-time resolution video sequence from the corresponding low-frame-rate, low-resolution video sequence.
Inspired by the recent success of considering spatial-temporal information for space-time super-resolution, our main goal in this work is to take full consideration of spatial and temporal correlations.
arXiv Detail & Related papers (2021-10-28T17:37:07Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z) - A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, we construct a joint feature sequence from sequential and instantaneous state information so that the generated trajectories maintain spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.