PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video
Prediction
- URL: http://arxiv.org/abs/2305.11421v2
- Date: Wed, 24 May 2023 07:00:38 GMT
- Title: PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video
Prediction
- Authors: Hao Wu, Wei Xiong, Fan Xu, Xiao Luo, Chong Chen, Xian-Sheng Hua and
Haixin Wang
- Abstract summary: We investigate the challenge of spatio-temporal video prediction, which involves generating future videos based on historical data streams.
We introduce a novel approach called Physics-assisted Spatio-temporal Network (PastNet) for generating high-quality video predictions.
We employ a memory bank with the estimated intrinsic dimensionality to discretize local features during the processing of complex spatio-temporal signals.
- Score: 33.25800277291283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate the challenge of spatio-temporal video
prediction, which involves generating future videos based on historical data
streams. Existing approaches typically utilize external information such as
semantic maps to enhance video prediction, which often neglect the inherent
physical knowledge embedded within videos. Furthermore, their high
computational demands could impede their applications for high-resolution
videos. To address these constraints, we introduce a novel approach called
Physics-assisted Spatio-temporal Network (PastNet) for generating high-quality
video predictions. The core of our PastNet lies in incorporating a spectral
convolution operator in the Fourier domain, which efficiently introduces
inductive biases from the underlying physical laws. Additionally, we employ a
memory bank with the estimated intrinsic dimensionality to discretize local
features during the processing of complex spatio-temporal signals, thereby
reducing computational costs and facilitating efficient high-resolution video
prediction. Extensive experiments on various widely-used datasets demonstrate
the effectiveness and efficiency of the proposed PastNet compared with
state-of-the-art methods, particularly in high-resolution scenarios. Our code
is available at https://github.com/easylearningscores/PastNet.
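The abstract names a spectral convolution operator in the Fourier domain as PastNet's core; the exact layer design is not given here, but operators of this kind (in the spirit of Fourier neural operators) transform the field with an FFT, scale a truncated set of low-frequency modes by learnable multipliers, and transform back. A minimal NumPy sketch under that assumption follows; `spectral_conv2d` and its parameter names are illustrative, not the paper's API.

```python
import numpy as np

def spectral_conv2d(x, weights, modes):
    """Toy spectral convolution: filter a 2-D field in the Fourier domain.

    x       : (H, W) real-valued field
    weights : (modes, modes) complex multipliers for the kept frequencies
    modes   : number of low-frequency modes retained along each axis
    """
    x_ft = np.fft.rfft2(x)                   # (H, W//2 + 1) complex spectrum
    out_ft = np.zeros_like(x_ft)
    # Keep (and reweight) only the lowest-frequency corner of the spectrum;
    # a full implementation would also handle the negative-frequency rows.
    out_ft[:modes, :modes] = x_ft[:modes, :modes] * weights
    return np.fft.irfft2(out_ft, s=x.shape)  # back to the spatial domain

rng = np.random.default_rng(0)
field = rng.standard_normal((32, 32))
w = np.ones((8, 8), dtype=complex)  # identity multipliers on the kept modes
smoothed = spectral_conv2d(field, w, modes=8)
```

Because the multiplication acts on a truncated spectrum, the operator is a global convolution at O(HW log HW) cost, which is one way such layers inject physics-style inductive biases (smooth, spectrally band-limited dynamics) efficiently.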
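The memory bank that "discretizes local features" is described only at this level of detail in the abstract; it resembles vector-quantization codebook lookup, where each continuous feature is replaced by its nearest bank entry. A minimal sketch under that assumption (the function name and shapes are hypothetical):

```python
import numpy as np

def quantize(features, codebook):
    """Nearest-neighbour lookup into a memory bank (VQ-style discretization).

    features : (N, D) continuous local features
    codebook : (K, D) memory-bank entries; K would be chosen from the
               estimated intrinsic dimensionality of the signal
    returns  : (indices, quantized) where quantized[i] == codebook[indices[i]]
    """
    # Squared Euclidean distance between every feature and every bank entry.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)           # (N,) discrete codes
    return indices, codebook[indices]        # (N,), (N, D)

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
feats = np.array([[0.1, 0.1], [0.9, 1.2]])
codes, quantized = quantize(feats, codebook)
```

Replacing high-dimensional local features with a small set of discrete codes is what makes downstream processing cheaper, consistent with the abstract's claim of reduced computational cost for high-resolution prediction.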
Related papers
- Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting [17.530885640317372]
We propose a novel prompt tuning-based continuous forecasting method.
Specifically, we integrate the base spatio-temporal graph neural network with a continuous prompt pool stored in memory.
This method ensures that the model sequentially learns from the widespread spatio-temporal data stream to accomplish tasks for the corresponding periods.
arXiv Detail & Related papers (2024-10-16T14:12:11Z) - HAVANA: Hierarchical stochastic neighbor embedding for Accelerated Video ANnotAtions [59.71751978599567]
This paper presents a novel annotation pipeline that uses pre-extracted features and dimensionality reduction to accelerate the temporal video annotation process.
We demonstrate significant improvements in annotation effort compared to traditional linear methods, achieving more than a 10x reduction in clicks required for annotating over 12 hours of video.
arXiv Detail & Related papers (2024-09-16T18:15:38Z) - Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z) - Viewport Prediction for Volumetric Video Streaming by Exploring Video Saliency and Trajectory Information [45.31198546289057]
This paper proposes a novel approach, named Saliency and Trajectory Viewport Prediction (STVP), which aims to improve the precision of viewport prediction in volumetric video streaming.
In particular, we introduce a novel sampling method, Uniform Random Sampling (URS), to reduce computational complexity.
arXiv Detail & Related papers (2023-11-28T03:45:29Z) - Lightweight Delivery Detection on Doorbell Cameras [9.735137325682825]
In this work we investigate an important home application, video based delivery detection, and present a simple lightweight pipeline for this task.
Our method relies on motion cues to generate a set of coarse activity proposals, followed by their classification with a mobile-friendly 3DCNN network.
arXiv Detail & Related papers (2023-05-13T01:28:28Z) - STIP: A SpatioTemporal Information-Preserving and Perception-Augmented
Model for High-Resolution Video Prediction [78.129039340528]
We propose a SpatioTemporal Information-Preserving and Perception-Augmented Model (STIP) to solve the above two problems.
The proposed model aims to preserve the spatiotemporal information of videos during feature extraction and state transitions.
Experimental results show that the proposed STIP can predict videos with more satisfactory visual quality compared with a variety of state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T09:49:04Z) - STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond [78.129039340528]
We propose a spatiotemporal-aware unit (STAU) for video prediction and beyond.
Our STAU can outperform other methods on all tasks in terms of performance and efficiency.
arXiv Detail & Related papers (2022-04-20T13:42:51Z) - Adversarial Memory Networks for Action Prediction [95.09968654228372]
Action prediction aims to infer the forthcoming human action with partially-observed videos.
We propose adversarial memory networks (AMemNet) to generate the "full video" feature conditioning on a partial video query.
arXiv Detail & Related papers (2021-12-18T08:16:21Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Non-linear State-space Model Identification from Video Data using Deep
Encoders [0.0]
We propose a novel non-linear state-space identification method starting from high-dimensional input and output data.
An encoder function, represented by a neural network, is introduced to learn a reconstructability map to estimate the model states from past inputs and outputs.
We apply the proposed method to a video stream of a simulated environment of a controllable ball in a unit box.
arXiv Detail & Related papers (2020-12-14T17:14:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.