PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction
- URL: http://arxiv.org/abs/2305.11421v2
- Date: Wed, 24 May 2023 07:00:38 GMT
- Title: PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction
- Authors: Hao Wu, Wei Xiong, Fan Xu, Xiao Luo, Chong Chen, Xian-Sheng Hua and Haixin Wang
- Abstract summary: We investigate the challenge of spatio-temporal video prediction, which involves generating future videos based on historical data streams.
We introduce a novel approach called Physics-assisted Spatio-temporal Network (PastNet) for generating high-quality predictions.
We employ a memory bank with the estimated intrinsic dimensionality to discretize local features during the processing of complex spatio-temporal signals.
- Score: 33.25800277291283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate the challenge of spatio-temporal video
prediction, which involves generating future videos based on historical data
streams. Existing approaches typically utilize external information such as
semantic maps to enhance video prediction, but often neglect the inherent
physical knowledge embedded within videos. Furthermore, their high
computational demands could impede their application to high-resolution
videos. To address these constraints, we introduce a novel approach called
Physics-assisted Spatio-temporal Network (PastNet) for generating high-quality
video predictions. The core of our PastNet lies in incorporating a spectral
convolution operator in the Fourier domain, which efficiently introduces
inductive biases from the underlying physical laws. Additionally, we employ a
memory bank with the estimated intrinsic dimensionality to discretize local
features during the processing of complex spatio-temporal signals, thereby
reducing computational costs and facilitating efficient high-resolution video
prediction. Extensive experiments on various widely-used datasets demonstrate
the effectiveness and efficiency of the proposed PastNet compared with
state-of-the-art methods, particularly in high-resolution scenarios. Our code
is available at https://github.com/easylearningscores/PastNet.
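The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the PastNet implementation (the authors' repository above is the reference); it assumes that the "spectral convolution operator in the Fourier domain" behaves like a Fourier Neural Operator layer that mixes channels on a fixed number of low-frequency modes, and that "discretizing local features with a memory bank" behaves like nearest-neighbour vector quantization. The function names, array shapes, and the `modes1`/`modes2` parameters are hypothetical, and the intrinsic-dimensionality estimate is not implemented: the feature width `D` is simply taken as given.

```python
import numpy as np


def spectral_conv2d(x, weights, modes1, modes2):
    """FNO-style Fourier-domain convolution (illustrative assumption).

    x:       real array (C_in, H, W), e.g. one frame's feature maps.
    weights: complex array (2, C_in, C_out, modes1, modes2); weights[0] acts on
             the lowest non-negative row frequencies, weights[1] on the
             matching negative row frequencies.
    Only the retained low-frequency modes are mixed; all others are zeroed.
    """
    c_in, h, w = x.shape
    c_out = weights.shape[2]
    if 2 * modes1 > h or modes2 > w // 2 + 1:
        raise ValueError("too many modes for this spatial size")
    x_ft = np.fft.rfft2(x)  # complex, shape (C_in, H, W // 2 + 1)
    out_ft = np.zeros((c_out, h, w // 2 + 1), dtype=np.complex128)
    # Per-frequency channel mixing: out[o, k] = sum_i x[i, k] * W[i, o, k]
    out_ft[:, :modes1, :modes2] = np.einsum(
        "ixy,ioxy->oxy", x_ft[:, :modes1, :modes2], weights[0]
    )
    out_ft[:, -modes1:, :modes2] = np.einsum(
        "ixy,ioxy->oxy", x_ft[:, -modes1:, :modes2], weights[1]
    )
    # Transform back to the spatial domain at the original resolution.
    return np.fft.irfft2(out_ft, s=(h, w))


def quantize_to_memory_bank(features, memory_bank):
    """Replace each local feature with its nearest memory-bank entry.

    features:    (N, D) local feature vectors.
    memory_bank: (K, D) codewords (hypothetical; in PastNet the width D is
                 tied to an estimated intrinsic dimensionality).
    Returns the chosen indices (N,) and the discretized features (N, D).
    """
    # Squared Euclidean distances via ||a||^2 - 2 a.b + ||b||^2.
    dists = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ memory_bank.T
        + (memory_bank ** 2).sum(axis=1)
    )
    indices = dists.argmin(axis=1)
    return indices, memory_bank[indices]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal((3, 16, 16))
    w = rng.standard_normal((2, 3, 4, 4, 4)) + 1j * rng.standard_normal((2, 3, 4, 4, 4))
    print(spectral_conv2d(frame, w, modes1=4, modes2=4).shape)  # (4, 16, 16)
    bank = rng.standard_normal((8, 5))
    idx, _ = quantize_to_memory_bank(bank[[2, 6]] + 1e-3, bank)
    print(idx.tolist())  # [2, 6]
```

Truncating to a few low modes is what makes this kind of operator cheap at high resolution: the learned parameters scale with `modes1 * modes2`, not with `H * W`. Quantizing features to a small codebook similarly limits the number of distinct local states a downstream predictor has to model.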
Related papers
- Lightweight Stochastic Video Prediction via Hybrid Warping [10.448675566568086]
Accurate video prediction by deep neural networks, especially for dynamic regions, is a challenging task in computer vision for critical applications such as autonomous driving, remote working, and telemedicine.
We propose a novel long-term complexity video prediction model that focuses on dynamic regions by employing a hybrid warping strategy.
Considering real-time predictions, we introduce a MobileNet-based lightweight architecture into our model.
arXiv Detail & Related papers (2024-12-04T06:33:27Z)
- Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting [17.530885640317372]
We propose a novel prompt tuning-based continuous forecasting method.
Specifically, we integrate the base spatio-temporal graph neural network with a continuous prompt pool stored in memory.
This method ensures that the model learns sequentially from the widespread spatio-temporal data stream to accomplish tasks for the corresponding periods.
arXiv Detail & Related papers (2024-10-16T14:12:11Z)
- HAVANA: Hierarchical stochastic neighbor embedding for Accelerated Video ANnotAtions [59.71751978599567]
This paper presents a novel annotation pipeline that uses pre-extracted features and dimensionality reduction to accelerate the temporal video annotation process.
We demonstrate significant improvements in annotation effort compared to traditional linear methods, achieving more than a 10x reduction in clicks required for annotating over 12 hours of video.
arXiv Detail & Related papers (2024-09-16T18:15:38Z)
- Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns spatio-temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z)
- Spatial Decomposition and Temporal Fusion based Inter Prediction for Learned Video Compression [59.632286735304156]
We propose a spatial decomposition and temporal fusion based inter prediction for learned video compression.
With the SDD-based motion model and long short-term temporal fusion, our proposed learned video codec can obtain more accurate inter prediction contexts.
arXiv Detail & Related papers (2024-01-29T03:30:21Z)
- VS-Net: Multiscale Spatiotemporal Features for Lightweight Video Salient Document Detection [0.2578242050187029]
We propose VS-Net, which captures multi-scale spatiotemporal information with the help of dilated depth-wise separable convolution and Approximation Rank Pooling.
Our model generates saliency maps considering both the background and foreground, making it perform better in challenging scenarios.
Extensive experiments conducted on the benchmark MIDV-500 dataset show that the VS-Net model outperforms state-of-the-art approaches in both time and robustness measures.
arXiv Detail & Related papers (2023-01-11T13:07:31Z)
- STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond [78.129039340528]
We propose a spatiotemporal-aware unit (STAU) for video prediction and beyond.
Our STAU can outperform other methods on all tasks in terms of performance and efficiency.
arXiv Detail & Related papers (2022-04-20T13:42:51Z)
- Borrowing from yourself: Faster future video segmentation with partial channel update [0.0]
We propose to tackle the task of fast future video segmentation prediction through the use of convolutional layers with time-dependent channel masking.
This technique only updates a chosen subset of the feature maps at each time-step, bringing simultaneously less computation and latency.
We apply this technique to several fast architectures and experimentally confirm its benefits for the future prediction subtask.
arXiv Detail & Related papers (2022-02-11T16:37:53Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Intrinsic Temporal Regularization for High-resolution Human Video Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation.
We apply our intrinsic temporal regularization to a single-image generator, leading to a powerful "INTERnet" capable of generating $512\times512$ resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z)