PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video
Prediction
- URL: http://arxiv.org/abs/2305.11421v2
- Date: Wed, 24 May 2023 07:00:38 GMT
- Title: PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video
Prediction
- Authors: Hao Wu, Wei Xiong, Fan Xu, Xiao Luo, Chong Chen, Xian-Sheng Hua and
Haixin Wang
- Abstract summary: We investigate the challenge of spatio-temporal video prediction, which involves generating future videos based on historical data streams.
We introduce a novel approach called Physics-assisted Spatio-temporal Network (PastNet) for generating high-quality predictions.
We employ a memory bank with the estimated intrinsic dimensionality to discretize local features during the processing of complex spatio-temporal signals.
- Score: 33.25800277291283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate the challenge of spatio-temporal video
prediction, which involves generating future videos based on historical data
streams. Existing approaches typically utilize external information such as
semantic maps to enhance video prediction, which often neglect the inherent
physical knowledge embedded within videos. Furthermore, their high
computational demands could impede their applications for high-resolution
videos. To address these constraints, we introduce a novel approach called
Physics-assisted Spatio-temporal Network (PastNet) for generating high-quality
video predictions. The core of our PastNet lies in incorporating a spectral
convolution operator in the Fourier domain, which efficiently introduces
inductive biases from the underlying physical laws. Additionally, we employ a
memory bank with the estimated intrinsic dimensionality to discretize local
features during the processing of complex spatio-temporal signals, thereby
reducing computational costs and facilitating efficient high-resolution video
prediction. Extensive experiments on various widely-used datasets demonstrate
the effectiveness and efficiency of the proposed PastNet compared with
state-of-the-art methods, particularly in high-resolution scenarios. Our code
is available at https://github.com/easylearningscores/PastNet.
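The two core mechanisms described above, a spectral convolution operator applied in the Fourier domain and a memory bank that discretizes local features, can be sketched in a few lines of NumPy. This is a minimal illustrative sketch and not the authors' implementation; the function names `spectral_conv2d` and `vq_discretize`, the number of retained Fourier modes, and the codebook shapes are all assumptions chosen for demonstration.

```python
import numpy as np

def spectral_conv2d(x, weights, modes=8):
    """Spectral convolution: pointwise multiply low-frequency Fourier modes.

    x: (H, W) real-valued field; weights: complex array of shape (modes, modes).
    Transforming only the lowest `modes` frequencies keeps the operator smooth
    and global, which is how Fourier-domain operators inject physics-like
    inductive biases while staying cheap at high resolution.
    """
    H, W = x.shape
    xf = np.fft.rfft2(x)                      # (H, W//2 + 1) complex spectrum
    out = np.zeros_like(xf)
    out[:modes, :modes] = xf[:modes, :modes] * weights
    return np.fft.irfft2(out, s=(H, W))       # back to the spatial domain

def vq_discretize(features, codebook):
    """Memory-bank discretization: snap each feature to its nearest code.

    features: (N, D) local feature vectors; codebook: (K, D) memory bank.
    Returns the chosen indices and the quantized vectors, mimicking how a
    discretized representation reduces the cost of modeling complex signals.
    """
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
frame = rng.standard_normal((32, 32))
w = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
y = spectral_conv2d(frame, w)                         # filtered frame, (32, 32)
idx, q = vq_discretize(rng.standard_normal((5, 4)),   # 5 local features
                       rng.standard_normal((16, 4)))  # 16-entry memory bank
```

In a full model these two pieces would sit inside a learned network (the complex weights and the codebook would be trained parameters); here they are fixed random arrays purely to show the data flow.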
Related papers
- FDDet: Frequency-Decoupling for Boundary Refinement in Temporal Action Detection [4.015022008487465]
Large-scale pre-trained video encoders tend to introduce background clutter and irrelevant semantics, leading to context confusion and imprecise action boundaries.
We propose a frequency-aware decoupling network that improves action discriminability by filtering out noisy semantics captured by pre-trained models.
Our method achieves state-of-the-art performance on temporal action detection benchmarks.
arXiv Detail & Related papers (2025-04-01T10:57:37Z) - AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis [52.261173507177396]
We introduce AssistPDA, the first online video anomaly surveillance assistant that unifies video anomaly prediction, detection, and analysis (VAPDA) within a single framework.
AssistPDA enables real-time inference on streaming videos while supporting interactive user engagement.
We also introduce a novel event-level anomaly prediction task, enabling proactive anomaly forecasting before anomalies fully unfold.
arXiv Detail & Related papers (2025-03-27T18:30:47Z) - Lightweight Stochastic Video Prediction via Hybrid Warping [10.448675566568086]
Accurate video prediction by deep neural networks, especially for dynamic regions, is a challenging task in computer vision for critical applications such as autonomous driving, remote working, and telemedicine.
We propose a novel long-term stochastic video prediction model that focuses on dynamic regions by employing a hybrid warping strategy.
Considering real-time predictions, we introduce a MobileNet-based lightweight architecture into our model.
arXiv Detail & Related papers (2024-12-04T06:33:27Z) - Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting [17.530885640317372]
We propose a novel prompt tuning-based continual forecasting method.
Specifically, we integrate the base spatio-temporal graph neural network with a continuous prompt pool stored in memory.
This method ensures that the model sequentially learns from the widespread spatio-temporal data stream to accomplish tasks for corresponding periods.
arXiv Detail & Related papers (2024-10-16T14:12:11Z) - HAVANA: Hierarchical stochastic neighbor embedding for Accelerated Video ANnotAtions [59.71751978599567]
This paper presents a novel annotation pipeline that uses pre-extracted features and dimensionality reduction to accelerate the temporal video annotation process.
We demonstrate significant improvements in annotation effort compared to traditional linear methods, achieving more than a 10x reduction in clicks required for annotating over 12 hours of video.
arXiv Detail & Related papers (2024-09-16T18:15:38Z) - Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z) - Spatial Decomposition and Temporal Fusion based Inter Prediction for
Learned Video Compression [59.632286735304156]
We propose a spatial decomposition and temporal fusion based inter prediction for learned video compression.
With the SDD-based motion model and long short-term temporal fusion, our proposed learned video compression scheme can obtain more accurate inter prediction contexts.
arXiv Detail & Related papers (2024-01-29T03:30:21Z) - Viewport Prediction for Volumetric Video Streaming by Exploring Video Saliency and Trajectory Information [45.31198546289057]
This paper proposes a novel approach, named Saliency and Trajectory Viewport Prediction (STVP).
It aims to improve the precision of viewport prediction in volumetric video streaming.
In particular, we introduce a novel sampling method, Uniform Random Sampling (URS), to reduce computational complexity.
arXiv Detail & Related papers (2023-11-28T03:45:29Z) - Lightweight Delivery Detection on Doorbell Cameras [9.735137325682825]
In this work we investigate an important home application, video based delivery detection, and present a simple lightweight pipeline for this task.
Our method uses motion as a cue to generate a set of coarse activity proposals, followed by their classification with a mobile-friendly 3DCNN network.
arXiv Detail & Related papers (2023-05-13T01:28:28Z) - STIP: A SpatioTemporal Information-Preserving and Perception-Augmented
Model for High-Resolution Video Prediction [78.129039340528]
We propose a SpatioTemporal Information-Preserving and Perception-Augmented Model (STIP) to solve the above two problems.
The proposed model aims to preserve the spatio-temporal information for videos during the feature extraction and the state transitions.
Experimental results show that the proposed STIP can predict videos with more satisfactory visual quality compared with a variety of state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T09:49:04Z) - STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond [78.129039340528]
We propose a spatiotemporal-aware unit (STAU) for video prediction and beyond.
Our STAU can outperform other methods on all tasks in terms of performance and efficiency.
arXiv Detail & Related papers (2022-04-20T13:42:51Z) - Borrowing from yourself: Faster future video segmentation with partial
channel update [0.0]
We propose to tackle the task of fast future video segmentation prediction through the use of convolutional layers with time-dependent channel masking.
This technique only updates a chosen subset of the feature maps at each time-step, bringing simultaneously less computation and latency.
We apply this technique to several fast architectures and experimentally confirm its benefits for the future prediction subtask.
arXiv Detail & Related papers (2022-02-11T16:37:53Z) - Adversarial Memory Networks for Action Prediction [95.09968654228372]
Action prediction aims to infer the forthcoming human action with partially-observed videos.
We propose adversarial memory networks (AMemNet) to generate the "full video" feature conditioning on a partial video query.
arXiv Detail & Related papers (2021-12-18T08:16:21Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Non-linear State-space Model Identification from Video Data using Deep
Encoders [0.0]
We propose a novel non-linear state-space identification method starting from high-dimensional input and output data.
An encoder function, represented by a neural network, is introduced to learn a reconstructability map to estimate the model states from past inputs and outputs.
We apply the proposed method to a video stream of a simulated environment of a controllable ball in a unit box.
arXiv Detail & Related papers (2020-12-14T17:14:46Z) - Intrinsic Temporal Regularization for High-resolution Human Video
Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation.
We apply our intrinsic temporal regularization to a single-image generator, leading to a powerful "INTERnet" capable of generating $512\times512$ resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.