Lightweight Stochastic Video Prediction via Hybrid Warping
- URL: http://arxiv.org/abs/2412.03061v1
- Date: Wed, 04 Dec 2024 06:33:27 GMT
- Title: Lightweight Stochastic Video Prediction via Hybrid Warping
- Authors: Kazuki Kotoyori, Shota Hirose, Heming Sun, Jiro Katto,
- Abstract summary: Accurate video prediction by deep neural networks, especially for dynamic regions, is a challenging task in computer vision for critical applications such as autonomous driving, remote working, and telemedicine.
We propose a novel long-term complexity video prediction model that focuses on dynamic regions by employing a hybrid warping strategy.
Considering real-time predictions, we introduce a MobileNet-based lightweight architecture into our model.
- Score: 10.448675566568086
- License:
- Abstract: Accurate video prediction by deep neural networks, especially for dynamic regions, is a challenging task in computer vision for critical applications such as autonomous driving, remote working, and telemedicine. Due to inherent uncertainties, existing prediction models often struggle with the complexity of motion dynamics and occlusions. In this paper, we propose a novel stochastic long-term video prediction model that focuses on dynamic regions by employing a hybrid warping strategy. By integrating frames generated through forward and backward warpings, our approach effectively compensates for the weaknesses of each technique, improving the prediction accuracy and realism of moving regions in videos while also addressing uncertainty by making stochastic predictions that account for various motions. Furthermore, considering real-time predictions, we introduce a MobileNet-based lightweight architecture into our model. Our model, called SVPHW, achieves state-of-the-art performance on two benchmark datasets.
Related papers
- Characterized Diffusion Networks for Enhanced Autonomous Driving Trajectory Prediction [0.6202955567445396]
We present a novel trajectory prediction model for autonomous driving.
Our model enhances the accuracy and reliability of trajectory predictions by incorporating uncertainty estimation and complex agent interactions.
The proposed model showcases strong potential for application in real-world autonomous driving systems.
arXiv Detail & Related papers (2024-11-25T15:03:44Z) - Physics-guided Active Sample Reweighting for Urban Flow Prediction [75.24539704456791]
Urban flow prediction is a nuanced-temporal modeling that estimates the throughput of transportation services like buses, taxis and ride-driven models.
Some recent prediction solutions bring remedies with the notion of physics-guided machine learning (PGML)
We develop a atized physics-guided network (PN), and propose a data-aware framework Physics-guided Active Sample Reweighting (P-GASR)
arXiv Detail & Related papers (2024-07-18T15:44:23Z) - GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis [71.24791230358065]
We introduce a novel framework that empowers 3D Gaussian representations with dynamic scene modeling and future scenario synthesis.
GaussianPrediction can forecast future states from any viewpoint, using video observations of dynamic scenes.
Our framework shows outstanding performance on both synthetic and real-world datasets, demonstrating its efficacy in predicting and rendering future environments.
arXiv Detail & Related papers (2024-05-30T06:47:55Z) - State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend [3.910356300831074]
We propose a state-space decomposition video prediction model that decomposes the overall video frame generation into deterministic appearance prediction and motion prediction.
We infer the long-term motion trend from conditional frames to guide the generation of future frames that exhibit high consistency with the conditional frames.
arXiv Detail & Related papers (2024-04-17T17:19:48Z) - Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z) - Evaluation of Differentially Constrained Motion Models for Graph-Based
Trajectory Prediction [1.1947990549568765]
This research investigates the performance of various motion models in combination with numerical solvers for the prediction task.
The study shows that simpler models, such as low-order integrator models, are preferred over more complex, e.g., kinematic models, to achieve accurate predictions.
arXiv Detail & Related papers (2023-04-11T10:15:20Z) - LOPR: Latent Occupancy PRediction using Generative Models [49.15687400958916]
LiDAR generated occupancy grid maps (L-OGMs) offer a robust bird's eye-view scene representation.
We propose a framework that decouples occupancy prediction into: representation learning and prediction within the learned latent space.
arXiv Detail & Related papers (2022-10-03T22:04:00Z) - SFMGNet: A Physics-based Neural Network To Predict Pedestrian
Trajectories [2.862893981836593]
We present a physics-based neural network to predict pedestrian trajectories.
We quantitatively and qualitatively evaluate the model with respect to realistic prediction, prediction performance and prediction "interpretability"
Initial results suggest, the model even when solely trained on a synthetic dataset, can predict realistic and interpretable trajectories with better than state-of-the-art accuracy.
arXiv Detail & Related papers (2022-02-06T14:58:09Z) - Investigating Pose Representations and Motion Contexts Modeling for 3D
Motion Prediction [63.62263239934777]
We conduct an indepth study on various pose representations with a focus on their effects on the motion prediction task.
We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction.
Our approach outperforms the state-of-the-art methods in short-term prediction and achieves much enhanced long-term prediction proficiency.
arXiv Detail & Related papers (2021-12-30T10:45:22Z) - Wide and Narrow: Video Prediction from Context and Motion [54.21624227408727]
We propose a new framework to integrate these complementary attributes to predict complex pixel dynamics through deep networks.
We present global context propagation networks that aggregate the non-local neighboring representations to preserve the contextual information over the past frames.
We also devise local filter memory networks that generate adaptive filter kernels by storing the motion of moving objects in the memory.
arXiv Detail & Related papers (2021-10-22T04:35:58Z) - A Gated Fusion Network for Dynamic Saliency Prediction [16.701214795454536]
Gated Fusion Network for dynamic saliency (GFSalNet)
GFSalNet is first deep saliency model capable of making predictions in a dynamic way via gated fusion mechanism.
We show that it has a good generalization ability, and moreover, exploits temporal information more effectively via its adaptive fusion scheme.
arXiv Detail & Related papers (2021-02-15T17:18:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.