A Log-likelihood Regularized KL Divergence for Video Prediction with A
3D Convolutional Variational Recurrent Network
- URL: http://arxiv.org/abs/2012.06123v1
- Date: Fri, 11 Dec 2020 05:05:31 GMT
- Title: A Log-likelihood Regularized KL Divergence for Video Prediction with A
3D Convolutional Variational Recurrent Network
- Authors: Haziq Razali and Basura Fernando
- Abstract summary: We introduce a new variational model that extends the recurrent network in two ways for the task of frame prediction.
First, we introduce 3D convolutions inside all modules, including the recurrent model, for future frame prediction, inputting and outputting a sequence of video frames at each timestep.
Second, we enhance the latent loss of the variational model by introducing a maximum likelihood estimate in addition to the KL divergence that is commonly used in variational models.
- Score: 17.91970304953206
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The use of latent variable models has proven to be a powerful tool for
modeling probability distributions over sequences. In this paper, we introduce
a new variational model that extends the recurrent network in two ways for the
task of video frame prediction. First, we introduce 3D convolutions inside all
modules including the recurrent model for future frame prediction, inputting
and outputting a sequence of video frames at each timestep. This enables us to
better exploit spatiotemporal information inside the variational recurrent
model, allowing us to generate high-quality predictions. Second, we enhance the
latent loss of the variational model by introducing a maximum likelihood
estimate in addition to the KL divergence that is commonly used in variational
models. This simple extension acts as a stronger regularizer in the variational
autoencoder loss function and lets us obtain better results and
generalizability. Experiments show that our model outperforms existing video
prediction methods on several benchmarks while requiring fewer parameters.
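The abstract describes two concrete pieces: a variational recurrent cell built from 3D convolutions that reads and writes a short clip of frames at every timestep, and a latent loss that augments the usual KL divergence with a maximum-likelihood term. The sketch below is a minimal PyTorch reading of that description, not the authors' implementation; the module layout, channel sizes, clip length, and the weight `beta` are assumptions, and the log-likelihood term is interpreted here as the (negative) log-probability of the sampled latent under the prior.

```python
# Minimal sketch (assumptions throughout) of a 3D-convolutional variational
# recurrent cell and a KL-plus-log-likelihood latent loss, as described in the
# abstract above. Not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal, kl_divergence


class Conv3dVRNNCell(nn.Module):
    """One step of a 3D-convolutional variational recurrent cell (hypothetical layout)."""

    def __init__(self, in_ch=3, hid_ch=64, z_ch=32):
        super().__init__()
        # Posterior q(z_t | x_t, h_{t-1}) and prior p(z_t | h_{t-1}), both 3D-convolutional.
        self.posterior = nn.Conv3d(in_ch + hid_ch, 2 * z_ch, kernel_size=3, padding=1)
        self.prior = nn.Conv3d(hid_ch, 2 * z_ch, kernel_size=3, padding=1)
        # Recurrence and decoder also operate on full (C, T, H, W) clips.
        self.update = nn.Conv3d(hid_ch + z_ch, hid_ch, kernel_size=3, padding=1)
        self.decoder = nn.Conv3d(hid_ch + z_ch, in_ch, kernel_size=3, padding=1)

    def forward(self, x_t, h_prev):
        # x_t, h_prev: (B, C, T, H, W) -- a short clip per timestep, not a single frame.
        q_mu, q_logvar = self.posterior(torch.cat([x_t, h_prev], dim=1)).chunk(2, dim=1)
        p_mu, p_logvar = self.prior(h_prev).chunk(2, dim=1)
        q = Normal(q_mu, torch.exp(0.5 * q_logvar))
        p = Normal(p_mu, torch.exp(0.5 * p_logvar))
        z = q.rsample()                                   # reparameterized sample
        h = torch.tanh(self.update(torch.cat([h_prev, z], dim=1)))
        x_hat = torch.sigmoid(self.decoder(torch.cat([h, z], dim=1)))
        return x_hat, h, q, p, z


def latent_loss(q, p, z, beta=0.1):
    """KL divergence plus a log-likelihood regularizer: one plausible reading of the
    abstract, where the sampled latent is also pushed toward high prior likelihood."""
    kl = kl_divergence(q, p).mean()
    nll = -p.log_prob(z).mean()        # maximum-likelihood term on the sampled latent
    return kl + beta * nll


# Toy usage: one timestep on a random 8-frame clip.
x = torch.randn(2, 3, 8, 32, 32)
h = torch.zeros(2, 64, 8, 32, 32)
cell = Conv3dVRNNCell()
x_hat, h, q, p, z = cell(x, h)
loss = F.mse_loss(x_hat, x) + latent_loss(q, p, z)
```

In a full model the reconstruction and latent terms would be summed over timesteps; everything beyond what the abstract states is a guess.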
Related papers
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - Video Prediction by Efficient Transformers [14.685237010856953]
We present a new family of Transformer-based models for video prediction.
Experiments show that the proposed video prediction models are competitive with more complex state-of-the-art convolutional-LSTM based models.
arXiv Detail & Related papers (2022-12-12T16:46:48Z) - HARP: Autoregressive Latent Video Prediction with High-Fidelity Image
Generator [90.74663948713615]
We train an autoregressive latent video prediction model capable of predicting high-fidelity future frames.
We produce high-resolution (256x256) videos with minimal modification to existing models.
arXiv Detail & Related papers (2022-09-15T08:41:57Z) - Hierarchical Latent Structure for Multi-Modal Vehicle Trajectory
Forecasting [0.0]
We introduce a hierarchical latent structure into a VAE-based trajectory forecasting model.
Our model is capable of generating clear multi-modal trajectory distributions and outperforms the state-of-the-art (SOTA) models in terms of prediction accuracy.
arXiv Detail & Related papers (2022-07-11T04:52:28Z) - Multi-Contextual Predictions with Vision Transformer for Video Anomaly
Detection [22.098399083491937]
Understanding the temporal context of a video plays a vital role in anomaly detection.
We design a transformer model with three different contextual prediction streams: masked, whole and partial.
By learning to predict the missing frames of consecutive normal frames, our model can effectively learn various normality patterns in the video (a toy sketch of the masked-frame idea appears after this list).
arXiv Detail & Related papers (2022-06-17T05:54:31Z) - Adaptive Graph Convolutional Network Framework for Multidimensional Time
Series Prediction [6.962213869946514]
We introduce an adaptive graph neural network to capture hidden dependencies between dimensions in multidimensional time series prediction.
We integrate graph convolutional networks into various time series prediction models to address their inability to capture relationships between different dimensions.
Accuracy improves by about 10% after our framework is introduced into these models.
arXiv Detail & Related papers (2022-05-08T04:50:16Z) - Distribution-Aware Single-Stage Models for Multi-Person 3D Pose
Estimation [29.430404703883084]
We present a novel Distribution-Aware Single-stage (DAS) model for tackling the challenging multi-person 3D pose estimation problem.
The proposed DAS model simultaneously localizes person positions and their corresponding body joints in the 3D camera space in a one-pass manner.
Comprehensive experiments on benchmarks CMU Panoptic and MuPoTS-3D demonstrate the superior efficiency of the proposed DAS model.
arXiv Detail & Related papers (2022-03-15T07:30:27Z) - Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z) - Greedy Hierarchical Variational Autoencoders for Large-Scale Video
Prediction [79.23730812282093]
We introduce Greedy Hierarchical Variational Autoencoders (GHVAEs), a method that learns high-fidelity video predictions by greedily training each level of a hierarchical autoencoder.
GHVAEs provide 17-55% gains in prediction performance on four video datasets, a 35-40% higher success rate on real robot tasks, and can improve performance monotonically by simply adding more modules.
arXiv Detail & Related papers (2021-03-06T18:58:56Z) - Anomaly Detection of Time Series with Smoothness-Inducing Sequential
Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z) - Consistency Guided Scene Flow Estimation [159.24395181068218]
CGSF is a self-supervised framework for the joint reconstruction of 3D scene structure and motion from stereo video.
We show that the proposed model can reliably predict disparity and scene flow in challenging imagery.
It achieves better generalization than the state-of-the-art, and adapts quickly and robustly to unseen domains.
arXiv Detail & Related papers (2020-06-19T17:28:07Z)
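For the anomaly-detection entry above ("Multi-Contextual Predictions with Vision Transformer for Video Anomaly Detection"), the blurb hinges on masked-frame prediction: hide a frame, predict it from its neighbours, and treat a large prediction error as a sign of abnormality. The toy sketch below illustrates only that idea with a plain CNN; it is not the cited transformer with masked/whole/partial streams, and all shapes and layer sizes are made up.

```python
# Toy illustration (assumption-heavy, not the cited model) of masked-frame
# prediction for anomaly detection: hide the middle frame of a clip, predict it
# from its neighbours, and score anomalies by the prediction error.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedFramePredictor(nn.Module):
    """Predict a hidden middle frame from the surrounding frames (illustrative CNN)."""

    def __init__(self, n_context=4, ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_context * ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, ch, 3, padding=1),
        )

    def forward(self, context_frames):
        # context_frames: (B, n_context, C, H, W) -> stack context along channels.
        b, n, c, h, w = context_frames.shape
        return self.net(context_frames.reshape(b, n * c, h, w))


def anomaly_score(model, clip):
    """clip: (B, 5, C, H, W); the middle frame is masked out and predicted."""
    context = torch.cat([clip[:, :2], clip[:, 3:]], dim=1)   # drop frame index 2
    target = clip[:, 2]
    pred = model(context)
    # Per-sample mean squared error: larger values flag more abnormal frames.
    return F.mse_loss(pred, target, reduction="none").mean(dim=(1, 2, 3))
```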
This list is automatically generated from the titles and abstracts of the papers on this site.