Future Frame Prediction of a Video Sequence
- URL: http://arxiv.org/abs/2009.01689v1
- Date: Mon, 31 Aug 2020 15:31:02 GMT
- Title: Future Frame Prediction of a Video Sequence
- Authors: Jasmeen Kaur, Sukhendu Das
- Abstract summary: The ability to predict, anticipate and reason about future events is the essence of intelligence.
- Score: 5.660207256468971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting future frames of a video sequence has been a problem of high
interest in the field of Computer Vision as it caters to a multitude of
applications. The ability to predict, anticipate and reason about future events
is the essence of intelligence and one of the main goals of decision-making
systems such as human-machine interaction, robot navigation and autonomous
driving. However, the challenge lies in the ambiguous nature of the problem as
there may be multiple future sequences possible for the same input video shot.
A naively designed model averages multiple possible futures into a single
blurry prediction.
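The blurring effect of averaging can be illustrated numerically: a deterministic model trained to regress to the mean of several equally plausible futures lands between the sharp alternatives at every pixel. A minimal Python sketch (the toy frames below are illustrative, not from the paper):

```python
# Two equally plausible future frames for the same input clip,
# represented as tiny 1-D "images" with binary pixel intensities.
future_a = [1.0, 1.0, 0.0, 0.0]  # e.g. the object moves left
future_b = [0.0, 0.0, 1.0, 1.0]  # e.g. the object moves right

# A model minimizing a plain L2 loss converges to the per-pixel
# mean of the possible futures: a single blurry frame.
blurry = [(a + b) / 2 for a, b in zip(future_a, future_b)]
print(blurry)  # every pixel is 0.5: neither sharp future is produced
```

Neither plausible outcome is recovered; the prediction is uniformly gray wherever the futures disagree.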
Recently, two distinct approaches have attempted to address this problem: (a)
latent variable models that represent the underlying stochasticity, and (b)
adversarially trained models that aim to produce sharper images. A latent
variable model often struggles to produce realistic results, while an
adversarially trained model underutilizes latent variables and thus fails to
produce diverse predictions. These methods have revealed complementary
strengths and weaknesses. Combining the two approaches produces predictions
that appear more realistic and better cover the range of plausible futures.
This observation forms the basis and objective of this work.
In this paper, we propose a novel multi-scale architecture combining both
approaches. We validate the proposed model through a series of experiments and
empirical evaluations on the Moving MNIST, UCF101, and Penn Action datasets. Our
method outperforms the baseline methods.
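How the two objectives are typically combined can be sketched as a weighted sum of a reconstruction term, a latent (KL) regularizer for stochasticity, and an adversarial term for sharpness. The function below is an illustrative schematic only; the exact loss terms and weights used in the paper are not specified here, and the coefficient names are assumptions:

```python
def combined_loss(reconstruction, kl_divergence, adversarial,
                  beta=1.0, lam=0.1):
    """Illustrative variational + adversarial objective.

    reconstruction : pixel-wise error of the predicted frame
    kl_divergence  : regularizer on the latent variables (stochasticity)
    adversarial    : generator loss from a discriminator (sharpness)
    beta, lam      : hypothetical weighting coefficients
    """
    return reconstruction + beta * kl_divergence + lam * adversarial

# Example: combining hypothetical loss values from one training step.
print(combined_loss(0.8, 0.2, 0.5))  # ≈ 1.05
```

The latent term keeps predictions diverse while the adversarial term penalizes blur, which is the complementarity the abstract describes.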
Related papers
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Existence Is Chaos: Enhancing 3D Human Motion Prediction with Uncertainty Consideration [27.28184416632815]
We argue that the recorded motion in training data could be an observation of possible future, rather than a predetermined result.
A novel computationally efficient encoder-decoder model with uncertainty consideration is proposed.
arXiv Detail & Related papers (2024-03-21T03:34:18Z)
- ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos [53.92440577914417]
ACQUIRED consists of 3.9K annotated videos, encompassing a wide range of event types and incorporating both first and third-person viewpoints.
Each video is annotated with questions that span three distinct dimensions of reasoning, including physical, social, and temporal.
We benchmark our dataset against several state-of-the-art language-only and multimodal models and experimental results demonstrate a significant performance gap.
arXiv Detail & Related papers (2023-11-02T22:17:03Z)
- Wildfire Forecasting with Satellite Images and Deep Generative Model [0.0]
We use a series of wildfire images as a video to anticipate how the fire would behave in the future.
We introduce a novel temporal model whose dynamics are driven in a latent space.
Results are compared against various benchmark models.
arXiv Detail & Related papers (2022-08-19T15:52:43Z)
- Video Prediction at Multiple Scales with Hierarchical Recurrent Networks [24.536256844130996]
We propose a novel video prediction model able to forecast future possible outcomes of different levels of granularity simultaneously.
By combining spatial and temporal downsampling, MSPred is able to efficiently predict abstract representations over long time horizons.
In our experiments, we demonstrate that our proposed model accurately predicts future video frames as well as other representations on various scenarios.
arXiv Detail & Related papers (2022-03-17T13:08:28Z)
- Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study on various pose representations with a focus on their effects on the motion prediction task.
We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction.
Our approach outperforms the state-of-the-art methods in short-term prediction and achieves much enhanced long-term prediction proficiency.
arXiv Detail & Related papers (2021-12-30T10:45:22Z)
- FitVid: Overfitting in Pixel-Level Video Prediction [117.59339756506142]
We introduce a new architecture, named FitVid, which is capable of severe overfitting on the common benchmarks.
FitVid outperforms the current state-of-the-art models across four different video prediction benchmarks on four different metrics.
arXiv Detail & Related papers (2021-06-24T17:20:21Z)
- Future Frame Prediction for Robot-assisted Surgery [57.18185972461453]
We propose a ternary prior guided variational autoencoder (TPG-VAE) model for future frame prediction in robotic surgical video sequences.
Besides content distribution, our model learns motion distribution, a novel means of handling the small movements of surgical tools.
arXiv Detail & Related papers (2021-03-18T15:12:06Z)
- Adversarial Generative Grammars for Human Activity Prediction [141.43526239537502]
We propose an adversarial generative grammar model for future prediction.
Our grammar is designed so that it can learn production rules from the data distribution.
Being able to select multiple production rules during inference leads to different predicted outcomes.
arXiv Detail & Related papers (2020-08-11T17:47:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.