A Survey on Video Prediction: From Deterministic to Generative Approaches
- URL: http://arxiv.org/abs/2401.14718v3
- Date: Mon, 22 Jul 2024 10:18:26 GMT
- Title: A Survey on Video Prediction: From Deterministic to Generative Approaches
- Authors: Ruibo Ming, Zhewei Huang, Zhuoxuan Ju, Jianming Hu, Lihui Peng, Shuchang Zhou,
- Abstract summary: Video prediction, a fundamental task in computer vision, aims to enable models to generate sequences of future frames based on existing video content.
We comprehensively survey both historical and contemporary works in this field, encompassing the most widely used datasets and algorithms.
We propose a novel taxonomy centered on the nature of video prediction algorithms.
- Score: 8.131773189457077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video prediction, a fundamental task in computer vision, aims to enable models to generate sequences of future frames based on existing video content. This task has garnered widespread application across various domains. In this paper, we comprehensively survey both historical and contemporary works in this field, encompassing the most widely used datasets and algorithms. Our survey scrutinizes the challenges and evolving landscape of video prediction within the realm of computer vision. We propose a novel taxonomy centered on the stochastic nature of video prediction algorithms. This taxonomy accentuates the gradual transition from deterministic to generative prediction methodologies, underlining significant advancements and shifts in approach.
Related papers
- Visual Representation Learning with Stochastic Frame Prediction [90.99577838303297]
This paper revisits the idea of video generation that learns to capture uncertainty in frame prediction.
We design a framework that trains a frame prediction model to learn temporal information between frames.
We find this architecture allows for combining both objectives in a synergistic and compute-efficient manner.
arXiv Detail & Related papers (2024-06-11T16:05:15Z) - Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z) - Video Prediction at Multiple Scales with Hierarchical Recurrent Networks [24.536256844130996]
We propose a novel video prediction model able to forecast future possible outcomes of different levels of granularity simultaneously.
By combining spatial and temporal downsampling, MSPred is able to efficiently predict abstract representations over long time horizons.
In our experiments, we demonstrate that our proposed model accurately predicts future video frames as well as other representations on various scenarios.
arXiv Detail & Related papers (2022-03-17T13:08:28Z) - Wide and Narrow: Video Prediction from Context and Motion [54.21624227408727]
We propose a new framework to integrate these complementary attributes to predict complex pixel dynamics through deep networks.
We present global context propagation networks that aggregate the non-local neighboring representations to preserve the contextual information over the past frames.
We also devise local filter memory networks that generate adaptive filter kernels by storing the motion of moving objects in the memory.
arXiv Detail & Related papers (2021-10-22T04:35:58Z) - Review of Video Predictive Understanding: Early ActionRecognition and
Future Action Prediction [39.966828592322315]
Action prediction is a major sub-area of video predictive understanding.
Various mathematical tools are widely adopted jointly with computer vision techniques for these two tasks.
Structures that rely on deep convolutional neural networks and recurrent neural networks have been extensively proposed for improving the performance of existing vision tasks.
arXiv Detail & Related papers (2021-07-11T22:46:52Z) - Efficient training for future video generation based on hierarchical
disentangled representation of latent variables [66.94698064734372]
We propose a novel method for generating future prediction videos with less memory usage than the conventional methods.
We achieve high-efficiency by training our method in two stages: (1) image reconstruction to encode video frames into latent variables, and (2) latent variable prediction to generate the future sequence.
Our experiments show that the proposed method can efficiently generate future prediction videos, even for complex datasets that cannot be handled by previous methods.
arXiv Detail & Related papers (2021-06-07T10:43:23Z) - Revisiting Hierarchical Approach for Persistent Long-Term Video
Prediction [55.4498466252522]
We set a new standard of video prediction with orders of magnitude longer prediction time than existing approaches.
Our method predicts future frames by first estimating a sequence of semantic structures and subsequently translating the structures to pixels by video-to-video translation.
We evaluate our method on three challenging datasets involving car driving and human dancing, and demonstrate that it can generate complicated scene structures and motions over a very long time horizon.
arXiv Detail & Related papers (2021-04-14T08:39:38Z) - Video Summarization Using Deep Neural Networks: A Survey [72.98424352264904]
Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content.
This work focuses on the recent advances in the area and provides a comprehensive survey of the existing deep-learning-based methods for generic video summarization.
arXiv Detail & Related papers (2021-01-15T11:41:29Z) - Future Frame Prediction of a Video Sequence [5.660207256468971]
The ability to predict, anticipate and reason about future events is the essence of intelligence.
The ability to predict, anticipate and reason about future events is the essence of intelligence.
arXiv Detail & Related papers (2020-08-31T15:31:02Z) - Deep Learning for Vision-based Prediction: A Survey [6.840474688871695]
Vision-based prediction algorithms have a wide range of applications including autonomous driving, surveillance, human-robot interaction, weather prediction.
This paper provides an overview of the field in the past five years with a particular focus on deep learning approaches.
arXiv Detail & Related papers (2020-06-30T20:26:46Z) - A Review on Deep Learning Techniques for Video Prediction [3.203688549673373]
The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems.
Deep learning-based video prediction emerged as a promising research direction.
arXiv Detail & Related papers (2020-04-10T19:58:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.