Adaptive Future Frame Prediction with Ensemble Network
- URL: http://arxiv.org/abs/2011.06788v2
- Date: Mon, 16 Nov 2020 01:43:53 GMT
- Title: Adaptive Future Frame Prediction with Ensemble Network
- Authors: Wonjik Kim, Masayuki Tanaka, Masatoshi Okutomi, Yoko Sasaki
- Abstract summary: We propose an adaptive update framework for the future frame prediction task.
The proposed framework consists of a pre-trained prediction network, a continuous-updating prediction network, and a weight estimation network.
Our approach outperforms existing methods, especially for dynamically changing scenes.
- Score: 15.19884183320726
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Future frame prediction in videos is a challenging problem because videos include complicated movements and large appearance changes. Learning-based future frame prediction approaches have been proposed in the literature. A common limitation of the existing learning-based approaches is a mismatch between training data and test data. In the future frame prediction task, we can obtain the ground truth data simply by waiting a few frames, which means we can update the prediction model online in the test phase. We therefore propose an adaptive update framework for the future frame prediction task. The proposed adaptive updating framework consists of a pre-trained prediction network, a continuous-updating prediction network, and a weight estimation network. We also show that our pre-trained prediction model achieves performance comparable to existing state-of-the-art approaches, and we demonstrate that our approach outperforms existing methods, especially for dynamically changing scenes.
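The adaptive update loop described in the abstract lends itself to a short sketch. The following is a minimal illustration, not the authors' implementation: the single-convolution networks, the L1 loss, the learning rate, and the one-frame prediction horizon are all placeholder assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch of the adaptive-update loop from the abstract.
# Not the authors' code: architectures and hyperparameters are placeholders.

class PredictionNet(nn.Module):
    """Stand-in for a future frame prediction network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, frame):
        return self.net(frame)

class WeightNet(nn.Module):
    """Stand-in for the weight estimation network: produces a per-pixel
    blending weight in [0, 1] from the two candidate predictions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(6, 1, kernel_size=3, padding=1)

    def forward(self, pred_a, pred_b):
        return torch.sigmoid(self.net(torch.cat([pred_a, pred_b], dim=1)))

pretrained = PredictionNet()   # frozen after offline training
online = PredictionNet()       # continuously updated at test time
weigher = WeightNet()
optimizer = torch.optim.Adam(online.parameters(), lr=1e-4)
criterion = nn.L1Loss()

prev_frame = None
for t in range(100):                   # stand-in for a video stream
    frame = torch.rand(1, 3, 64, 64)   # current observed frame

    # The current frame is the ground truth for the prediction made one
    # step earlier, so it can drive an online update of `online`.
    if prev_frame is not None:
        loss = criterion(online(prev_frame), frame)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Ensemble the frozen and the online predictors with estimated weights.
    with torch.no_grad():
        p_pre, p_on = pretrained(frame), online(frame)
        w = weigher(p_pre, p_on)
        next_frame_pred = w * p_pre + (1 - w) * p_on

    prev_frame = frame
```

With a small buffer in place of `prev_frame`, the same loop extends to longer horizons, where the ground truth arrives only after several frames.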
Related papers
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Conformal online model aggregation [29.43493007296859]
This paper proposes a new approach towards conformal model aggregation in online settings.
It is based on combining the prediction sets from several algorithms by voting, where weights on the models are adapted over time based on past performance.
arXiv Detail & Related papers (2024-03-22T15:40:06Z)
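The vote-and-reweight idea in the entry above can be made concrete with a loose sketch; the inclusion threshold, the exponential update, and the rate `eta` below are assumptions, not taken from the paper.

```python
import math

# Loose sketch of weighted voting over prediction sets with online
# weight updates; the paper's exact aggregation rule differs.

def aggregate(sets, weights, threshold=0.5):
    """Keep a label if the models proposing it hold more than
    `threshold` of the total weight."""
    total = sum(weights)
    labels = set().union(*sets)
    return {y for y in labels
            if sum(w for s, w in zip(sets, weights) if y in s) > threshold * total}

def update_weights(weights, sets, y_true, eta=0.1):
    """Multiplicative update: down-weight models whose set missed y_true."""
    return [w * math.exp(-eta * (y_true not in s)) for w, s in zip(weights, sets)]

# Example round with three models.
sets = [{0, 1}, {1}, {2}]
weights = [1.0, 1.0, 1.0]
print(aggregate(sets, weights))   # {1}
weights = update_weights(weights, sets, y_true=1)
```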
- Optimizing Video Prediction via Video Frame Interpolation [53.16726447796844]
We present a new optimization framework for video prediction via video frame interpolation, inspired by the photo-realistic results of video frame interpolation.
Our framework is based on optimization with a pretrained differentiable video frame interpolation module, without the need for a training dataset.
Our approach outperforms other video prediction methods that require a large amount of training data or extra semantic information.
arXiv Detail & Related papers (2022-06-27T17:03:46Z)
- Learning Future Object Prediction with a Spatiotemporal Detection Transformer [1.1543275835002982]
We train a detection transformer to directly output future objects.
We extend existing transformers in two ways to capture scene dynamics.
Our final approach learns to capture the dynamics and make predictions on par with an oracle for 100 ms prediction horizons.
arXiv Detail & Related papers (2022-04-21T17:58:36Z)
- Reinforcement Learning with Action-Free Pre-Training from Videos [95.25074614579646]
We introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos.
Our framework significantly improves both the final performance and sample efficiency of vision-based reinforcement learning.
arXiv Detail & Related papers (2022-03-25T19:44:09Z)
- FitVid: Overfitting in Pixel-Level Video Prediction [117.59339756506142]
We introduce a new architecture, named FitVid, which is capable of severe overfitting on the common benchmarks.
FitVid outperforms the current state-of-the-art models across four different video prediction benchmarks on four different metrics.
arXiv Detail & Related papers (2021-06-24T17:20:21Z)
- DFPN: Deformable Frame Prediction Network [10.885590093103344]
We propose a deformable frame prediction network (DFPN) for task-oriented implicit motion modeling and next frame prediction.
Experimental results demonstrate that the proposed DFPN model achieves state-of-the-art results in next frame prediction.
arXiv Detail & Related papers (2021-05-26T19:00:19Z)
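As a rough illustration of deformable-convolution-based next-frame prediction in the spirit of the DFPN entry above (the layer sizes and the offset head are assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

# Rough sketch: predict the next frame by sampling the current frame
# with learned per-pixel kernel offsets (implicit motion modeling).

class DeformablePredictor(nn.Module):
    def __init__(self, channels=3, k=3):
        super().__init__()
        # Two offsets (x, y) per kernel tap, estimated from two stacked frames.
        self.offset_head = nn.Conv2d(2 * channels, 2 * k * k, 3, padding=1)
        self.deform = DeformConv2d(channels, channels, k, padding=1)

    def forward(self, prev_frame, curr_frame):
        offsets = self.offset_head(torch.cat([prev_frame, curr_frame], dim=1))
        return self.deform(curr_frame, offsets)  # predicted next frame

model = DeformablePredictor()
prev_f, curr_f = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
print(model(prev_f, curr_f).shape)  # torch.Size([1, 3, 64, 64])
```

Here the offsets play the role of implicit motion: the deformable kernel samples the current frame at learned displaced locations.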
- Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction [55.4498466252522]
We set a new standard for video prediction, with orders of magnitude longer prediction horizons than existing approaches.
Our method predicts future frames by first estimating a sequence of semantic structures and subsequently translating the structures to pixels by video-to-video translation.
We evaluate our method on three challenging datasets involving car driving and human dancing, and demonstrate that it can generate complicated scene structures and motions over a very long time horizon.
arXiv Detail & Related papers (2021-04-14T08:39:38Z)
- Video Prediction via Example Guidance [156.08546987158616]
In video prediction tasks, one major challenge is to capture the multi-modal nature of future contents and dynamics.
In this work, we propose a simple yet effective framework that can efficiently predict plausible future states.
arXiv Detail & Related papers (2020-07-03T14:57:24Z)
- Improved Speech Representations with Multi-Target Autoregressive Predictive Coding [23.424410568555547]
We extend the hypothesis that hidden states that can accurately predict future frames are a useful representation for many downstream tasks.
We propose an auxiliary objective that serves as a regularization to improve generalization of the future frame prediction task.
arXiv Detail & Related papers (2020-04-11T01:09:36Z)
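To make the multi-target idea in the entry above concrete, here is a hedged sketch of a future-frame predictive coding loss with auxiliary targets at additional offsets; the offsets, the weight `alpha`, and the GRU encoder are assumptions rather than the paper's exact objective.

```python
import torch
import torch.nn as nn

# Sketch of an autoregressive predictive coding loss where predicting
# frames at extra future offsets acts as an auxiliary regularizer.

class APCModel(nn.Module):
    def __init__(self, dim=80, hidden=256, shifts=(1, 2, 3)):
        super().__init__()
        self.rnn = nn.GRU(dim, hidden, batch_first=True)
        self.heads = nn.ModuleDict(
            {str(n): nn.Linear(hidden, dim) for n in shifts})
        self.shifts = shifts

    def forward(self, x):
        h, _ = self.rnn(x)            # (batch, time, hidden)
        return {n: self.heads[str(n)](h) for n in self.shifts}

def apc_loss(preds, x, main_shift=1, alpha=0.1):
    """L1 loss on the main future frame plus down-weighted auxiliary targets."""
    loss = 0.0
    for n, pred in preds.items():
        target = x[:, n:, :]          # frames shifted n steps ahead
        term = nn.functional.l1_loss(pred[:, :-n, :], target)
        loss = loss + (term if n == main_shift else alpha * term)
    return loss

x = torch.rand(4, 100, 80)            # e.g. a batch of log-mel frames
model = APCModel()
loss = apc_loss(model(x), x)
loss.backward()
```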
This list is automatically generated from the titles and abstracts of the papers on this site.