Optimizing Video Prediction via Video Frame Interpolation
- URL: http://arxiv.org/abs/2206.13454v1
- Date: Mon, 27 Jun 2022 17:03:46 GMT
- Title: Optimizing Video Prediction via Video Frame Interpolation
- Authors: Yue Wu and Qiang Wen and Qifeng Chen
- Abstract summary: We present a new optimization framework for video prediction via video frame interpolation, inspired by the photo-realistic results of video frame interpolation.
Our framework is based on optimization with a pretrained differentiable video frame interpolation module, without the need for a training dataset.
Our approach outperforms other video prediction methods that require a large amount of training data or extra semantic information.
- Score: 53.16726447796844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video prediction is an extrapolation task that predicts future frames given
past frames, and video frame interpolation is an interpolation task that
estimates intermediate frames between two frames. We have witnessed the
tremendous advancement of video frame interpolation, but the general video
prediction in the wild is still an open question. Inspired by the
photo-realistic results of video frame interpolation, we present a new
optimization framework for video prediction via video frame interpolation, in
which we solve an extrapolation problem based on an interpolation model. Our
video prediction framework is based on optimization with a pretrained
differentiable video frame interpolation module without the need for a training
dataset, and thus there is no domain gap issue between training and test data.
Also, our approach does not need any additional information such as semantic or
instance maps, which makes our framework applicable to any video. Extensive
experiments on the Cityscapes, KITTI, DAVIS, Middlebury, and Vimeo90K datasets
show that our video prediction results are robust in general scenarios, and our
approach outperforms other video prediction methods that require a large amount
of training data or extra semantic information.
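The core idea of the abstract, casting extrapolation as an inverse interpolation problem, can be sketched with a toy stand-in. In the sketch below, a simple frame average plays the role of the pretrained differentiable interpolation module (a hypothetical placeholder; the paper uses a learned network and optimizes full images), and the unknown future frame is optimized by gradient descent so that interpolating it with the past frame reproduces the observed current frame.

```python
import numpy as np

def interpolate(frame_a, frame_b):
    """Toy differentiable 'interpolation module': the midpoint of two frames.
    A stand-in for the pretrained video frame interpolation network."""
    return 0.5 * (frame_a + frame_b)

def predict_next_frame(past, current, steps=500, lr=0.5):
    """Optimize a candidate future frame so that interpolating it with the
    past frame reconstructs the observed current frame (extrapolation solved
    through an interpolation model)."""
    future = current.copy()  # initialize from the last observed frame
    for _ in range(steps):
        # L = 0.5 * ||interpolate(past, future) - current||^2
        residual = interpolate(past, future) - current
        grad = 0.5 * residual  # dL/d_future for the midpoint module
        future -= lr * grad
    return future

past = np.array([0.0, 1.0, 2.0])      # frame at t-1 (flattened toy "image")
current = np.array([1.0, 2.0, 3.0])   # frame at t
future = predict_next_frame(past, current)
# future ≈ [2., 3., 4.]: for the midpoint module the optimum is the
# linear extrapolation 2 * current - past.
```

With a real interpolation network the same loop applies, but the gradient flows through the network via automatic differentiation rather than the closed form used here; no training data is needed, since only the test frames enter the optimization.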
Related papers
- Frame-Voyager: Learning to Query Frames for Video Large Language Models [33.84793162102087]
Video Large Language Models (Video-LLMs) have made remarkable progress in video understanding tasks.
Existing frame selection approaches, such as uniform frame sampling and text-frame retrieval, fail to account for the information density variations in the videos.
We propose Frame-Voyager that learns to query informative frame combinations, based on the given textual queries in the task.
arXiv Detail & Related papers (2024-10-04T08:26:06Z)
- ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation [81.90265212988844]
We propose a training-free method for generative video models in a plug-and-play manner.
We transform a video model into a self-cascaded video diffusion model with the designed hidden state correction modules.
Our training-free method is even comparable to trained models supported by huge compute resources and large-scale datasets.
arXiv Detail & Related papers (2024-06-03T00:31:13Z)
- A unified model for continuous conditional video prediction [14.685237010856953]
Conditional video prediction tasks are normally solved by task-related models.
Almost all conditional video prediction models can only achieve discrete prediction.
In this paper, we propose a unified model that addresses these two issues at the same time.
arXiv Detail & Related papers (2022-10-11T22:26:59Z)
- VMFormer: End-to-End Video Matting with Transformer [48.97730965527976]
Video matting aims to predict alpha mattes for each frame from a given input video sequence.
Recent solutions to video matting have been dominated by deep convolutional neural networks (CNNs).
We propose VMFormer: a transformer-based end-to-end method for video matting.
arXiv Detail & Related papers (2022-08-26T17:51:02Z)
- Cross-Attention Transformer for Video Interpolation [3.5317804902980527]
TAIN (Transformers and Attention for video INterpolation) aims to interpolate an intermediate frame given two consecutive image frames around it.
We first present a novel visual transformer module, named Cross-Similarity (CS), to globally aggregate input image features with similar appearance as those of the predicted frame.
To account for occlusions in the CS features, we propose an Image Attention (IA) module to allow the network to focus on CS features from one frame over those of the other.
arXiv Detail & Related papers (2022-07-08T21:38:54Z)
- Revealing Single Frame Bias for Video-and-Language Learning [115.01000652123882]
We show that a single-frame trained model can achieve better performance than existing methods that use multiple frames for training.
This result reveals the existence of a strong "static appearance bias" in popular video-and-language datasets.
We propose two new retrieval tasks based on existing fine-grained action recognition datasets that encourage temporal modeling.
arXiv Detail & Related papers (2022-06-07T16:28:30Z)
- Understanding Road Layout from Videos as a Whole [82.30800791500869]
We formulate it as a top-view road attributes prediction problem and our goal is to predict these attributes for each frame both accurately and consistently.
We exploit the following three novel aspects: leveraging camera motions in videos, including context cues, and incorporating long-term video information.
arXiv Detail & Related papers (2020-07-02T00:59:15Z)
- Motion Segmentation using Frequency Domain Transformer Networks [29.998917158604694]
We propose a novel end-to-end learnable architecture that predicts the next frame by modeling foreground and background separately.
Our approach can outperform some widely used video prediction methods like Video Ladder Network and Predictive Gated Pyramids on synthetic data.
arXiv Detail & Related papers (2020-04-18T15:05:11Z)
- Scene-Adaptive Video Frame Interpolation via Meta-Learning [54.87696619177496]
We propose to adapt the model to each video by making use of additional information that is readily available at test time.
We obtain significant performance gains with only a single gradient update without any additional parameters.
arXiv Detail & Related papers (2020-04-02T02:46:44Z)
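The scene-adaptive idea in the last entry, a single test-time gradient update using frames already available from the test video, can be illustrated with a toy linear interpolator (a hypothetical stand-in for the meta-learned network). A scalar blend weight `w` is nudged with one gradient step on a frame triple from the test video before the model is used, adding no extra parameters.

```python
import numpy as np

def interp(frame_a, frame_b, w):
    """Toy interpolation model: a learned blend of the two input frames."""
    return w * frame_a + (1.0 - w) * frame_b

def adapt_one_step(frame_a, frame_mid, frame_b, w, lr=0.02):
    """Single test-time gradient update on an available triple (a, mid, b),
    mimicking scene-adaptive fine-tuning with one gradient step."""
    residual = interp(frame_a, frame_b, w) - frame_mid
    grad = np.sum(residual * (frame_a - frame_b))  # dL/dw for L = 0.5*||r||^2
    return w - lr * grad

# A test-video triple whose true blend weight is 0.25, not the prior 0.5.
a = np.array([0.0, 4.0])
b = np.array([4.0, 0.0])
mid = 0.25 * a + 0.75 * b   # = [3.0, 1.0]
w0 = 0.5                    # meta-learned initialization
w1 = adapt_one_step(a, mid, b, w0)
# w1 = 0.34: one step already moves the weight toward the scene's true 0.25.
```

In the actual paper the adapted quantity is the full network rather than one scalar, but the mechanism is the same: a single inner gradient update on test-time data, made effective by meta-learning the initialization.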
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.