Scene-Adaptive Video Frame Interpolation via Meta-Learning
- URL: http://arxiv.org/abs/2004.00779v1
- Date: Thu, 2 Apr 2020 02:46:44 GMT
- Title: Scene-Adaptive Video Frame Interpolation via Meta-Learning
- Authors: Myungsub Choi, Janghoon Choi, Sungyong Baik, Tae Hyun Kim, Kyoung Mu Lee
- Abstract summary: We propose to adapt the model to each video by making use of additional information that is readily available at test time.
We obtain significant performance gains with only a single gradient update without any additional parameters.
- Score: 54.87696619177496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video frame interpolation is a challenging problem because there are
different scenarios for each video depending on the variety of foreground and
background motion, frame rate, and occlusion. It is therefore difficult for a
single network with fixed parameters to generalize across different videos.
Ideally, one could have a different network for each scenario, but this is
computationally infeasible for practical applications. In this work, we propose
to adapt the model to each video by making use of additional information that
is readily available at test time and yet has not been exploited in previous
works. We first show the benefits of 'test-time adaptation' through simple
fine-tuning of a network, then we greatly improve its efficiency by
incorporating meta-learning. We obtain significant performance gains with only
a single gradient update without any additional parameters. Finally, we show
that our meta-learning framework can be easily employed to any video frame
interpolation network and can consistently improve its performance on multiple
benchmark datasets.
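The mechanism behind the abstract's "single gradient update" is a MAML-style inner step: at test time the interpolation network supervises itself on frames the input video already contains (predict an existing middle frame from its two neighbors), takes one gradient step, and interpolates the new pair with the adapted weights; meta-training tunes the initial weights so that this one step is effective. Below is a minimal PyTorch sketch of the inner, test-time step only; `InterpNet`, the learning rate, and the frame tensors are illustrative placeholders, not the authors' architecture, and the meta-training outer loop is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterpNet(nn.Module):
    """Toy stand-in for a frame interpolation network: maps two
    concatenated RGB frames to the frame between them."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, f0, f1):
        return self.net(torch.cat([f0, f1], dim=1))

def adapt_one_step(model, f0, f1, f2, inner_lr=1e-4):
    """Single-gradient-update test-time adaptation: (f0, f2) -> f1 is a
    self-supervised task, because f1 already exists in the input video."""
    loss = F.l1_loss(model(f0, f2), f1)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    # Functional update; the stored weights of `model` stay untouched,
    # and no extra parameters are introduced.
    return {name: p - inner_lr * g
            for (name, p), g in zip(model.named_parameters(), grads)}

# Adapt on an observed frame triplet, then interpolate a new pair
model = InterpNet()
f0, f1, f2 = (torch.rand(1, 3, 64, 64) for _ in range(3))
fast_weights = adapt_one_step(model, f0, f1, f2)
new_pair = (torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
mid = torch.func.functional_call(model, fast_weights, new_pair)
```

Meta-training would wrap `adapt_one_step` in an outer loop that backpropagates through `fast_weights` into the initial parameters, so any interpolation network can be plugged in unchanged.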
Related papers
- Realizing Video Summarization from the Path of Language-based Semantic Understanding [19.825666473712197]
We propose a novel video summarization framework inspired by the Mixture of Experts (MoE) paradigm.
Our approach integrates multiple VideoLLMs to generate comprehensive and coherent textual summaries.
arXiv Detail & Related papers (2024-10-06T15:03:22Z)
- Learning from One Continuous Video Stream [70.30084026960819]
We introduce a framework for online learning from a single continuous video stream.
This poses great challenges given the high correlation between consecutive video frames.
We employ pixel-to-pixel modelling as a practical and flexible way to switch between pre-training and single-stream evaluation.
arXiv Detail & Related papers (2023-12-01T14:03:30Z)
- Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation [74.51546366251753]
Video topic segmentation unveils the coarse-grained semantic structure underlying videos.
We introduce a multi-modal video topic segmenter that utilizes both video transcripts and frames.
Our proposed solution significantly surpasses baseline methods in terms of both accuracy and transferability.
arXiv Detail & Related papers (2023-11-30T21:59:05Z)
- Optimizing Video Prediction via Video Frame Interpolation [53.16726447796844]
We present a new optimization framework for video prediction via video frame interpolation, inspired by the photo-realistic results of video frame interpolation.
Our framework is based on optimization with a pretrained differentiable video frame interpolation module, without the need for a training dataset.
Our approach outperforms other video prediction methods that require a large amount of training data or extra semantic information.
arXiv Detail & Related papers (2022-06-27T17:03:46Z)
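The entry above effectively inverts an interpolator at test time: the future frame is treated as a free variable and optimized so that a frozen, pretrained differentiable interpolation module maps (previous frame, candidate future frame) back to the observed current frame. Here is a minimal sketch of that idea, with a toy averaging "interpolator" standing in for the pretrained network; the paper's actual losses and parameterization differ.

```python
import torch
import torch.nn.functional as F

def predict_next_frame(interp, f_prev, f_curr, steps=200, lr=0.05):
    """Optimize a candidate future frame so that the frozen interpolator
    reconstructs the observed current frame from (f_prev, future)."""
    future = f_curr.clone().requires_grad_(True)  # initialize at the last frame
    opt = torch.optim.Adam([future], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = interp(f_prev, future)            # differentiable interpolation
        F.l1_loss(recon, f_curr).backward()       # should match the observed frame
        opt.step()
    return future.detach().clamp(0, 1)

# Toy stand-in for a pretrained, frozen interpolation network
interp = lambda a, b: (a + b) / 2
f_prev, f_curr = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
f_next = predict_next_frame(interp, f_prev, f_curr)
```

Because only the candidate frame receives optimizer updates, no training dataset is needed; any differentiable interpolation module can serve as the frozen prior.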
- Revealing Single Frame Bias for Video-and-Language Learning [115.01000652123882]
We show that a single-frame trained model can achieve better performance than existing methods that use multiple frames for training.
This result reveals the existence of a strong "static appearance bias" in popular video-and-language datasets.
We propose two new retrieval tasks based on existing fine-grained action recognition datasets that encourage temporal modeling.
arXiv Detail & Related papers (2022-06-07T16:28:30Z)
- Adaptive Focus for Efficient Video Recognition [29.615394426035074]
We propose a reinforcement-learning-based approach for efficient spatially adaptive video recognition (AdaFocus).
A lightweight ConvNet is first adopted to quickly process the full video sequence, and its features are used by a recurrent policy network to localize the most task-relevant regions.
During offline inference, once the informative patch sequence has been generated, the bulk of computation can be done in parallel, and is efficient on modern GPU devices.
arXiv Detail & Related papers (2021-05-07T13:24:47Z)
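A rough sketch of the two-stage pipeline in the AdaFocus entry above: a cheap ConvNet glances at each full frame, a recurrent policy turns its features into a patch location, and only the selected patches go through the expensive network. All module sizes here are illustrative, and the reinforcement-learning training of the (non-differentiable) patch selection is omitted.

```python
import torch
import torch.nn as nn

class AdaFocusSketch(nn.Module):
    """Two-stage idea: cheap global glance -> recurrent policy picks a
    patch per frame -> expensive network sees only the patches."""
    def __init__(self, patch=32):
        super().__init__()
        self.patch = patch
        self.glance = nn.Sequential(                 # cheap, runs on full frames
            nn.Conv2d(3, 8, 3, stride=4, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.policy = nn.GRUCell(8, 16)              # recurrent policy state
        self.locate = nn.Linear(16, 2)               # patch center in [0, 1]^2
        self.focus = nn.Sequential(                  # expensive, patches only
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, video):                        # video: (T, 3, H, W)
        T, _, H, W = video.shape
        h = torch.zeros(1, 16)
        feats = []
        for t in range(T):
            h = self.policy(self.glance(video[t:t + 1]), h)
            cy, cx = torch.sigmoid(self.locate(h))[0]
            y = int(cy * (H - self.patch))
            x = int(cx * (W - self.patch))
            crop = video[t:t + 1, :, y:y + self.patch, x:x + self.patch]
            feats.append(self.focus(crop))
        return torch.stack(feats).mean(0)            # pooled clip feature

feat = AdaFocusSketch()(torch.rand(8, 3, 96, 96))    # (1, 64)
```

The parallel-inference point in the entry follows from this structure: once the patch locations are fixed, all `focus` passes are independent and can be batched.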
- Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling [98.41300980759577]
The canonical approach to video-and-language learning has a neural model learn from dense video features extracted offline.
We propose a generic framework ClipBERT that enables affordable end-to-end learning for video-and-language tasks.
Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms existing methods.
arXiv Detail & Related papers (2021-02-11T18:50:16Z)
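The sparse-sampling idea in the ClipBERT entry above is simple to state in code: at each training or inference step, draw a few short clips from the raw video and average the per-clip predictions, instead of consuming densely pre-extracted features for the whole video. A minimal sketch with a stand-in scoring model follows; ClipBERT itself is a cross-modal transformer, which this does not reproduce.

```python
import torch

def sparse_sample(video, num_clips=2, clip_len=2):
    """Draw a few short clips from a (T, C, H, W) video at random
    starting frames; shape (num_clips, clip_len, C, H, W)."""
    T = video.shape[0]
    starts = torch.randint(0, T - clip_len + 1, (num_clips,))
    return torch.stack([video[s:s + clip_len] for s in starts.tolist()])

def predict(model, video, num_clips=2, clip_len=2):
    """Average per-clip predictions, so a handful of frames stands in
    for the full video at both training and test time."""
    clips = sparse_sample(video, num_clips, clip_len)
    return torch.stack([model(c) for c in clips]).mean(0)

# Toy usage: a stand-in model that pools a clip to a 10-way score
W = torch.rand(3, 10)
model = lambda clip: clip.mean(dim=(0, 2, 3)) @ W
scores = predict(model, torch.rand(30, 3, 64, 64))   # (10,)
```

Sampling from raw frames is what makes end-to-end learning affordable here: gradients reach the visual backbone through only a few clips per step rather than through the whole video.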
- Learning Video Instance Segmentation with Recurrent Graph Neural Networks [39.06202374530647]
We propose a novel learning formulation, where the entire video instance segmentation problem is modelled jointly.
We fit a flexible model to our formulation that, with the help of a graph neural network, processes all available new information in each frame.
Our approach, operating at over 25 FPS, outperforms previous video real-time methods.
arXiv Detail & Related papers (2020-12-07T18:41:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.