Unified Arbitrary-Time Video Frame Interpolation and Prediction
- URL: http://arxiv.org/abs/2503.02316v1
- Date: Tue, 04 Mar 2025 06:17:17 GMT
- Title: Unified Arbitrary-Time Video Frame Interpolation and Prediction
- Authors: Xin Jin, Longhai Wu, Jie Chen, Ilhyun Cho, Cheul-Hee Hahm
- Abstract summary: Video frame interpolation and prediction aim to synthesize frames in-between and subsequent to existing frames, respectively. While arbitrary-time interpolation has been extensively studied, the value of arbitrary-time prediction has been largely overlooked. We present uniVIP - unified arbitrary-time Video Interpolation and Prediction.
- Score: 9.610711105923357
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video frame interpolation and prediction aim to synthesize frames in-between and subsequent to existing frames, respectively. Despite being closely-related, these two tasks are traditionally studied with different model architectures, or same architecture but individually trained weights. Furthermore, while arbitrary-time interpolation has been extensively studied, the value of arbitrary-time prediction has been largely overlooked. In this work, we present uniVIP - unified arbitrary-time Video Interpolation and Prediction. Technically, we firstly extend an interpolation-only network for arbitrary-time interpolation and prediction, with a special input channel for task (interpolation or prediction) encoding. Then, we show how to train a unified model on common triplet frames. Our uniVIP provides competitive results for video interpolation, and outperforms existing state-of-the-arts for video prediction. Codes will be available at: https://github.com/srcn-ivl/uniVIP
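The abstract describes two ideas: a special input channel that encodes the task, and training one model on common triplet frames. The sketch below illustrates one way these could fit together. It is not the authors' implementation. The names `Sample`, `samples_from_triplet`, and `build_network_input`, the 0/1 task encoding, the constant time plane, and the convention that the two input frames sit at times 0 and 1 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Frame = List[List[float]]  # hypothetical H x W grayscale frame

# Assumed task encoding: 0 for interpolation, 1 for prediction.
INTERPOLATION = 0.0
PREDICTION = 1.0


@dataclass
class Sample:
    inputs: Tuple[Frame, Frame]  # the two frames given to the network
    target: Frame                # the frame the network should synthesize
    t: float                     # target time; input frames sit at 0 and 1
    task: float                  # task flag (INTERPOLATION or PREDICTION)


def samples_from_triplet(f0: Frame, f1: Frame, f2: Frame) -> List[Sample]:
    """Derive one interpolation and one prediction sample from a triplet."""
    return [
        # Interpolation: inputs are the outer frames, target is the middle one.
        Sample((f0, f2), f1, 0.5, INTERPOLATION),
        # Prediction: inputs are the first two frames, target is the third.
        Sample((f0, f1), f2, 2.0, PREDICTION),
    ]


def constant_plane(value: float, height: int, width: int) -> Frame:
    """Return an H x W plane filled with a single value."""
    return [[value] * width for _ in range(height)]


def build_network_input(sample: Sample) -> List[Frame]:
    """Stack both input frames with a time plane and a task plane."""
    a, b = sample.inputs
    h, w = len(a), len(a[0])
    return [a, b, constant_plane(sample.t, h, w), constant_plane(sample.task, h, w)]
```

Under these assumptions, a single triplet yields both an interpolation sample (target time inside the input interval) and a prediction sample (target time beyond it), so the same weights see both tasks during training.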
Related papers
- Real-time Video Prediction With Fast Video Interpolation Model and Prediction Training [9.225628670664596]
We propose IFRVP, a real-time video prediction method aimed at zero-latency interaction over networks.
We introduce ELAN-based residual blocks into the prediction models to improve both inference speed and accuracy.
Our evaluations show that our proposed models perform efficiently and achieve the best trade-off between prediction accuracy and computational speed.
arXiv Detail & Related papers (2025-03-29T18:48:46Z)
- A Unified Framework for Event-based Frame Interpolation with Ad-hoc Deblurring in the Wild [72.0226493284814]
We propose a unified framework for event-based frame interpolation that performs deblurring ad-hoc.
Our network consistently outperforms previous state-of-the-art methods on frame interpolation, single image deblurring, and the joint task of both.
arXiv Detail & Related papers (2023-01-12T18:19:00Z)
- A unified model for continuous conditional video prediction [14.685237010856953]
Conditional video prediction tasks are normally solved by task-related models.
Almost all conditional video prediction models can only achieve discrete prediction.
In this paper, we propose a unified model that addresses these two issues at the same time.
arXiv Detail & Related papers (2022-10-11T22:26:59Z)
- VMFormer: End-to-End Video Matting with Transformer [48.97730965527976]
Video matting aims to predict alpha mattes for each frame from a given input video sequence.
Recent solutions to video matting have been dominated by deep convolutional neural networks (CNNs).
We propose VMFormer: a transformer-based end-to-end method for video matting.
arXiv Detail & Related papers (2022-08-26T17:51:02Z)
- TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation [50.49396123016185]
Video frame interpolation (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI).
Our method outperforms other state-of-the-art methods in four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z)
- Optimizing Video Prediction via Video Frame Interpolation [53.16726447796844]
We present a new optimization framework for video prediction via video frame interpolation, inspired by the photo-realistic results of video frame interpolation.
Our framework is based on optimization with a pretrained differentiable video frame interpolation module, without the need for a training dataset.
Our approach outperforms other video prediction methods that require a large amount of training data or extra semantic information.
arXiv Detail & Related papers (2022-06-27T17:03:46Z)
- Stand-Alone Inter-Frame Attention in Video Models [164.06137994796487]
We present a new recipe for an inter-frame attention block, namely Stand-alone Inter-Frame Attention (SIFA).
SIFA remoulds the deformable design via re-scaling the offset predictions by the difference between two frames.
We further plug SIFA block into ConvNets and Vision Transformer, respectively, to devise SIFA-Net and SIFA-Transformer.
arXiv Detail & Related papers (2022-06-14T15:51:28Z)
- Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation [14.631523634811392]
Masked Conditional Video Diffusion (MCVD) is a general-purpose framework for video prediction.
We train the model in a manner where we randomly and independently mask all the past frames or all the future frames.
Our approach yields SOTA results across standard video prediction benchmarks, with computation times measured in 1-12 days.
arXiv Detail & Related papers (2022-05-19T20:58:05Z)
- Video Prediction at Multiple Scales with Hierarchical Recurrent Networks [24.536256844130996]
We propose a novel video prediction model able to forecast future possible outcomes of different levels of granularity simultaneously.
By combining spatial and temporal downsampling, MSPred is able to efficiently predict abstract representations over long time horizons.
In our experiments, we demonstrate that our proposed model accurately predicts future video frames as well as other representations on various scenarios.
arXiv Detail & Related papers (2022-03-17T13:08:28Z)
- Video Frame Interpolation without Temporal Priors [91.04877640089053]
Video frame interpolation aims to synthesize non-existent intermediate frames in a video sequence.
The temporal priors of videos, i.e. frames per second (FPS) and frame exposure time, may vary from different camera sensors.
We devise a novel optical flow refinement strategy for better synthesizing results.
arXiv Detail & Related papers (2021-12-02T12:13:56Z)
- Asymmetric Bilateral Motion Estimation for Video Frame Interpolation [50.44508853885882]
We propose a novel video frame interpolation algorithm based on asymmetric bilateral motion estimation (ABME).
First, we predict symmetric bilateral motion fields to interpolate an anchor frame.
Second, we estimate asymmetric bilateral motion fields from the anchor frame to the input frames.
Third, we use the asymmetric fields to warp the input frames backward and reconstruct the intermediate frame.
arXiv Detail & Related papers (2021-08-15T21:11:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.