Video Frame Interpolation with Transformer
- URL: http://arxiv.org/abs/2205.07230v1
- Date: Sun, 15 May 2022 09:30:28 GMT
- Title: Video Frame Interpolation with Transformer
- Authors: Liying Lu, Ruizheng Wu, Huaijia Lin, Jiangbo Lu, Jiaya Jia
- Abstract summary: We introduce a novel framework, which takes advantage of Transformer to model long-range pixel correlation among video frames.
Our network is equipped with a novel cross-scale window-based attention mechanism, where cross-scale windows interact with each other.
- Score: 55.12620857638253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video frame interpolation (VFI), which aims to synthesize intermediate frames
of a video, has made remarkable progress with the development of deep convolutional
networks over the past years. Existing methods built upon convolutional networks
generally face challenges of handling large motion due to the locality of
convolution operations. To overcome this limitation, we introduce a novel
framework, which takes advantage of Transformer to model long-range pixel
correlation among video frames. Further, our network is equipped with a novel
cross-scale window-based attention mechanism, where cross-scale windows
interact with each other. This design effectively enlarges the receptive field
and aggregates multi-scale information. Extensive quantitative and qualitative
experiments demonstrate that our method achieves new state-of-the-art results
on various benchmarks.
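As a rough illustration of the cross-scale window-based attention described in the abstract, the PyTorch sketch below lets each fine-scale window attend jointly to its own tokens and to a 2x-downsampled window covering the same spatial region, which is one way such windows can "interact" and enlarge the receptive field. The single-head formulation, the 2x pooling, the names `CrossScaleWindowAttention` and `window_partition`, and all shapes are assumptions for illustration, not the paper's released implementation.

```python
# Minimal sketch of cross-scale window-based attention (assumed design,
# not the authors' exact architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F


def window_partition(x, ws):
    """Split a (B, H, W, C) feature map into non-overlapping ws x ws windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    # -> (num_windows * B, ws * ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)


class CrossScaleWindowAttention(nn.Module):
    """Each fine-scale window attends to itself plus a coarse-scale window
    covering the same spatial region, enlarging the effective receptive field."""

    def __init__(self, dim, window_size=8):
        super().__init__()
        self.ws = window_size
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, H, W, C), with H and W divisible by window_size
        B, H, W, C = x.shape
        # Coarse view: 2x average-pooled features (assumed downsampling factor).
        coarse = F.avg_pool2d(x.permute(0, 3, 1, 2), 2).permute(0, 2, 3, 1)

        q = window_partition(self.to_q(x), self.ws)          # (nW*B, ws^2, C)
        kv_fine = window_partition(x, self.ws)                # same windows, fine scale
        kv_coarse = window_partition(coarse, self.ws // 2)    # matching coarse windows
        kv = self.to_kv(torch.cat([kv_fine, kv_coarse], dim=1))
        k, v = kv.chunk(2, dim=-1)

        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = attn.softmax(dim=-1) @ v
        out = self.proj(out)
        # Reassemble windows back into a (B, H, W, C) feature map.
        nH, nW = H // self.ws, W // self.ws
        out = out.view(B, nH, nW, self.ws, self.ws, C)
        return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


if __name__ == "__main__":
    attn = CrossScaleWindowAttention(dim=32, window_size=8)
    feats = torch.randn(1, 64, 64, 32)   # e.g. fused features from two input frames
    print(attn(feats).shape)              # torch.Size([1, 64, 64, 32])
```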
Related papers
- Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields [39.214857326425204]
Video Frame Interpolation (VFI) aims to generate intermediate video frames between consecutive input frames.
We propose a novel event-based VFI framework with cross-modal asymmetric bidirectional motion field estimation.
Our method shows significant performance improvement over the state-of-the-art VFI methods on various datasets.
arXiv Detail & Related papers (2025-02-19T13:40:43Z)
- Adapting Image-to-Video Diffusion Models for Large-Motion Frame Interpolation [0.0]
We present a conditional encoder designed to adapt an image-to-video diffusion model for large-motion frame interpolation.
To enhance performance, we integrate a dual-branch feature extractor and propose a cross-frame attention mechanism.
Our approach demonstrates superior performance on the Fréchet Video Distance metric when evaluated against other state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-22T14:49:55Z)
- Motion-aware Latent Diffusion Models for Video Frame Interpolation [51.78737270917301]
Motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity.
We propose a novel diffusion framework, motion-aware latent diffusion models (MADiff).
Our method achieves state-of-the-art performance, significantly outperforming existing approaches.
arXiv Detail & Related papers (2024-04-21T05:09:56Z)
- Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z)
- Video Frame Interpolation with Flow Transformer [31.371987879960287]
Video frame interpolation has been actively studied with the development of convolutional neural networks.
We propose Video Frame Interpolation Flow Transformer to incorporate motion dynamics from optical flows into the self-attention mechanism.
Our framework is suitable for interpolating frames with large motion while maintaining reasonably low complexity.
arXiv Detail & Related papers (2023-07-30T06:44:37Z)
- Efficient Convolution and Transformer-Based Network for Video Frame Interpolation [11.036815066639473]
A novel method integrating a transformer encoder and convolutional features is proposed.
This network reduces the memory burden by close to 50% and runs up to four times faster at inference time.
A dual-encoder architecture is introduced which combines the strength of convolutions in modelling local correlations with those of the transformer for long-range dependencies.
arXiv Detail & Related papers (2023-07-12T20:14:06Z)
- Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video interpolation framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations.
To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video interpolation.
In addition, we develop a multi-scale frame synthesis scheme to fully realize the potential of Transformers.
arXiv Detail & Related papers (2021-11-27T05:35:10Z)
- Wide and Narrow: Video Prediction from Context and Motion [54.21624227408727]
We propose a new framework to integrate these complementary attributes to predict complex pixel dynamics through deep networks.
We present global context propagation networks that aggregate the non-local neighboring representations to preserve the contextual information over the past frames.
We also devise local filter memory networks that generate adaptive filter kernels by storing the motion of moving objects in the memory.
arXiv Detail & Related papers (2021-10-22T04:35:58Z)
- Hierarchical Multimodal Transformer to Summarize Videos [103.47766795086206]
Motivated by the great success of transformers and the natural structure of video (frame-shot-video), a hierarchical transformer is developed for video summarization.
To integrate the two kinds of information, they are encoded in a two-stream scheme, and a multimodal fusion mechanism is developed based on the hierarchical transformer.
In practice, extensive experiments show that the hierarchical multimodal transformer (HMT) surpasses most traditional, RNN-based, and attention-based video summarization methods.
arXiv Detail & Related papers (2021-09-22T07:38:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.