Related papers: Video Frame Interpolation with Flow Transformer

Video Frame Interpolation with Flow Transformer

URL: http://arxiv.org/abs/2307.16144v1
Date: Sun, 30 Jul 2023 06:44:37 GMT
Title: Video Frame Interpolation with Flow Transformer
Authors: Pan Gao, Haoyue Tian, Jie Qin
Abstract summary: Video frame has been actively studied with the development of convolutional neural networks. We propose Video Frame Interpolation Flow Transformer to incorporate motion dynamics from optical flows into the self-attention mechanism. Our framework is suitable for interpolating frames with large motion while maintaining reasonably low complexity.
Score: 31.371987879960287
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video frame interpolation has been actively studied with the development of convolutional neural networks. However, due to the intrinsic limitations of kernel weight sharing in convolution, the interpolated frame generated by it may lose details. In contrast, the attention mechanism in Transformer can better distinguish the contribution of each pixel, and it can also capture long-range pixel dependencies, which provides great potential for video interpolation. Nevertheless, the original Transformer is commonly used for 2D images; how to develop a Transformer-based framework with consideration of temporal self-attention for video frame interpolation remains an open issue. In this paper, we propose Video Frame Interpolation Flow Transformer to incorporate motion dynamics from optical flows into the self-attention mechanism. Specifically, we design a Flow Transformer Block that calculates the temporal self-attention in a matched local area with the guidance of flow, making our framework suitable for interpolating frames with large motion while maintaining reasonably low complexity. In addition, we construct a multi-scale architecture to account for multi-scale motion, further improving the overall performance. Extensive experiments on three benchmarks demonstrate that the proposed method can generate interpolated frames with better visual quality than state-of-the-art methods.

Related papers

Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames. It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z)
Aggregating Nearest Sharp Features via Hybrid Transformers for Video Deblurring [70.06559269075352]
We propose a video deblurring method that leverages both neighboring frames and existing sharp frames using hybrid Transformers for feature aggregation. To aggregate nearest sharp features from detected sharp frames, we utilize a global Transformer with multi-scale matching capability. Our proposed method outperforms state-of-the-art video deblurring methods as well as event-driven video deblurring methods in terms of quantitative metrics and visual quality.
arXiv Detail & Related papers (2023-09-13T16:12:11Z)
Meta-Interpolation: Time-Arbitrary Frame Interpolation via Dual Meta-Learning [65.85319901760478]
We consider processing different time-steps with adaptively generated convolutional kernels in a unified way with the help of meta-learning. We develop a dual meta-learned frame framework to synthesize intermediate frames with the guidance of context information and optical flow.
arXiv Detail & Related papers (2022-07-27T17:36:23Z)
TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation [50.49396123016185]
Video frame (VFI) aims to synthesize an intermediate frame between two consecutive frames. We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI) Our method outperforms other state-of-the-art methods in four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z)
Video Frame Interpolation with Transformer [55.12620857638253]
We introduce a novel framework, which takes advantage of Transformer to model long-range pixel correlation among video frames. Our network is equipped with a novel cross-scale window-based attention mechanism, where cross-scale windows interact with each other.
arXiv Detail & Related papers (2022-05-15T09:30:28Z)
Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations. To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video. In addition, we develop a multi-scale frame scheme to fully realize the potential of Transformers.
arXiv Detail & Related papers (2021-11-27T05:35:10Z)
EA-Net: Edge-Aware Network for Flow-based Video Frame Interpolation [101.75999290175412]
We propose to reduce the image blur and get the clear shape of objects by preserving the edges in the interpolated frames. The proposed Edge-Aware Network (EANet) integrates the edge information into the frame task. Three edge-aware mechanisms are developed to emphasize the frame edges in estimating flow maps.
arXiv Detail & Related papers (2021-05-17T08:44:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.