Progressive Temporal Feature Alignment Network for Video Inpainting
- URL: http://arxiv.org/abs/2104.03507v1
- Date: Thu, 8 Apr 2021 04:50:33 GMT
- Title: Progressive Temporal Feature Alignment Network for Video Inpainting
- Authors: Xueyan Zou, Linjie Yang, Ding Liu, Yong Jae Lee
- Abstract summary: Video inpainting aims to fill spatio-temporal "corrupted" regions with plausible content.
Current methods achieve this goal through attention, flow-based warping, or 3D temporal convolution.
We propose 'Progressive Temporal Feature Alignment Network', which progressively enriches features extracted from the current frame with the warped feature from neighbouring frames.
- Score: 51.26380898255555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video inpainting aims to fill spatio-temporal "corrupted" regions with
plausible content. To achieve this goal, it is necessary to find
correspondences from neighbouring frames to faithfully hallucinate the unknown
content. Current methods achieve this goal through attention, flow-based
warping, or 3D temporal convolution. However, flow-based warping can create
artifacts when optical flow is not accurate, while temporal convolution may
suffer from spatial misalignment. We propose 'Progressive Temporal Feature
Alignment Network', which progressively enriches features extracted from the
current frame with the feature warped from neighbouring frames using optical
flow. Our approach corrects the spatial misalignment in the temporal feature
propagation stage, greatly improving visual quality and temporal consistency of
the inpainted videos. Using the proposed architecture, we achieve
state-of-the-art performance on the DAVIS and FVI datasets compared to existing
deep learning approaches. Code is available at
https://github.com/MaureenZOU/TSAM.
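The abstract describes enriching the current frame's features with features warped from neighbouring frames using optical flow. The following is a minimal sketch of that general idea only, not the authors' TSAM implementation: it backward-warps a single-channel feature grid with bilinear sampling and then blends it into the current frame's features. The function names, the flow convention (each flow vector points from a current-frame pixel to its match in the neighbour), the border clamping, and the fixed-weight blend are all assumptions made for illustration.

```python
import math


def bilinear_sample(feat, x, y):
    """Sample a 2D grid feat[row][col] at real-valued (x, y), clamping to the border."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * feat[y0][x0] + ax * feat[y0][x1]
    bottom = (1 - ax) * feat[y1][x0] + ax * feat[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp_to_current(neighbor_feat, flow):
    """Backward warp: flow[y][x] = (dx, dy) points from the current-frame
    pixel (x, y) to its corresponding location in the neighbouring frame."""
    h, w = len(neighbor_feat), len(neighbor_feat[0])
    return [
        [
            bilinear_sample(neighbor_feat, x + flow[y][x][0], y + flow[y][x][1])
            for x in range(w)
        ]
        for y in range(h)
    ]


def enrich(current_feat, aligned_feat, weight=0.5):
    """Hypothetical fusion step: a fixed-weight blend of current and aligned features."""
    return [
        [(1 - weight) * c + weight * a for c, a in zip(c_row, a_row)]
        for c_row, a_row in zip(current_feat, aligned_feat)
    ]


# Toy example: the neighbour's content sits one pixel to the right of the
# current frame's, so every flow vector is (1.0, 0.0).
neighbor = [[0, 1, 2, 3], [4, 5, 6, 7]]
flow = [[(1.0, 0.0)] * 4 for _ in range(2)]
aligned = warp_to_current(neighbor, flow)
current = [[0.0] * 4 for _ in range(2)]
enriched = enrich(current, aligned)
```

In this toy case the last column has no match inside the neighbour, so the clamped sample repeats the border value. Inaccurate flow would produce similarly wrong samples, which is the kind of artifact the abstract attributes to flow-based warping.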
Related papers
- STLight: a Fully Convolutional Approach for Efficient Predictive Learning by Spatio-Temporal joint Processing [6.872340834265972]
We propose STLight, a novel method for spatio-temporal learning that relies solely on channel-wise and depth-wise convolutions as learnable layers.
STLight overcomes the limitations of traditional convolutional approaches by rearranging spatial and temporal dimensions together.
Our architecture achieves state-of-the-art performance on STL benchmarks across datasets and settings, while significantly improving computational efficiency in terms of parameters and computational FLOPs.
arXiv Detail & Related papers (2024-11-15T13:53:19Z) - Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z) - LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation [21.815083817914843]
We propose a new zero-shot video-to-video translation framework, named LatentWarp.
Our approach is simple: to constrain the query tokens to be temporally consistent, we further incorporate a warping operation in the latent space.
Experimental results demonstrate the superiority of LatentWarp in achieving video-to-video translation with temporal coherence.
arXiv Detail & Related papers (2023-11-01T08:02:57Z) - Motion-aware Memory Network for Fast Video Salient Object Detection [15.967509480432266]
We design a space-time memory (STM)-based network, which extracts useful temporal information of the current frame from adjacent frames as the temporal branch of VSOD.
In the encoding stage, we generate high-level temporal features by using high-level features from the current and its adjacent frames.
In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches.
The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.
arXiv Detail & Related papers (2022-08-01T15:56:19Z) - Motion-blurred Video Interpolation and Extrapolation [72.3254384191509]
We present a novel framework for deblurring, interpolating and extrapolating sharp frames from a motion-blurred video in an end-to-end manner.
To ensure temporal coherence across predicted frames and address potential temporal ambiguity, we propose a simple, yet effective flow-based rule.
arXiv Detail & Related papers (2021-03-04T12:18:25Z) - FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video frame interpolation.
We demonstrate that FLAVR can serve as a useful self-supervised pretext task for action recognition, optical flow estimation, and motion magnification.
arXiv Detail & Related papers (2020-12-15T18:59:30Z) - Flow-edge Guided Video Completion [66.49077223104533]
Previous flow completion methods are often unable to retain the sharpness of motion boundaries.
Our method first extracts and completes motion edges, and then uses them to guide piecewise-smooth flow completion with sharp edges.
arXiv Detail & Related papers (2020-09-03T17:59:42Z) - Exploring Rich and Efficient Spatial Temporal Interactions for Real Time Video Salient Object Detection [87.32774157186412]
Mainstream methods formulate their video saliency mainly from two independent venues, i.e., the spatial and temporal branches.
In this paper, we propose a spatiotemporal network to achieve such improvement in a fully interactive fashion.
Our method is easy to implement yet effective, achieving high-quality video saliency detection at a real-time speed of 50 FPS.
arXiv Detail & Related papers (2020-08-07T03:24:04Z) - Semantic Flow for Fast and Accurate Scene Parsing [28.444273169423074]
Flow Alignment Module (FAM) learns Semantic Flow between feature maps of adjacent levels.
Experiments are conducted on several challenging datasets, including Cityscapes, PASCAL Context, ADE20K and CamVid.
Our network is the first to achieve 80.4% mIoU on Cityscapes with a frame rate of 26 FPS.
arXiv Detail & Related papers (2020-02-24T08:53:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.