Progressive Temporal Feature Alignment Network for Video Inpainting
- URL: http://arxiv.org/abs/2104.03507v1
- Date: Thu, 8 Apr 2021 04:50:33 GMT
- Title: Progressive Temporal Feature Alignment Network for Video Inpainting
- Authors: Xueyan Zou, Linjie Yang, Ding Liu, Yong Jae Lee
- Abstract summary: Video inpainting aims to fill spatio-temporal "corrupted" regions with plausible content.
Current methods achieve this goal through attention, flow-based warping, or 3D temporal convolution.
We propose 'Progressive Temporal Feature Alignment Network', which progressively enriches features extracted from the current frame with the warped feature from neighbouring frames.
- Score: 51.26380898255555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video inpainting aims to fill spatio-temporal "corrupted" regions with
plausible content. To achieve this goal, it is necessary to find
correspondences from neighbouring frames to faithfully hallucinate the unknown
content. Current methods achieve this goal through attention, flow-based
warping, or 3D temporal convolution. However, flow-based warping can create
artifacts when optical flow is not accurate, while temporal convolution may
suffer from spatial misalignment. We propose 'Progressive Temporal Feature
Alignment Network', which progressively enriches features extracted from the
current frame with the feature warped from neighbouring frames using optical
flow. Our approach corrects the spatial misalignment in the temporal feature
propagation stage, greatly improving visual quality and temporal consistency of
the inpainted videos. Using the proposed architecture, we achieve
state-of-the-art performance on the DAVIS and FVI datasets compared to existing
deep learning approaches. Code is available at
https://github.com/MaureenZOU/TSAM.
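The abstract's key mechanism, warping neighbour-frame features into the current frame with optical flow and then fusing them, can be sketched in a few lines of PyTorch. This is a minimal illustration under assumed tensor shapes and a made-up 1x1 fusion layer, not the authors' TSAM implementation (see the repository above for the real code).
```python
import torch
import torch.nn.functional as F

def warp_features(feat, flow):
    """Warp neighbour-frame features into the current frame via optical flow.

    feat: (B, C, H, W) features from a neighbouring frame.
    flow: (B, 2, H, W) flow from the current frame to that neighbour, in pixels.
    """
    B, _, H, W = feat.shape
    # Base sampling grid of (x, y) pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat.device)   # (2, H, W)
    coords = base.unsqueeze(0) + flow                             # follow the flow
    # Normalize to [-1, 1], the range grid_sample expects.
    gx = 2.0 * coords[:, 0] / max(W - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(H - 1, 1) - 1.0
    return F.grid_sample(feat, torch.stack((gx, gy), dim=-1), align_corners=True)

# "Progressive enrichment" reduced to a single step: concatenate the current
# frame's features with the warped neighbour features and fuse with a 1x1
# convolution (the fusion layer is an assumption made for this sketch).
fuse = torch.nn.Conv2d(2 * 64, 64, kernel_size=1)
cur, nbr = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
flow = torch.randn(1, 2, 32, 32)
enriched = fuse(torch.cat([cur, warp_features(nbr, flow)], dim=1))
```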
Related papers
- Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z)
- LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation [21.815083817914843]
We propose a new zero-shot video-to-video translation framework, named LatentWarp.
Our approach is simple: to constrain the query tokens to be temporally consistent, we further incorporate a warping operation in the latent space.
Experiment results demonstrate the superiority of LatentWarp in achieving video-to-video translation with temporal coherence.
arXiv Detail & Related papers (2023-11-01T08:02:57Z)
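As a rough picture of what a warping operation in latent space can look like, the sketch below reuses the warp_features() helper from above to align the previous frame's denoising latent with the current one; the linear blend and its alpha weight are illustrative assumptions, not the LatentWarp method.
```python
import torch

def align_latent(prev_latent, cur_latent, flow, alpha=0.5):
    # Warp the previous frame's latent into the current frame's coordinates,
    # then blend; alpha is an illustrative choice, not the paper's scheme.
    warped = warp_features(prev_latent, flow)  # helper sketched above
    return alpha * warped + (1.0 - alpha) * cur_latent
```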
- Continuous Space-Time Video Super-Resolution Utilizing Long-Range Temporal Information [48.20843501171717]
We propose a continuous ST-VSR (CSTVSR) method that can convert the given video to any frame rate and spatial resolution.
We show that the proposed algorithm has good flexibility and achieves better performance on various datasets.
arXiv Detail & Related papers (2023-02-26T08:02:39Z)
- Motion-aware Memory Network for Fast Video Salient Object Detection [15.967509480432266]
We design a space-time memory (STM)-based network, which extracts useful temporal information of the current frame from adjacent frames as the temporal branch of VSOD.
In the encoding stage, we generate high-level temporal features by using high-level features from the current and its adjacent frames.
In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches.
The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.
arXiv Detail & Related papers (2022-08-01T15:56:19Z)
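The space-time memory idea, letting the current frame attend over keys and values pooled from adjacent frames, can be summarized with a generic attention read. A hedged sketch with assumed shapes, not the paper's network:
```python
import torch

def memory_read(query, mem_keys, mem_values):
    """query: (B, Ck, H, W); mem_keys: (B, Ck, T, H, W); mem_values: (B, Cv, T, H, W)."""
    B, Ck, H, W = query.shape
    q = query.flatten(2)                     # (B, Ck, HW)
    k = mem_keys.flatten(2)                  # (B, Ck, T*HW)
    v = mem_values.flatten(2)                # (B, Cv, T*HW)
    # Scaled dot-product attention from current-frame locations to memory.
    attn = torch.softmax(torch.einsum("bcq,bck->bqk", q, k) / Ck ** 0.5, dim=-1)
    out = torch.einsum("bqk,bck->bcq", attn, v)   # aggregate memory values
    return out.view(B, -1, H, W)

read = memory_read(torch.randn(2, 64, 16, 16),
                   torch.randn(2, 64, 5, 16, 16),
                   torch.randn(2, 128, 5, 16, 16))  # -> (2, 128, 16, 16)
```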
- Motion-blurred Video Interpolation and Extrapolation [72.3254384191509]
We present a novel framework for deblurring, interpolating and extrapolating sharp frames from a motion-blurred video in an end-to-end manner.
To ensure temporal coherence across predicted frames and address potential temporal ambiguity, we propose a simple, yet effective flow-based rule.
arXiv Detail & Related papers (2021-03-04T12:18:25Z)
- FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video frame interpolation.
We demonstrate that FLAVR can serve as a useful self-supervised pretext task for action recognition, optical flow estimation, and motion magnification.
arXiv Detail & Related papers (2020-12-15T18:59:30Z)
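The flow-free ingredient in FLAVR is plain 3D space-time convolution over a stack of input frames. Below is a toy PyTorch module in that spirit; the layer sizes and the mean-over-time collapse are illustrative assumptions, not the published architecture.
```python
import torch
import torch.nn as nn

class Tiny3DInterp(nn.Module):
    """Toy flow-agnostic interpolator: 3D convs over a stack of input frames."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(ch, 3, kernel_size=3, padding=1),
        )

    def forward(self, frames):            # frames: (B, 3, T, H, W), e.g. T=4
        return self.net(frames).mean(2)   # crude collapse of time to one frame

mid_frame = Tiny3DInterp()(torch.randn(1, 3, 4, 64, 64))  # -> (1, 3, 64, 64)
```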
- Flow-edge Guided Video Completion [66.49077223104533]
Previous flow completion methods are often unable to retain the sharpness of motion boundaries.
Our method first extracts and completes motion edges, and then uses them to guide piecewise-smooth flow completion with sharp edges.
arXiv Detail & Related papers (2020-09-03T17:59:42Z)
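The role of completed motion edges can be illustrated with a toy diffusion-style filler that refuses to smooth across an edge map. The real method completes the edges first and solves a proper optimization; this NumPy sketch only shows how edges keep the filled flow piecewise smooth.
```python
import numpy as np

def complete_flow(flow, hole, edges, iters=300):
    """flow: (H, W, 2) field with corrupted pixels; hole, edges: (H, W) bool."""
    out = np.where(hole[..., None], 0.0, flow)
    valid = (~hole).astype(float)
    barrier = (~edges).astype(float)          # 0 on motion edges, 1 elsewhere
    for _ in range(iters):
        w = valid * barrier                   # only known, non-edge pixels spread
        num = np.zeros_like(out)
        den = np.zeros_like(valid)
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            num += np.roll(out * w[..., None], (dy, dx), axis=(0, 1))
            den += np.roll(w, (dy, dx), axis=(0, 1))
        upd = hole & ~edges & (den > 0)       # fill holes without crossing edges
        out[upd] = num[upd] / den[upd][:, None]
        valid[upd] = 1.0
    return out  # edge pixels inside the hole are left to an edge-completion step
```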
- Exploring Rich and Efficient Spatial Temporal Interactions for Real Time Video Salient Object Detection [87.32774157186412]
Mainstream methods formulate their video saliency mainly from two independent venues, i.e., the spatial and temporal branches.
In this paper, we propose a spatiotemporal network to achieve such improvement in a full interactive fashion.
Our method is easy to implement yet effective, achieving high-quality video saliency detection in real time at 50 FPS.
arXiv Detail & Related papers (2020-08-07T03:24:04Z)
- Semantic Flow for Fast and Accurate Scene Parsing [28.444273169423074]
Flow Alignment Module (FAM) learns Semantic Flow between feature maps of adjacent levels.
Experiments are conducted on several challenging datasets, including Cityscapes, PASCAL Context, ADE20K and CamVid.
Our network is the first to achieve 80.4% mIoU on Cityscapes with a frame rate of 26 FPS.
arXiv Detail & Related papers (2020-02-24T08:53:18Z)
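A Flow Alignment Module in this spirit can be sketched as: predict a two-channel semantic flow from the fine and upsampled coarse feature maps, then warp the coarse features before fusing. The layer sizes and the residual fusion are assumptions made for this sketch, which reuses the warp_features() helper from above.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowAlign(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # Predict a 2-channel flow field from both feature maps.
        self.flow_pred = nn.Conv2d(2 * ch, 2, kernel_size=3, padding=1)

    def forward(self, fine, coarse):
        # Bring the coarse map to the fine map's resolution first.
        up = F.interpolate(coarse, size=fine.shape[-2:], mode="bilinear",
                           align_corners=True)
        flow = self.flow_pred(torch.cat([fine, up], dim=1))
        return fine + warp_features(up, flow)  # residual fusion (an assumption)

aligned = FlowAlign(64)(torch.randn(1, 64, 64, 64), torch.randn(1, 64, 32, 32))
```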