ProPainter: Improving Propagation and Transformer for Video Inpainting
- URL: http://arxiv.org/abs/2309.03897v1
- Date: Thu, 7 Sep 2023 17:57:29 GMT
- Title: ProPainter: Improving Propagation and Transformer for Video Inpainting
- Authors: Shangchen Zhou, Chongyi Li, Kelvin C.K. Chan, Chen Change Loy
- Abstract summary: Flow-based propagation and spatiotemporal Transformer are two mainstream mechanisms in video inpainting (VI).
We introduce dual-domain propagation that combines the advantages of image and feature warping, exploiting global correspondences reliably.
We also propose a mask-guided sparse video Transformer, which achieves high efficiency by discarding redundant tokens.
- Score: 98.70898369695517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Flow-based propagation and spatiotemporal Transformer are two mainstream
mechanisms in video inpainting (VI). Despite the effectiveness of these
components, they still suffer from some limitations that affect their
performance. Previous propagation-based approaches are performed separately
either in the image or feature domain. Global image propagation isolated from
learning may cause spatial misalignment due to inaccurate optical flow.
Moreover, memory or computational constraints limit the temporal range of
feature propagation and video Transformer, preventing exploration of
correspondence information from distant frames. To address these issues, we
propose an improved framework, called ProPainter, which involves enhanced
ProPagation and an efficient Transformer. Specifically, we introduce
dual-domain propagation that combines the advantages of image and feature
warping, exploiting global correspondences reliably. We also propose a
mask-guided sparse video Transformer, which achieves high efficiency by
discarding unnecessary and redundant tokens. With these components, ProPainter
outperforms prior art by a large margin of 1.46 dB in PSNR while maintaining
appealing efficiency.
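To make the dual-domain propagation idea concrete, below is a minimal sketch of its image-warping half: backward-warping a reference frame by optical flow and using the warped pixels to fill the masked holes of a target frame. It assumes PyTorch; the function names (`warp_by_flow`, `propagate_pixels`) are illustrative assumptions, not taken from the ProPainter codebase.

```python
# Minimal sketch of flow-based image propagation (assumed names, not
# the ProPainter implementation).
import torch
import torch.nn.functional as F

def warp_by_flow(src, flow):
    """Backward-warp `src` (B, C, H, W) using optical `flow` (B, 2, H, W)."""
    b, _, h, w = src.shape
    # Base sampling grid of pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=src.dtype, device=src.device),
        torch.arange(w, dtype=src.dtype, device=src.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0)      # (1, 2, H, W)
    coords = grid + flow                                   # shift by flow
    # Normalize coordinates to [-1, 1] as grid_sample expects.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    norm_grid = torch.stack((coords_x, coords_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(src, norm_grid, align_corners=True)

def propagate_pixels(target, reference, flow, mask):
    """Fill hole pixels of `target` (mask == 1) with flow-warped
    pixels from a `reference` frame; keep known pixels unchanged."""
    warped = warp_by_flow(reference, flow)
    return target * (1 - mask) + warped * mask
```

In the full method this pixel-level propagation is paired with learnable feature-domain warping, which is what lets inaccurate flow be compensated during training.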
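Similarly, the sketch below illustrates the token-sparsification idea behind a mask-guided sparse video Transformer: attention is computed only over tokens flagged by the inpainting mask, while the remaining tokens pass through unchanged. The selection rule and the class name (`MaskGuidedSparseAttention`) are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch of mask-guided token sparsification (illustrative,
# not the paper's exact module).
import torch
import torch.nn as nn

class MaskGuidedSparseAttention(nn.Module):
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens, token_mask):
        """tokens: (B, N, C); token_mask: (B, N) bool, True where a token
        covers (or neighbors) a hole and should therefore be kept."""
        out = tokens.clone()
        for b in range(tokens.size(0)):
            keep = token_mask[b]                  # (N,)
            if not keep.any():
                continue                           # nothing to inpaint
            active = tokens[b, keep].unsqueeze(0)  # (1, n_active, C)
            # Attention runs only over the kept tokens; a fuller design
            # could also attend to unmasked context tokens as keys/values.
            updated, _ = self.attn(active, active, active)
            out[b, keep] = updated.squeeze(0)
        return out
```

Because attention cost grows quadratically with token count, discarding tokens far from the holes is what yields the efficiency gain the abstract claims.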
Related papers
- FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors [64.54220123913154]
We introduce FramePainter as an efficient instantiation of the image-to-video generation problem.
It only uses a lightweight sparse control encoder to inject editing signals.
It dominantly outperforms previous state-of-the-art methods with far less training data.
arXiv Detail & Related papers (2025-01-14T16:09:16Z) - Hierarchical Separable Video Transformer for Snapshot Compressive Imaging [46.23615648331571]
Hierarchical Separable Video Transformer (HiSViT) is a reconstruction architecture without temporal aggregation.
HiSViT is built from multiple groups of Cross-Scale Separable Multi-head Self-Attention (CSS-MSA) and Gated Self-Modulated Feed-Forward Network (GSM-FFN) blocks.
Our method outperforms previous methods by $>0.5$ dB with comparable or fewer parameters and complexity.
arXiv Detail & Related papers (2024-07-16T17:35:59Z) - Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring [14.839956958725883]
We propose BSSTNet, a Blur-aware Spatio-temporal Sparse Transformer Network.
The proposed BSSTNet outperforms the state-of-the-art methods on the GoPro and DVD datasets.
arXiv Detail & Related papers (2024-06-11T17:59:56Z) - Decoupling Degradation and Content Processing for Adverse Weather Image Restoration [79.59228846484415]
Adverse weather image restoration strives to recover clear images from those affected by various weather types, such as rain, haze, and snow.
Previous techniques can handle multiple weather types within a single network, but they neglect the crucial distinction between degradation removal and content reconstruction, limiting the quality of the restored images.
This work introduces a novel adverse weather image restoration method, called DDCNet, which decouples the degradation removal and content reconstruction process at the feature level based on their channel statistics.
arXiv Detail & Related papers (2023-12-08T12:26:38Z) - CNN Injected Transformer for Image Exposure Correction [20.282217209520006]
Previous exposure correction methods based on convolutions often produce exposure deviation in images.
We propose a CNN Injected Transformer (CIT) to harness the individual strengths of CNN and Transformer simultaneously.
In addition to the hybrid architecture design for exposure correction, we apply a set of carefully formulated loss functions to improve the spatial coherence and rectify potential color deviations.
arXiv Detail & Related papers (2023-09-08T14:53:00Z) - Burstormer: Burst Image Restoration and Enhancement Transformer [117.56199661345993]
On a shutter press, modern handheld cameras capture multiple images in rapid succession and merge them to generate a single image.
The challenge is to properly align the successive image shots and merge their complementary information to achieve high-quality outputs.
We propose Burstormer: a novel transformer-based architecture for burst image restoration and enhancement.
arXiv Detail & Related papers (2023-04-03T17:58:44Z) - Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting [11.837764007052813]
We propose flow-guided transformer (FGT) to pursue more effective and efficient video inpainting.
Experiments show that FGT++ outperforms existing video inpainting networks.
arXiv Detail & Related papers (2023-01-24T14:44:44Z) - Efficient Attention-free Video Shift Transformers [56.87581500474093]
This paper tackles the problem of efficient video recognition.
Video transformers have recently dominated the efficiency (top-1 accuracy vs FLOPs) spectrum.
We extend our formulation in the video domain to construct Video Affine-Shift Transformer.
arXiv Detail & Related papers (2022-08-23T17:48:29Z) - U2-Former: A Nested U-shaped Transformer for Image Restoration [30.187257111046556]
We present a deep and effective Transformer-based network for image restoration, termed U2-Former.
It employs Transformer as the core operation to perform image restoration in a deep encoding and decoding space.
arXiv Detail & Related papers (2021-12-04T08:37:04Z) - Burst Image Restoration and Enhancement [86.08546447144377]
The goal of Burst Image Restoration is to effectively combine complementary cues across multiple burst frames to generate high-quality outputs.
We create a set of pseudo-burst features that combine complementary information from all the input burst frames to seamlessly exchange information.
Our approach delivers state-of-the-art performance on burst super-resolution and low-light image enhancement tasks.
arXiv Detail & Related papers (2021-10-07T17:58:56Z) - Decoupled Spatial-Temporal Transformer for Video Inpainting [77.8621673355983]
Video inpainting aims to fill the given holes with realistic appearance, but it remains a challenging task even with powerful deep learning approaches.
Recent works introduce the promising Transformer architecture into deep video inpainting and achieve better performance.
We propose a Decoupled Spatial-Temporal Transformer (DSTT) for improving video inpainting with exceptional efficiency.
arXiv Detail & Related papers (2021-04-14T05:47:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.