Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting
- URL: http://arxiv.org/abs/2301.10048v2
- Date: Tue, 19 Mar 2024 04:02:28 GMT
- Title: Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting
- Authors: Kaidong Zhang, Jialun Peng, Jingjing Fu, Dong Liu
- Abstract summary: Building on our flow-guided transformer (FGT), we propose FGT++ to pursue more effective and efficient video inpainting.
FGT++ is experimentally shown to outperform existing video inpainting networks.
- Score: 11.837764007052813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have been widely used for video processing owing to the multi-head self-attention (MHSA) mechanism. However, the MHSA mechanism encounters an intrinsic difficulty for video inpainting, since the features associated with the corrupted regions are degraded and incur inaccurate self-attention. This problem, termed query degradation, may be mitigated by first completing optical flows and then using the flows to guide the self-attention, which was verified in our previous work, the flow-guided transformer (FGT). We further exploit the flow guidance and propose FGT++ to pursue more effective and efficient video inpainting. First, we design a lightweight flow completion network using local aggregation and an edge loss. Second, to address the query degradation, we propose a flow guidance feature integration module, which uses the motion discrepancy to enhance the features, together with a flow-guided feature propagation module that warps the features according to the flows. Third, we decouple the transformer along the temporal and spatial dimensions, where flows are used to select the tokens through a temporally deformable MHSA mechanism, and global tokens are combined with the inner-window local tokens through a dual perspective MHSA mechanism. FGT++ is experimentally shown to outperform existing video inpainting networks both qualitatively and quantitatively.
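As a concrete illustration of the flow-guided feature propagation idea in the abstract, here is a minimal PyTorch sketch of backward-warping a neighboring frame's features with a completed flow field. The function name and tensor layout are assumptions for exposition, not the authors' actual module.

```python
import torch
import torch.nn.functional as F

def flow_warp(feat_ref: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp reference-frame features to the current frame.

    feat_ref: (B, C, H, W) features of a neighboring frame.
    flow:     (B, 2, H, W) completed flow from the current frame to the
              reference frame, in pixel units.
    """
    b, _, h, w = feat_ref.shape
    # Base sampling grid of pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat_ref.device, dtype=feat_ref.dtype),
        torch.arange(w, device=feat_ref.device, dtype=feat_ref.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0)  # (1, 2, H, W)
    # Displace the grid by the flow, then normalize to [-1, 1].
    coords = grid + flow
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(feat_ref, sample_grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```

Warping of this kind lets valid content from neighboring frames fill corrupted queries before attention runs, which is the crux of mitigating query degradation.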
Related papers
- Coherent Video Inpainting Using Optical Flow-Guided Efficient Diffusion [15.188335671278024]
We propose a new video inpainting framework using optical Flow-guided Efficient Diffusion (FloED) for higher video coherence.
FloED employs a dual-branch architecture, where the time-agnostic flow branch restores corrupted flow first, and the multi-scale flow adapters provide motion guidance to the main inpainting branch.
Experiments on background restoration and object removal tasks show that FloED outperforms state-of-the-art diffusion-based methods in both quality and efficiency.
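The dual-branch layout FloED describes can be pictured with a short schematic sketch; every module name below (`DualBranchInpainter`, `flow_branch`, `adapters`) is hypothetical and only meant to show how restored flow could feed motion guidance into the main branch.

```python
import torch.nn as nn

class DualBranchInpainter(nn.Module):
    def __init__(self, flow_branch, main_branch, adapters):
        super().__init__()
        self.flow_branch = flow_branch            # completes corrupted flow
        self.main_branch = main_branch            # multi-scale inpainting net
        self.adapters = nn.ModuleList(adapters)   # one adapter per scale

    def forward(self, frames, masks):
        flow = self.flow_branch(frames, masks)        # restored flow first
        guidance = [a(flow) for a in self.adapters]   # multi-scale motion cues
        return self.main_branch(frames, masks, guidance)
```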
arXiv Detail & Related papers (2024-12-01T15:45:26Z) - A Hybrid Transformer-Mamba Network for Single Image Deraining [70.64069487982916]
Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions.
We introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies.
arXiv Detail & Related papers (2024-08-31T10:03:19Z) - Dual-Stream Attention Transformers for Sewer Defect Classification [2.5499055723658097]
We propose a dual-stream vision transformer architecture that processes RGB and optical flow inputs for efficient sewer defect classification.
Our key idea is to use self-attention regularization to harness the complementary strengths of the RGB and motion streams.
By leveraging motion cues through a self-attention regularizer, we align and enhance RGB attention maps, enabling the network to concentrate on pertinent input regions.
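One plausible reading of the self-attention regularizer is a loss that pulls the RGB stream's attention maps toward the flow stream's; the sketch below assumes softmax attention maps of shape (B, heads, N, N) and is not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def attention_alignment_loss(attn_rgb: torch.Tensor,
                             attn_flow: torch.Tensor) -> torch.Tensor:
    # Treat the (detached) flow-stream attention as a target distribution
    # and penalize the RGB stream's divergence from it.
    return F.kl_div(attn_rgb.clamp_min(1e-8).log(),
                    attn_flow.detach(), reduction="batchmean")
```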
arXiv Detail & Related papers (2023-11-07T02:31:51Z) - GAFlow: Incorporating Gaussian Attention into Optical Flow [62.646389181507764]
We push Gaussian Attention (GA) into the optical flow models to accentuate local properties during representation learning.
We introduce a novel Gaussian-Constrained Layer (GCL) which can be easily plugged into existing Transformer blocks.
For reliable motion analysis, we provide a new Gaussian-Guided Attention Module (GGAM).
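A Gaussian constraint on attention can be sketched as a fixed distance penalty added to the attention logits, so each token favors its spatial neighborhood; the function below is a hypothetical illustration of that idea, not GAFlow's GCL/GGAM code.

```python
import torch

def gaussian_biased_attention(q, k, v, positions, sigma=2.0):
    # q, k, v: (B, N, D) token projections; positions: (N, 2) coordinates.
    d = q.shape[-1]
    logits = q @ k.transpose(-2, -1) / d**0.5          # (B, N, N)
    dist2 = torch.cdist(positions, positions).pow(2)   # squared distances
    logits = logits - dist2 / (2 * sigma**2)           # Gaussian prior
    return torch.softmax(logits, dim=-1) @ v
```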
arXiv Detail & Related papers (2023-09-28T07:46:01Z) - ProPainter: Improving Propagation and Transformer for Video Inpainting [98.70898369695517]
Flow-based propagation and spatiotemporal Transformer are two mainstream mechanisms in video inpainting (VI).
We introduce dual-domain propagation that combines the advantages of image and feature warping, exploiting global correspondences reliably.
We also propose a mask-guided sparse video Transformer, which achieves high efficiency by discarding redundant tokens.
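Mask-guided sparsity can be illustrated by restricting queries to tokens whose patches overlap the holes while keeping dense keys and values; the helper below is an assumption-laden sketch, not ProPainter's implementation.

```python
import torch

def sparse_masked_attention(tokens, token_mask, attn):
    # tokens: (B, N, D); token_mask: (N,) bool, True where a token's patch
    # overlaps a corrupted region; attn: cross-attention callable (q, k, v).
    idx = token_mask.nonzero(as_tuple=True)[0]
    q = tokens[:, idx]                        # sparse queries only
    out = tokens.clone()
    out[:, idx] = attn(q, tokens, tokens)     # dense keys/values
    return out
```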
arXiv Detail & Related papers (2023-09-07T17:57:29Z) - Flow-Guided Transformer for Video Inpainting [10.31469470212101]
We propose a flow-guided transformer, which innovatively leverages the motion discrepancy exposed by optical flows to instruct the attention retrieval in the transformer for high-fidelity video inpainting.
With the completed flows, we propagate the content across video frames, and adopt the flow-guided transformer to synthesize the rest corrupted regions.
We decouple the transformer along the temporal and spatial dimensions, so that we can easily integrate the locally relevant completed flows to instruct spatial attention only.
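The temporal/spatial decoupling can be sketched as two attention passes over reshaped token grids; `spatial_attn` and `temporal_attn` below stand in for arbitrary attention modules of shape (B*, L, D) -> (B*, L, D) and are not the paper's code.

```python
import torch

def decoupled_attention(x, spatial_attn, temporal_attn):
    # x: (B, T, N, D) tokens for T frames with N tokens per frame.
    b, t, n, d = x.shape
    # Spatial pass: attend within each frame independently.
    x = spatial_attn(x.reshape(b * t, n, d)).reshape(b, t, n, d)
    # Temporal pass: attend across frames at each spatial location.
    x = x.permute(0, 2, 1, 3).reshape(b * n, t, d)
    x = temporal_attn(x).reshape(b, n, t, d).permute(0, 2, 1, 3)
    return x
```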
arXiv Detail & Related papers (2022-08-14T03:10:01Z) - Implicit Motion-Compensated Network for Unsupervised Video Object Segmentation [25.41427065435164]
Unsupervised video object segmentation (UVOS) aims at automatically separating the primary foreground object(s) from the background in a video sequence.
Existing UVOS methods either lack robustness when there are visually similar surroundings (appearance-based) or suffer from deterioration in the quality of their predictions because of dynamic background and inaccurate flow (flow-based).
We propose an implicit motion-compensated network (IMCNet) combining complementary cues (i.e., appearance and motion) with aligned motion information from the adjacent frames to the current frame at the feature level.
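One way to picture feature-level, warp-free alignment is cross-attention from current-frame features to an adjacent frame's features; this is a loose sketch of the idea, not IMCNet's module.

```python
import torch

def align_features(feat_cur, feat_adj):
    # feat_cur, feat_adj: (B, N, D) flattened feature maps of the current
    # and an adjacent frame.
    d = feat_cur.shape[-1]
    attn = torch.softmax(
        feat_cur @ feat_adj.transpose(-2, -1) / d**0.5, dim=-1)
    return attn @ feat_adj  # adjacent features aligned to the current frame
```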
arXiv Detail & Related papers (2022-04-06T13:03:59Z) - Video Super-Resolution Transformer [85.11270760456826]
Video super-resolution (VSR), with the aim to restore a high-resolution video from its corresponding low-resolution version, is a spatial-temporal sequence prediction problem.
Recently, the Transformer has been gaining popularity due to its parallel computing ability for sequence-to-sequence modeling.
In this paper, we present a spatial-temporal convolutional self-attention layer with a theoretical understanding to exploit the locality information.
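A convolutional self-attention layer can be sketched by producing Q/K/V with a 3x3 convolution so the attention weights carry local spatial context; the class below is illustrative, not the paper's layer.

```python
import torch
import torch.nn as nn

class ConvSelfAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # Convolutional (rather than per-token linear) Q/K/V projection.
        self.qkv = nn.Conv2d(ch, 3 * ch, 3, padding=1)

    def forward(self, x):                 # x: (B, C, H, W)
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).flatten(2).chunk(3, dim=1)  # each (B, C, HW)
        attn = torch.softmax(q.transpose(1, 2) @ k / c**0.5, dim=-1)
        out = (attn @ v.transpose(1, 2)).transpose(1, 2)
        return out.reshape(b, c, h, w)
```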
arXiv Detail & Related papers (2021-06-12T20:00:32Z) - Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation [79.98992138865042]
We present an augmented transformer with adaptive graph network (ATAG) to exploit both long-range and local temporal contexts for TAPG.
Specifically, we enhance the vanilla transformer by equipping it with a snippet actionness loss and a front block, dubbed the augmented transformer.
An adaptive graph convolutional network (GCN) is proposed to build local temporal context by mining the position information and difference between adjacent features.
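The difference-driven local graph can be sketched as messages between neighboring snippets weighted by adjacent-feature differences, so likely boundaries get emphasized; the function below is purely illustrative of that idea.

```python
import torch

def local_graph_conv(feats, proj):
    # feats: (B, T, D) snippet features; proj: a learnable nn.Linear(D, D).
    diff = feats[:, 1:] - feats[:, :-1]                  # (B, T-1, D)
    w = torch.sigmoid(diff.norm(dim=-1, keepdim=True))   # edge weights
    agg = torch.zeros_like(feats)
    agg[:, :-1] += w * feats[:, 1:]    # message from right neighbor
    agg[:, 1:] += w * feats[:, :-1]    # message from left neighbor
    return proj(feats + agg)
```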
arXiv Detail & Related papers (2021-03-30T02:01:03Z) - Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM [0.0]
We propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet.
SepConvLSTM is constructed by replacing convolution operation at each gate of ConvLSTM with a depthwise separable convolution.
Our model outperforms the state-of-the-art accuracy on the larger and more challenging RWF-2000 dataset by more than a 2% margin.
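The stated construction is easy to sketch in PyTorch: swap each gate convolution of a ConvLSTM cell for a depthwise-then-pointwise pair. The cell below follows that recipe but is a sketch, not the authors' code.

```python
import torch
import torch.nn as nn

def separable_conv(in_ch, out_ch, k=3):
    # Depthwise convolution followed by a 1x1 pointwise convolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, 1),
    )

class SepConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        # All four gates computed jointly from [input, hidden state].
        self.gates = separable_conv(in_ch + hid_ch, 4 * hid_ch)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], 1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)
```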
arXiv Detail & Related papers (2021-02-21T12:01:48Z) - Feature Flow: In-network Feature Flow Estimation for Video Object Detection [56.80974623192569]
Optical flow is widely used in computer vision tasks to provide pixel-level motion information.
A common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset.
We propose a novel network (IFF-Net) with an In-network Feature Flow estimation module for video object detection.
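In-network feature flow can be pictured as a small convolutional head that predicts a 2-channel flow field directly from two frames' feature maps; the layout below is hypothetical, not IFF-Net's module.

```python
import torch
import torch.nn as nn

class FeatureFlowHead(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 2, 3, padding=1),  # 2-channel feature-flow field
        )

    def forward(self, feat_cur, feat_ref):
        # Predict flow between feature maps, not between raw frames.
        return self.conv(torch.cat([feat_cur, feat_ref], dim=1))
```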
arXiv Detail & Related papers (2020-09-21T07:55:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.