Attention-guided Temporal Coherent Video Object Matting
- URL: http://arxiv.org/abs/2105.11427v1
- Date: Mon, 24 May 2021 17:34:57 GMT
- Title: Attention-guided Temporal Coherent Video Object Matting
- Authors: Yunke Zhang, Chi Wang, Miaomiao Cui, Peiran Ren, Xuansong Xie,
Xian-sheng Hua, Hujun Bao, Qixing Huang, Weiwei Xu
- Abstract summary: We propose a novel deep learning-based video object matting method that can achieve temporally coherent matting results.
Its key component is an attention-based temporal aggregation module that maximizes image matting networks' strength.
We show how to effectively solve the trimap generation problem by fine-tuning a state-of-the-art video object segmentation network.
- Score: 78.82835351423383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel deep learning-based video object matting method
that can achieve temporally coherent matting results. Its key component is an
attention-based temporal aggregation module that maximizes image matting
networks' strength for video matting networks. This module computes temporal
correlations for pixels adjacent to each other along the time axis in feature
space to be robust against motion noises. We also design a novel loss term to
train the attention weights, which drastically boosts the video matting
performance. Besides, we show how to effectively solve the trimap generation
problem by fine-tuning a state-of-the-art video object segmentation network
with a sparse set of user-annotated keyframes. To facilitate video matting and
trimap generation networks' training, we construct a large-scale video matting
dataset with 80 training and 28 validation foreground video clips with
ground-truth alpha mattes. Experimental results show that our method can
generate high-quality alpha mattes for various videos featuring appearance
change, occlusion, and fast motion. Our code and dataset can be found at
https://github.com/yunkezhang/TCVOM
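The key component, the attention-based temporal aggregation module, computes per-pixel correlations between a frame and its temporal neighbours in feature space and uses those correlations as attention weights to aggregate features. The PyTorch sketch below illustrates that idea only; the 1x1-convolution projections, the window size, and the softmax normalization over time are assumptions, and the paper's actual module (including its attention-weight loss) lives in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalAggregation(nn.Module):
    """Minimal sketch: per-pixel attention over temporally adjacent frames.

    Layer names and shapes are hypothetical; the paper's real module is in
    https://github.com/yunkezhang/TCVOM.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (T, C, H, W) features of a short window of consecutive frames;
        # the centre frame is the one being matted.
        T, C, H, W = feats.shape
        centre = T // 2
        q = self.query(feats[centre:centre + 1])    # (1, C, H, W)
        k = self.key(feats)                         # (T, C, H, W)
        v = self.value(feats)                       # (T, C, H, W)

        # Per-pixel correlation of the centre frame with every frame in the
        # window, computed in feature space so it tolerates motion noise.
        attn = (q * k).sum(dim=1) / C ** 0.5        # (T, H, W)
        attn = F.softmax(attn, dim=0)               # normalise over time

        # Attention-weighted sum of neighbouring features.
        out = (attn.unsqueeze(1) * v).sum(dim=0, keepdim=True)  # (1, C, H, W)
        return out


# Usage sketch: aggregate encoder features of 3 consecutive frames.
if __name__ == "__main__":
    module = TemporalAggregation(channels=64)
    window = torch.randn(3, 64, 128, 128)
    print(module(window).shape)  # torch.Size([1, 64, 128, 128])
```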
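For trimap generation, the abstract describes fine-tuning a state-of-the-art video object segmentation network on a sparse set of user-annotated keyframes. A common way to realize such a pipeline is to fine-tune the segmentation model on the annotated frames and then convert each predicted mask into a trimap by marking a dilated band around the mask boundary as unknown; the sketch below shows that generic recipe. The segmentation model, loss, band width, and optimizer settings are placeholders, not the paper's exact configuration.

```python
import numpy as np
import cv2
import torch
import torch.nn.functional as F


def mask_to_trimap(mask: np.ndarray, band: int = 15) -> np.ndarray:
    """Convert a binary mask (H, W, values in {0, 1}) into a trimap.

    0 = background, 128 = unknown, 255 = foreground. The unknown region is a
    band around the mask boundary obtained by dilation minus erosion; the
    band width is an assumed hyperparameter, tuned per video in practice.
    """
    mask = mask.astype(np.uint8)
    kernel = np.ones((band, band), np.uint8)
    fg = cv2.erode(mask, kernel)
    unknown = cv2.dilate(mask, kernel) - fg
    return (fg * 255 + unknown * 128).astype(np.uint8)


def finetune_on_keyframes(model, keyframes, masks, steps=200, lr=1e-5):
    """Fine-tune a segmentation network on a few user-annotated keyframes.

    `model` stands in for any video object segmentation network returning
    per-pixel foreground/background logits; it is a placeholder, not the
    specific network used in the paper.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        for img, gt in zip(keyframes, masks):   # img: (1, 3, H, W), gt: (1, H, W)
            logits = model(img)                 # (1, 2, H, W)
            loss = F.cross_entropy(logits, gt.long())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```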
Related papers
- MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion [3.7270979204213446]
We present four key contributions to address the challenges of video processing.
First, we introduce the 3D Inverted Vector-Quantization Variational Autoencoder.
Second, we present MotionAura, a text-to-video generation framework.
Third, we propose a spectral transformer-based denoising network.
Fourth, we introduce a downstream task of Sketch Guided Video Inpainting.
arXiv Detail & Related papers (2024-10-10T07:07:56Z) - SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modelling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z) - Trusted Video Inpainting Localization via Deep Attentive Noise Learning [2.1210527985139227]
We present a Trusted Video Inpainting localization network (TruVIL) with excellent robustness and generalization ability.
We design deep attentive noise learning in multiple stages to capture the inpainted traces.
To prepare enough training samples, we also build a frame-level video object segmentation dataset of 2500 videos.
arXiv Detail & Related papers (2024-06-19T14:08:58Z) - Adaptive Human Matting for Dynamic Videos [62.026375402656754]
Adaptive Matting for Dynamic Videos, termed AdaM, is a framework for simultaneously differentiating foregrounds from backgrounds and capturing alpha matte details of human subjects in the foreground.
Two interconnected network designs are employed to achieve this goal.
We benchmark and study our method on recently introduced datasets, showing that our matting achieves new best-in-class generalizability.
arXiv Detail & Related papers (2023-04-12T17:55:59Z) - Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Temporally Coherent Person Matting Trained on Fake-Motion Dataset [0.0]
We propose a novel method to perform matting of videos depicting people that does not require additional user input such as trimaps.
Our architecture achieves temporal stability of the resulting alpha mattes by using motion-estimation-based smoothing of image-segmentation algorithm outputs.
We also propose a fake-motion algorithm that generates training clips for the video-matting network given photos with ground-truth alpha mattes and background videos.
arXiv Detail & Related papers (2021-09-10T12:53:11Z) - Deep Video Matting via Spatio-Temporal Alignment and Aggregation [63.6870051909004]
We propose a deep learning-based video matting framework which employs a novel spatio-temporal feature aggregation module (STFAM).
To eliminate frame-by-frame trimap annotations, a lightweight interactive trimap propagation network is also introduced.
Our framework significantly outperforms conventional video matting and deep image matting methods.
arXiv Detail & Related papers (2021-04-22T17:42:08Z) - Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos.
We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks.
The additional data provides substantially better generalization performance leading to state-of-the-art results in both the VOS and more challenging tracking domain.
arXiv Detail & Related papers (2021-01-06T18:56:24Z)