Deep Video Matting via Spatio-Temporal Alignment and Aggregation
- URL: http://arxiv.org/abs/2104.11208v1
- Date: Thu, 22 Apr 2021 17:42:08 GMT
- Title: Deep Video Matting via Spatio-Temporal Alignment and Aggregation
- Authors: Yanan Sun, Guanzhi Wang, Qiao Gu, Chi-Keung Tang, Yu-Wing Tai
- Abstract summary: We propose a deep learning-based video matting framework which employs a novel spatio-temporal feature aggregation module (ST-FAM).
To eliminate frame-by-frame trimap annotations, a lightweight interactive trimap propagation network is also introduced.
Our framework significantly outperforms conventional video matting and deep image matting methods.
- Score: 63.6870051909004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the significant progress made by deep learning in natural image
matting, there has been so far no representative work on deep learning for
video matting due to the inherent technical challenges in reasoning temporal
domain and lack of large-scale video matting datasets. In this paper, we
propose a deep learning-based video matting framework which employs a novel and
effective spatio-temporal feature aggregation module (ST-FAM). As optical flow
estimation can be very unreliable within matting regions, ST-FAM is designed to
effectively align and aggregate information across different spatial scales and
temporal frames within the network decoder. To eliminate frame-by-frame trimap
annotations, a lightweight interactive trimap propagation network is also
introduced. The other contribution consists of a large-scale video matting
dataset with groundtruth alpha mattes for quantitative evaluation and
real-world high-resolution videos with trimaps for qualitative evaluation.
Quantitative and qualitative experimental results show that our framework
significantly outperforms conventional video matting and deep image matting
methods applied to video in the presence of multi-frame temporal information.
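The abstract describes ST-FAM only at a high level, but the core idea of aggregating per-frame decoder features can be illustrated with a short sketch. The PyTorch module below is a minimal, hypothetical illustration of similarity-weighted temporal aggregation at a single spatial scale; it is not the authors' ST-FAM, and the class name, embedding convolutions, and sigmoid gating are all assumptions.

```python
import torch
import torch.nn as nn


class TemporalFeatureAggregation(nn.Module):
    """Hypothetical sketch: similarity-weighted aggregation of per-frame
    decoder features at a single spatial scale. This is NOT the authors'
    ST-FAM; the gating scheme and layer names are assumptions."""

    def __init__(self, channels: int):
        super().__init__()
        # Embeddings used to score how well each frame's features match
        # the reference frame's features.
        self.ref_embed = nn.Conv2d(channels, channels, 3, padding=1)
        self.nbr_embed = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, feats: torch.Tensor, ref_index: int = 0) -> torch.Tensor:
        # feats: (B, T, C, H, W) features for T consecutive frames.
        b, t, c, h, w = feats.shape
        ref = self.ref_embed(feats[:, ref_index])              # (B, C, H, W)

        weighted = []
        for i in range(t):
            nbr = self.nbr_embed(feats[:, i])                  # (B, C, H, W)
            # Per-pixel similarity, scaled for numerical stability.
            sim = torch.sigmoid((ref * nbr).sum(1, keepdim=True) / c ** 0.5)
            weighted.append(feats[:, i] * sim)                 # down-weight misaligned pixels

        # Average the similarity-weighted frames, then fuse with a 1x1 conv.
        aggregated = torch.stack(weighted, dim=1).mean(dim=1)  # (B, C, H, W)
        return self.fuse(aggregated)
```

In a full pipeline, `feats` would come from a shared per-frame encoder, and the aggregated output would feed the alpha prediction head; per the abstract, the real module also operates across multiple spatial scales within the decoder.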
Related papers
- Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over the corrupted test sequence, leveraging the spatio-temporal coherence and internal statistics of videos.
arXiv Detail & Related papers (2023-12-13T01:57:11Z)
- UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization [16.963092523737593]
We propose a novel framework for temporal forgery localization (TFL) that predicts forgery segments with multimodal adaptation.
Our approach achieves state-of-the-art performance on benchmark datasets, including Lav-DF, TVIL, and Psynd.
arXiv Detail & Related papers (2023-08-28T08:20:30Z)
- Adaptive Human Matting for Dynamic Videos [62.026375402656754]
Adaptive Matting for Dynamic Videos, termed AdaM, is a framework for simultaneously differentiating foregrounds from backgrounds and capturing alpha matte details of human subjects.
Two interconnected network designs are employed to achieve this goal.
We benchmark and study our methods on recently introduced datasets, showing that our method achieves new best-in-class generalizability.
arXiv Detail & Related papers (2023-04-12T17:55:59Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features; a hedged sketch of this idea appears after this list.
We show that the proposed method requires less computation and performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Spatiotemporal Inconsistency Learning for DeepFake Video Detection [51.747219106855624]
We present a novel temporal modeling paradigm in TIM by exploiting the temporal difference over adjacent frames along both horizontal and vertical directions (see the temporal-difference sketch after this list).
The ISM simultaneously utilizes the spatial information from SIM and the temporal information from TIM to establish a more comprehensive spatio-temporal representation.
arXiv Detail & Related papers (2021-09-04T13:05:37Z)
- Depth-Aware Multi-Grid Deep Homography Estimation with Contextual Correlation [38.95610086309832]
Homography estimation is an important task in computer vision, underpinning applications such as image stitching, video stabilization, and camera calibration.
Traditional homography estimation methods depend on the quantity and distribution of feature points, leading to poor robustness in textureless scenes.
We propose a contextual correlation layer, which can capture long-range correlation on feature maps and can be flexibly bridged into a learning framework.
We equip our network with depth perception capability, by introducing a novel depth-aware shape-preserved loss.
arXiv Detail & Related papers (2021-07-06T10:33:12Z)
- Attention-guided Temporal Coherent Video Object Matting [78.82835351423383]
We propose a novel deep learning-based object matting method that can achieve temporally coherent matting results.
Its key component is an attention-based temporal aggregation module that maximizes image matting networks' strength.
We show how to effectively solve the trimap generation problem by fine-tuning a state-of-the-art video object segmentation network.
arXiv Detail & Related papers (2021-05-24T17:34:57Z)
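As referenced in the video salient object detection entry above, low-level and high-level features can be combined through a co-attention formulation. The sketch below is a minimal, hypothetical PyTorch rendering of one such cross-gating scheme; the layer names and the sigmoid gating are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoAttentionFusion(nn.Module):
    """Hypothetical sketch of co-attention style fusion of low-level
    (fine detail) and high-level (semantic) features. The exact gating
    scheme is an assumption, not the paper's design."""

    def __init__(self, low_channels: int, high_channels: int, out_channels: int):
        super().__init__()
        self.low_proj = nn.Conv2d(low_channels, out_channels, 1)
        self.high_proj = nn.Conv2d(high_channels, out_channels, 1)
        # Each stream produces a spatial attention map that gates the other.
        self.low_gate = nn.Conv2d(out_channels, 1, 1)
        self.high_gate = nn.Conv2d(out_channels, 1, 1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # low: (B, C_low, H, W); high: (B, C_high, h, w) at coarser resolution.
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        low_f = self.low_proj(low)
        high_f = self.high_proj(high)
        # Cross-gating: semantics select relevant details, and vice versa.
        low_att = torch.sigmoid(self.high_gate(high_f))   # gate for the low stream
        high_att = torch.sigmoid(self.low_gate(low_f))    # gate for the high stream
        return low_f * low_att + high_f * high_att
```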
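Similarly, the DeepFake-detection entry models the temporal difference over adjacent frames along horizontal and vertical directions. The following sketch shows one plausible way to compute direction-aware temporal differences; the 1-D convolutions and the overall layout are assumptions rather than the paper's TIM.

```python
import torch
import torch.nn as nn


class TemporalDifference(nn.Module):
    """Hypothetical sketch of direction-aware temporal-difference
    modeling over adjacent frames; not the paper's exact TIM."""

    def __init__(self, channels: int):
        super().__init__()
        # Direction-aware 1-D convolutions applied to frame differences.
        self.horizontal = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.vertical = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, C, H, W); differences between adjacent frames
        # highlight temporal inconsistencies such as flicker artifacts.
        diff = feats[:, 1:] - feats[:, :-1]                # (B, T-1, C, H, W)
        b, t, c, h, w = diff.shape
        diff = diff.reshape(b * t, c, h, w)
        out = self.horizontal(diff) + self.vertical(diff)  # direction-aware mixing
        return out.reshape(b, t, c, h, w)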
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.