One-Trimap Video Matting
- URL: http://arxiv.org/abs/2207.13353v1
- Date: Wed, 27 Jul 2022 08:19:41 GMT
- Title: One-Trimap Video Matting
- Authors: Hongje Seong and Seoung Wug Oh and Brian Price and Euntai Kim and
Joon-Young Lee
- Abstract summary: We propose One-Trimap Video Matting network (OTVM) that performs video matting robustly using only one user-annotated trimap.
A key component of OTVM is the joint modeling of trimap propagation and alpha prediction.
We evaluate our model on two recent video matting benchmarks, Deep Video Matting and VideoMatting108, and outperform the state of the art by significant margins.
- Score: 47.95947397358026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies made great progress in video matting by extending the success
of trimap-based image matting to the video domain. In this paper, we push this
task toward a more practical setting and propose One-Trimap Video Matting
network (OTVM) that performs video matting robustly using only one
user-annotated trimap. A key component of OTVM is the joint modeling of trimap
propagation and alpha prediction. Starting from baseline trimap propagation and
alpha prediction networks, our OTVM combines the two networks with an
alpha-trimap refinement module to facilitate information flow. We also present
an end-to-end training strategy to take full advantage of the joint model. Our
joint modeling greatly improves the temporal stability of trimap propagation
compared to previous decoupled methods. We evaluate our model on two recent
video matting benchmarks, Deep Video Matting and VideoMatting108, and
outperform the state of the art by significant margins (MSE improvements of 56.4%
and 56.7%, respectively). The source code and model are available online:
https://github.com/Hongje/OTVM.
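To make the described pipeline concrete, below is a minimal, hypothetical sketch of the per-frame inference loop the abstract outlines: one user-annotated trimap on the first frame, then trimap propagation, alpha prediction, and a joint alpha-trimap refinement step whose output feeds the next frame. It assumes a PyTorch-style interface; the module names (TrimapPropagation, AlphaPrediction, AlphaTrimapRefinement, one_trimap_video_matting) and the tiny single-convolution networks are illustrative placeholders, not the released OTVM architecture (see the linked repository for the actual model).
```python
# Hypothetical sketch of an OTVM-style inference loop (not the released implementation).
import torch
import torch.nn as nn

class TrimapPropagation(nn.Module):
    """Placeholder: predicts the current frame's trimap from the frame and the previous trimap."""
    def __init__(self):
        super().__init__()
        # RGB (3) + previous trimap (3 classes: FG/BG/unknown) -> trimap logits
        self.net = nn.Conv2d(3 + 3, 3, kernel_size=3, padding=1)

    def forward(self, frame, prev_trimap):
        return self.net(torch.cat([frame, prev_trimap], dim=1)).softmax(dim=1)

class AlphaPrediction(nn.Module):
    """Placeholder: predicts an alpha matte from the frame and its propagated trimap."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3 + 3, 1, kernel_size=3, padding=1)

    def forward(self, frame, trimap):
        return self.net(torch.cat([frame, trimap], dim=1)).sigmoid()

class AlphaTrimapRefinement(nn.Module):
    """Placeholder: jointly refines trimap and alpha so information flows between the two tasks."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3 + 3 + 1, 3 + 1, kernel_size=3, padding=1)

    def forward(self, frame, trimap, alpha):
        out = self.net(torch.cat([frame, trimap, alpha], dim=1))
        return out[:, :3].softmax(dim=1), out[:, 3:].sigmoid()

def one_trimap_video_matting(frames, first_trimap):
    """frames: list of (1, 3, H, W) tensors; first_trimap: (1, 3, H, W) one-hot FG/BG/unknown map."""
    propagate, predict, refine = TrimapPropagation(), AlphaPrediction(), AlphaTrimapRefinement()
    trimap, alphas = first_trimap, []
    for frame in frames:
        trimap = propagate(frame, trimap)             # propagate the trimap to the current frame
        alpha = predict(frame, trimap)                # predict alpha from the propagated trimap
        trimap, alpha = refine(frame, trimap, alpha)  # joint refinement; refined trimap feeds the next frame
        alphas.append(alpha)
    return alphas

if __name__ == "__main__":
    frames = [torch.rand(1, 3, 64, 64) for _ in range(4)]
    trimap0 = torch.zeros(1, 3, 64, 64)
    trimap0[:, 2] = 1.0  # toy example: mark the whole first frame as "unknown"
    print(len(one_trimap_video_matting(frames, trimap0)))
```
The point of the sketch is the data flow: because the refined trimap (not the raw propagated one) is carried to the next frame, trimap propagation and alpha prediction are coupled rather than decoupled, which is what the abstract credits for the improved temporal stability.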
Related papers
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modelling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- Matte Anything: Interactive Natural Image Matting with Segment Anything Models [35.105593013654]
Matte Anything (MatAny) is an interactive natural image matting model that can produce high-quality alpha mattes.
We leverage vision foundation models to enhance the performance of natural image matting.
MatAny achieves a 58.3% improvement in MSE and a 40.6% improvement in SAD over previous image matting methods.
arXiv Detail & Related papers (2023-06-07T03:31:39Z)
- Adaptive Human Matting for Dynamic Videos [62.026375402656754]
Adaptive Matting for Dynamic Videos, termed AdaM, is a framework for simultaneously differentiating foregrounds from backgrounds.
Two interconnected network designs are employed to achieve this goal.
We benchmark and study our methods on recently introduced datasets, showing that our matting achieves new best-in-class generalizability.
arXiv Detail & Related papers (2023-04-12T17:55:59Z)
- RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving [80.14669385741202]
Vision transformers (ViTs) have achieved state-of-the-art results in many image-based benchmarks.
ViTs are notoriously hard to train and require a lot of training data to learn powerful representations.
We show that our method, called RangeViT, outperforms existing projection-based methods on nuScenes and SemanticKITTI.
arXiv Detail & Related papers (2023-01-24T18:50:48Z)
- Attention-guided Temporal Coherent Video Object Matting [78.82835351423383]
We propose a novel deep learning-based object matting method that can achieve temporally coherent matting results.
Its key component is an attention-based temporal aggregation module that maximizes image matting networks' strength.
We show how to effectively solve the trimap generation problem by fine-tuning a state-of-the-art video object segmentation network.
arXiv Detail & Related papers (2021-05-24T17:34:57Z)
- M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers [78.48081972698888]
We present M3DeTR, which combines different point cloud representations with different feature scales based on multi-scale feature pyramids.
M3DeTR is the first approach that unifies multiple point cloud representations and feature scales, and simultaneously models mutual relationships between point clouds using transformers.
arXiv Detail & Related papers (2021-04-24T06:48:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.