TransMatting: Tri-token Equipped Transformer Model for Image Matting
- URL: http://arxiv.org/abs/2303.06476v1
- Date: Sat, 11 Mar 2023 18:21:25 GMT
- Title: TransMatting: Tri-token Equipped Transformer Model for Image Matting
- Authors: Huanqia Cai, Fanglei Xue, Lele Xu, Lili Guo
- Abstract summary: We propose a Transformer-based network (TransMatting) to model transparent objects with long-range features.
We also redesign the trimap as three learnable tokens, named tri-token.
Our proposed TransMatting outperforms current state-of-the-art methods on several popular matting benchmarks and our newly collected Transparent-460.
- Score: 4.012340049240327
- License: http://creativecommons.org/licenses/by/4.0/
Abstract: Image matting aims to predict alpha values of elaborate uncertainty areas of
natural images, like hair, smoke, and spider webs. However, existing methods
perform poorly when faced with highly transparent foreground objects due to the
large area of uncertainty to predict and the small receptive field of
convolutional networks. To address this issue, we propose a Transformer-based
network (TransMatting) to model transparent objects with long-range features
and collect a high-resolution matting dataset of transparent objects
(Transparent-460) for performance evaluation. Specifically, to utilize semantic
information in the trimap flexibly and effectively, we also redesign the trimap
as three learnable tokens, named tri-token. Both Transformer and convolution
matting models could benefit from our proposed tri-token design. By replacing
the traditional trimap concatenation strategy with our tri-token, existing
matting methods could achieve about 10% improvement in SAD and 20% in MSE.
Equipped with the new tri-token design, our proposed TransMatting outperforms
current state-of-the-art methods on several popular matting benchmarks and our
newly collected Transparent-460.
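As a rough, hedged sketch of the tri-token idea described above (not the authors' implementation; TransMatting integrates the tokens into its Transformer architecture, and the class and function names here are hypothetical), the snippet below contrasts the traditional trimap-concatenation strategy with mapping the three trimap regions to learnable embeddings that are added to encoder features:

```python
# Minimal sketch of the tri-token idea, assuming PyTorch; names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

def concat_baseline(image: torch.Tensor, trimap: torch.Tensor) -> torch.Tensor:
    """Traditional strategy: stack the trimap as a 4th input channel."""
    return torch.cat([image, trimap.float()], dim=1)       # (B, 4, H, W)

class TriToken(nn.Module):
    """Maps each trimap region (background / unknown / foreground) to a learnable
    token and adds it to the corresponding encoder features."""
    def __init__(self, embed_dim: int):
        super().__init__()
        self.tokens = nn.Embedding(3, embed_dim)            # three learnable tokens

    def forward(self, feats: torch.Tensor, trimap: torch.Tensor) -> torch.Tensor:
        # feats:  (B, C, H, W) encoder feature map
        # trimap: (B, 1, H0, W0) with labels {0: bg, 1: unknown, 2: fg}
        B, C, H, W = feats.shape
        labels = F.interpolate(trimap.float(), size=(H, W), mode="nearest").long()
        tok = self.tokens(labels.squeeze(1))                # (B, H, W, C)
        return feats + tok.permute(0, 3, 1, 2)              # inject trimap semantics

# Usage sketch:
# feats = torch.randn(2, 256, 32, 32)
# trimap = torch.randint(0, 3, (2, 1, 512, 512))
# out = TriToken(256)(feats, trimap)                        # (2, 256, 32, 32)
```

Because the trimap semantics enter as learnable embeddings rather than a fixed extra input channel, such a module could in principle be attached to either convolutional or Transformer matting backbones, which is consistent with the gains the abstract reports for both.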
Related papers
- Towards Natural Image Matting in the Wild via Real-Scenario Prior [69.96414467916863]
We propose a new matting dataset based on the COCO dataset, namely COCO-Matting.
The resulting COCO-Matting dataset comprises 38,251 human instance-level alpha mattes in complex natural scenarios.
For network architecture, the proposed feature-aligned transformer learns to extract fine-grained edge and transparency features.
The proposed matte-aligned decoder aims to segment matting-specific objects and convert coarse masks into high-precision mattes.
arXiv Detail & Related papers (2024-10-09T06:43:19Z)
- Adaptive Human Matting for Dynamic Videos [62.026375402656754]
Adaptive Matting for Dynamic Videos, termed AdaM, is a framework for simultaneously differentiating foregrounds from backgrounds.
Two interconnected network designs are employed to achieve this goal.
We benchmark and study our method on recently introduced datasets, showing that our matting achieves new best-in-class generalizability.
arXiv Detail & Related papers (2023-04-12T17:55:59Z)
- TransMatting: Enhancing Transparent Objects Matting with Transformers [4.012340049240327]
We propose a Transformer-based network, TransMatting, to model transparent objects with a big receptive field.
A small convolutional network is proposed to utilize the global feature and non-background mask to guide the multi-scale feature propagation from encoder to decoder.
We create a high-resolution matting dataset of transparent objects with small known foreground areas.
arXiv Detail & Related papers (2022-08-05T06:44:14Z)
- Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer that can efficiently perceive global geometric inconsistencies in 3D structure.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
- Extracting Triangular 3D Models, Materials, and Lighting From Images [59.33666140713829]
We present an efficient method for joint optimization of materials and lighting from multi-view image observations.
We leverage meshes with spatially-varying materials and environment lighting that can be deployed in any traditional graphics engine.
arXiv Detail & Related papers (2021-11-24T13:58:20Z)
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
- Human Perception Modeling for Automatic Natural Image Matting [2.179313476241343]
Natural image matting aims to precisely separate foreground objects from background using alpha matte.
We propose an intuitively-designed trimap-free two-stage matting approach without additional annotations.
Our matting algorithm is competitive with current state-of-the-art methods in both trimap-free and trimap-based settings.
arXiv Detail & Related papers (2021-03-31T12:08:28Z)
- Salient Image Matting [0.0]
We propose an image matting framework called Salient Image Matting to estimate the per-pixel opacity value of the most salient foreground in an image.
Our framework simultaneously deals with the challenge of learning a wide range of semantics and salient object types.
Our framework requires only a fraction of expensive matting data as compared to other automatic methods.
arXiv Detail & Related papers (2021-03-23T06:22:33Z)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional Neural Network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.