Segmenting Transparent Objects in the Wild with Transformer
- URL: http://arxiv.org/abs/2101.08461v3
- Date: Tue, 23 Feb 2021 13:23:16 GMT
- Title: Segmenting Transparent Objects in the Wild with Transformer
- Authors: Enze Xie, Wenjia Wang, Wenhai Wang, Peize Sun, Hang Xu, Ding Liang,
Ping Luo
- Abstract summary: This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1.
It has 11 fine-grained categories of transparent objects, commonly occurring in the human domestic environment.
A novel transformer-based segmentation pipeline termed Trans2Seg is proposed.
- Score: 47.97930429998238
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work presents a new fine-grained transparent object segmentation
dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale
transparent object segmentation dataset. Unlike Trans10K-v1, which has only two
limited categories, our new dataset has several appealing benefits. (1) It has
11 fine-grained categories of transparent objects, commonly occurring in the
human domestic environment, making it more practical for real-world
application. (2) Trans10K-v2 brings more challenges for the current advanced
segmentation methods than its former version. Furthermore, a novel
transformer-based segmentation pipeline termed Trans2Seg is proposed. Firstly,
the transformer encoder of Trans2Seg provides the global receptive field in
contrast to CNN's local receptive field, which shows excellent advantages over
pure CNN architectures. Secondly, by formulating semantic segmentation as a
problem of dictionary look-up, we design a set of learnable prototypes as the
query of Trans2Seg's transformer decoder, where each prototype learns the
statistics of one category in the whole dataset. We benchmark more than 20
recent semantic segmentation methods, demonstrating that Trans2Seg
significantly outperforms all the CNN-based methods, showing the proposed
algorithm's potential ability to solve transparent object segmentation.
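The dictionary look-up formulation described above can be sketched in a few lines. The following is a minimal NumPy illustration (all shapes, names, and the random features are hypothetical, not the paper's implementation): one learnable prototype per category acts as a decoder query, its attention over the encoder's pixel features is normalized across categories, and the per-pixel argmax yields the segmentation map.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, d, h, w = 12, 64, 8, 8   # e.g. 11 transparent categories + background

pixel_feats = rng.normal(size=(h * w, d))        # stand-in for encoder output
prototypes = rng.normal(size=(num_classes, d))   # learnable class queries

# Cross-attention logits: each prototype "looks up" every pixel feature.
logits = prototypes @ pixel_feats.T / np.sqrt(d)  # (num_classes, h*w)

# Softmax over categories gives each pixel a distribution over prototypes.
probs = np.exp(logits - logits.max(axis=0, keepdims=True))
probs /= probs.sum(axis=0, keepdims=True)

seg_map = probs.argmax(axis=0).reshape(h, w)      # (h, w) label map
```

In the actual pipeline the prototypes are trained end-to-end, so each one accumulates the statistics of its category over the whole dataset; here random vectors merely show the mechanics of the look-up.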
Related papers
- Dual-Augmented Transformer Network for Weakly Supervised Semantic
Segmentation [4.02487511510606]
Weakly supervised semantic segmentation (WSSS) is a fundamental computer vision task, which aims to segment objects using only class-level labels.
Traditional methods adopt the CNN-based network and utilize the class activation map (CAM) strategy to discover the object regions.
An alternative is to explore vision transformers (ViT) to encode the image to acquire the global semantic information.
We propose a dual network with both CNN-based and transformer networks for mutually complementary learning.
arXiv Detail & Related papers (2023-09-30T08:41:11Z)
- HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation [113.6560373226501]
This work studies semantic segmentation under the domain generalization setting.
We propose a novel hierarchical grouping transformer (HGFormer) to explicitly group pixels to form part-level masks and then whole-level masks.
Experiments show that HGFormer yields more robust semantic segmentation results than per-pixel classification methods and flat grouping transformers.
arXiv Detail & Related papers (2023-05-22T13:33:41Z)
- SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
arXiv Detail & Related papers (2022-05-26T17:00:23Z)
- WegFormer: Transformers for Weakly Supervised Semantic Segmentation [32.3201557200616]
This work introduces the Transformer to build a simple and effective WSSS framework, termed WegFormer.
Unlike existing CNN-based methods, WegFormer uses Vision Transformer as a classifier to produce high-quality pseudo segmentation masks.
WegFormer achieves state-of-the-art 70.5% mIoU on the PASCAL VOC dataset, significantly outperforming the previous best method.
arXiv Detail & Related papers (2022-03-16T06:50:31Z)
- SegTransVAE: Hybrid CNN -- Transformer with Regularization for medical image segmentation [0.0]
A novel network named SegTransVAE is proposed in this paper.
SegTransVAE is built upon an encoder-decoder architecture, combining a transformer with a variational autoencoder (VAE) branch.
Evaluation on various recently introduced datasets shows that SegTransVAE outperforms previous methods in Dice score and 95% Hausdorff distance.
arXiv Detail & Related papers (2022-01-21T08:02:55Z)
- SOTR: Segmenting Objects with Transformers [0.0]
We present a novel, flexible, and effective transformer-based model for high-quality instance segmentation.
The proposed method, Segmenting Objects with TRansformers (SOTR), simplifies the segmentation pipeline.
Our SOTR performs well on the MS COCO dataset and surpasses state-of-the-art instance segmentation approaches.
arXiv Detail & Related papers (2021-08-15T14:10:11Z)
- Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on par on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR)
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
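The patch-sequence encoding this entry describes can be sketched as follows. This is a hypothetical NumPy illustration (patch size, embedding dimension, and the random projection are assumed, not SETR's actual configuration): the image is cut into fixed-size patches, each patch is flattened and linearly projected, and the resulting token sequence is what a pure transformer encoder consumes.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.normal(size=(3, 32, 32))   # (channels, H, W) toy image
p = 8                                # patch size (assumed)
d = 64                               # embedding dimension (assumed)

c, h, w = img.shape
# Split H and W into a grid of p x p patches, then flatten each patch.
patches = (img.reshape(c, h // p, p, w // p, p)
              .transpose(1, 3, 0, 2, 4)        # (h/p, w/p, c, p, p)
              .reshape(-1, c * p * p))         # (num_patches, c*p*p)

proj = rng.normal(size=(c * p * p, d))         # linear patch embedding
tokens = patches @ proj                        # (num_patches, d) token sequence
print(tokens.shape)                            # (16, 64): a 4x4 grid of patches
```

Because every transformer layer attends over all 16 tokens, each layer models global context, which is the property SETR exploits in place of a CNN's progressively grown receptive field.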
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
- Segmenting Transparent Objects in the Wild [98.80906604285163]
This work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations.
To evaluate the effectiveness of Trans10K, we propose a novel boundary-aware segmentation method, termed TransLab, which exploits boundary as the clue to improve segmentation of transparent objects.
arXiv Detail & Related papers (2020-03-31T04:44:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.