WegFormer: Transformers for Weakly Supervised Semantic Segmentation
- URL: http://arxiv.org/abs/2203.08421v1
- Date: Wed, 16 Mar 2022 06:50:31 GMT
- Title: WegFormer: Transformers for Weakly Supervised Semantic Segmentation
- Authors: Chunmeng Liu, Enze Xie, Wenjia Wang, Wenhai Wang, Guangyao Li, Ping
Luo
- Abstract summary: This work introduces Transformer to build a simple and effective WSSS framework, termed WegFormer.
Unlike existing CNN-based methods, WegFormer uses Vision Transformer as a classifier to produce high-quality pseudo segmentation masks.
WegFormer achieves state-of-the-art 70.5% mIoU on the PASCAL VOC dataset, significantly outperforming the previous best method.
- Score: 32.3201557200616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although convolutional neural networks (CNNs) have achieved remarkable
progress in weakly supervised semantic segmentation (WSSS), the effective
receptive field of CNN is insufficient to capture global context information,
leading to sub-optimal results. Inspired by the great success of Transformers
in fundamental vision areas, this work for the first time introduces
Transformer to build a simple and effective WSSS framework, termed WegFormer.
Unlike existing CNN-based methods, WegFormer uses Vision Transformer (ViT) as a
classifier to produce high-quality pseudo segmentation masks. To this end, we
introduce three tailored components in our Transformer-based framework, which
are (1) a Deep Taylor Decomposition (DTD) to generate attention maps, (2) a
soft erasing module to smooth the attention maps, and (3) an efficient
potential object mining (EPOM) to filter noisy activation in the background.
Without any bells and whistles, WegFormer achieves state-of-the-art 70.5% mIoU
on the PASCAL VOC dataset, significantly outperforming the previous best
method. We hope WegFormer provides a new perspective to tap the potential of
Transformer in weakly supervised semantic segmentation. Code will be released.
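The abstract describes a pipeline in which a ViT classifier's attention maps are smoothed by a soft erasing module and then cleaned of background noise. The sketch below is a hypothetical toy illustration of those two post-processing steps on a small attention map; the function names, thresholds, and decay factor are assumptions for illustration, and the paper's actual Deep Taylor Decomposition step is omitted.

```python
import numpy as np

def soft_erase(attn, thresh=0.6, decay=0.5):
    """Soft erasing (hypothetical sketch): instead of hard-zeroing the most
    activated regions, scale them down by `decay` so attention can spread
    to less-discriminative object parts."""
    attn = attn / (attn.max() + 1e-8)              # normalize to [0, 1]
    return np.where(attn > thresh, attn * decay, attn)

def filter_background(attn, bg_thresh=0.1):
    """Rough stand-in for potential object mining: zero out weak
    activations that likely belong to background noise."""
    return np.where(attn < bg_thresh, 0.0, attn)

# Toy 3x3 attention map from a classifier
attn = np.array([[0.05, 0.20, 0.90],
                 [0.10, 0.70, 0.95],
                 [0.00, 0.30, 0.40]])
smoothed = soft_erase(attn)        # peaks are damped, not erased
mask = filter_background(smoothed) # weak background responses are zeroed
```

The design intuition is that hard erasing can destroy discriminative evidence entirely, while a soft scale-down keeps a gradient of evidence for pseudo-mask generation.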
Related papers
- Efficient Point Transformer with Dynamic Token Aggregating for Point Cloud Processing [19.73918716354272]
We propose an efficient point TransFormer with Dynamic Token Aggregating (DTA-Former) for point cloud representation and processing.
It achieves SOTA performance while running up to 30× faster than prior point Transformers on ModelNet40, ShapeNet, and airborne MultiSpectral LiDAR (MS-LiDAR) datasets.
arXiv Detail & Related papers (2024-05-23T20:50:50Z) - Minimalist and High-Performance Semantic Segmentation with Plain Vision
Transformers [10.72362704573323]
We introduce PlainSeg, a model comprising only three 3×3 convolutions in addition to the transformer layers.
We also present PlainSeg-Hier, which allows for the utilization of hierarchical features.
arXiv Detail & Related papers (2023-10-19T14:01:40Z) - Dual-Augmented Transformer Network for Weakly Supervised Semantic
Segmentation [4.02487511510606]
Weakly supervised semantic segmentation (WSSS) is a fundamental computer vision task that aims to segment objects using only class-level labels.
Traditional methods adopt the CNN-based network and utilize the class activation map (CAM) strategy to discover the object regions.
An alternative is to explore vision transformers (ViT) to encode the image to acquire the global semantic information.
We propose a dual network with both CNN-based and transformer networks for mutually complementary learning.
arXiv Detail & Related papers (2023-09-30T08:41:11Z) - ConvFormer: Combining CNN and Transformer for Medical Image Segmentation [17.88894109620463]
We propose a hierarchical CNN and Transformer hybrid architecture, called ConvFormer, for medical image segmentation.
Our ConvFormer, trained from scratch, outperforms various CNN- or Transformer-based architectures, achieving state-of-the-art performance.
arXiv Detail & Related papers (2022-11-15T23:11:22Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - Transformer Scale Gate for Semantic Segmentation [53.27673119360868]
Transformer Scale Gate (TSG) exploits cues in self and cross attentions in Vision Transformers for the scale selection.
Our experiments on the Pascal Context and ADE20K datasets demonstrate that our feature selection strategy achieves consistent gains.
arXiv Detail & Related papers (2022-05-14T13:11:39Z) - nnFormer: Interleaved Transformer for Volumetric Segmentation [50.10441845967601]
We introduce nnFormer, a powerful segmentation model with an interleaved architecture based on empirical combination of self-attention and convolution.
nnFormer achieves tremendous improvements over previous transformer-based methods on two commonly used datasets Synapse and ACDC.
arXiv Detail & Related papers (2021-09-07T17:08:24Z) - SOTR: Segmenting Objects with Transformers [0.0]
We present a novel, flexible, and effective transformer-based model for high-quality instance segmentation.
The proposed method, Segmenting Objects with TRansformers (SOTR), simplifies the segmentation pipeline.
Our SOTR performs well on the MS COCO dataset and surpasses state-of-the-art instance segmentation approaches.
arXiv Detail & Related papers (2021-08-15T14:10:11Z) - Container: Context Aggregation Network [83.12004501984043]
A recent finding shows that a simple MLP-based solution without any traditional convolutional or Transformer components can produce effective visual representations.
We present Container (CONText AggregatIon NEtwoRk), a general-purpose building block for multi-head context aggregation.
In contrast to Transformer-based methods that do not scale well to downstream tasks relying on larger input image resolutions, our efficient network, Container-Light, can be employed in object detection and instance segmentation networks.
arXiv Detail & Related papers (2021-06-02T18:09:11Z) - Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on-par on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z) - Spherical Transformer: Adapting Spherical Signal to CNNs [53.18482213611481]
Spherical Transformer can transform spherical signals into vectors that can be directly processed by standard CNNs.
We evaluate our approach on the tasks of spherical MNIST recognition, 3D object classification and omnidirectional image semantic segmentation.
arXiv Detail & Related papers (2021-01-11T12:33:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.