SegNeXt: Rethinking Convolutional Attention Design for Semantic
Segmentation
- URL: http://arxiv.org/abs/2209.08575v1
- Date: Sun, 18 Sep 2022 14:33:49 GMT
- Title: SegNeXt: Rethinking Convolutional Attention Design for Semantic
Segmentation
- Authors: Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, Zhengning Liu, Ming-Ming Cheng,
Shi-Min Hu
- Abstract summary: We present SegNeXt, a simple convolutional network architecture for semantic segmentation.
We show that convolutional attention is a more efficient and effective way to encode contextual information than the self-attention mechanism in transformers.
- Score: 100.89770978711464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of
semantic segmentation due to the efficiency of self-attention in encoding
spatial information. In this paper, we show that convolutional attention is a
more efficient and effective way to encode contextual information than the
self-attention mechanism in transformers. By re-examining the characteristics
owned by successful segmentation models, we discover several key components
leading to the performance improvement of segmentation models. This motivates
us to design a novel convolutional attention network that uses cheap
convolutional operations. Without bells and whistles, our SegNeXt significantly
improves the performance of previous state-of-the-art methods on popular
benchmarks, including ADE20K, Cityscapes, COCO-Stuff, Pascal VOC, Pascal
Context, and iSAID. Notably, SegNeXt outperforms EfficientNet-L2 w/ NAS-FPN and
achieves 90.6% mIoU on the Pascal VOC 2012 test leaderboard with only 1/10 of
its parameters. On average, SegNeXt achieves about a 2.0% mIoU improvement
over state-of-the-art methods on the ADE20K dataset with the same or less
computation. Code is available at https://github.com/uyzhang/JSeg
(Jittor) and https://github.com/Visual-Attention-Network/SegNeXt (Pytorch).
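The core operation behind the abstract's claim is a multi-scale convolutional attention block: cheap depth-wise and strip convolutions gather context, and a 1x1 convolution turns the result into an attention map that reweights the input. Below is a minimal PyTorch sketch of that idea; the kernel sizes follow the paper's description, but the class and variable names are illustrative assumptions, so consult the linked repositories for the reference implementation.

```python
# A minimal sketch of multi-scale convolutional attention (not the authors' code).
import torch
import torch.nn as nn

class ConvAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Local context via a depth-wise 5x5 convolution.
        self.local = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        # Multi-scale context via cheap depth-wise strip convolutions.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels),
                nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels),
            )
            for k in (7, 11, 21)
        )
        # A 1x1 convolution mixes channels to produce the attention map.
        self.mix = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ctx = self.local(x)
        ctx = ctx + sum(branch(ctx) for branch in self.branches)
        # Attention is applied as an element-wise reweighting of the input,
        # avoiding the quadratic pairwise cost of spatial self-attention.
        return self.mix(ctx) * x

x = torch.randn(1, 64, 32, 32)
print(ConvAttention(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

Because every operation is a depth-wise or 1x1 convolution, the cost grows linearly with image size, which is the efficiency argument the abstract makes against self-attention.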
Related papers
- UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation [26.91063423376469]
Semi-supervised semantic segmentation (SSS) aims at learning rich visual knowledge from cheap unlabeled images.
We present our upgraded and simplified UniMatch V2, inheriting the core spirit of weak-to-strong consistency from V1.
arXiv Detail & Related papers (2024-10-14T17:49:27Z)
- SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers [76.13755422671822]
This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework.
We introduce a novel Attention-to-Mask (ATM) module to design a lightweight decoder effective for plain ViTs.
Our decoder outperforms the popular UPerNet decoder across various ViT backbones while consuming only about 5% of its computational cost.
arXiv Detail & Related papers (2023-06-09T22:29:56Z)
- RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer [63.25665813125223]
We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation.
It achieves a better trade-off between performance and efficiency than CNN-based models.
Experiments on mainstream benchmarks demonstrate the effectiveness of our proposed RTFormer.
arXiv Detail & Related papers (2022-10-13T16:03:53Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups (sketched after this list).
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
- Dynamically pruning segformer for efficient semantic segmentation [8.29672153078638]
We seek to design a lightweight SegFormer for efficient semantic segmentation.
Based on the observation that neurons in SegFormer layers exhibit large variances across different images, we propose a dynamic gated linear layer.
We also introduce two-stage knowledge distillation to transfer knowledge from the original teacher to the pruned student network.
arXiv Detail & Related papers (2021-11-18T03:34:28Z)
- SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers [79.646577541655]
We present SegFormer, a semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders.
SegFormer comprises a novel hierarchically structured encoder which outputs multiscale features.
The proposed decoder aggregates information from different layers, combining local and global attention into powerful representations (a minimal decoder sketch appears after this list).
arXiv Detail & Related papers (2021-05-31T17:59:51Z)
- Unifying Instance and Panoptic Segmentation with Dynamic Rank-1 Convolutions [109.2706837177222]
DR1Mask is the first panoptic segmentation framework that exploits a shared feature map for both instance and semantic segmentation.
As a byproduct, DR1Mask is 10% faster and 1 mAP point more accurate than the previous state-of-the-art instance segmentation network BlendMask.
arXiv Detail & Related papers (2020-11-19T12:42:10Z)
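For the EdgeNeXt entry above, a minimal PyTorch sketch of the split depth-wise transpose attention idea: channels are split into groups refined by cascaded depth-wise convolutions, then attention is computed across the channel dimension (a C x C map) rather than the quadratic spatial one. All names, the group/head counts, and the scaling factor are illustrative assumptions, not the authors' code.

```python
# A hedged sketch of channel-split depth-wise mixing + transposed attention.
import torch
import torch.nn as nn

class SDTABlock(nn.Module):
    def __init__(self, channels: int, groups: int = 4, heads: int = 4):
        super().__init__()
        assert channels % groups == 0 and channels % heads == 0
        g = channels // groups
        # One depth-wise 3x3 conv per channel group, applied in a cascade.
        self.dw = nn.ModuleList(
            nn.Conv2d(g, g, 3, padding=1, groups=g) for _ in range(groups)
        )
        self.heads = heads
        self.qkv = nn.Linear(channels, channels * 3)
        self.proj = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Multi-scale spatial mixing: each group also sees the previous group's output.
        chunks, out, prev = x.chunk(len(self.dw), dim=1), [], 0
        for conv, chunk in zip(self.dw, chunks):
            prev = conv(chunk + prev)
            out.append(prev)
        y = torch.cat(out, dim=1).flatten(2).transpose(1, 2)  # (b, h*w, c)
        # Transposed attention: similarity across channels, linear in h*w.
        q, k, v = self.qkv(y).chunk(3, dim=-1)
        split = lambda t: t.view(b, h * w, self.heads, c // self.heads).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        attn = (q.transpose(-2, -1) @ k / (h * w) ** 0.5).softmax(dim=-1)
        y = (attn @ v.transpose(-2, -1)).transpose(-2, -1)  # (b, heads, h*w, c/heads)
        y = y.transpose(1, 2).reshape(b, h * w, c)
        return self.proj(y).transpose(1, 2).view(b, c, h, w) + x

x = torch.randn(1, 64, 16, 16)
print(SDTABlock(64)(x).shape)  # torch.Size([1, 64, 16, 16])
```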
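For the SegFormer entry, a minimal sketch of a lightweight all-MLP decoder of the kind it describes: each multi-scale encoder feature map is projected to a common width with a linear (1x1) layer, upsampled to a common resolution, concatenated, and fused before classification. The channel widths and names are assumptions for illustration, not the authors' code.

```python
# A hedged sketch of an all-MLP decoder over multiscale features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPDecoder(nn.Module):
    def __init__(self, in_channels=(64, 128, 320, 512), embed_dim=256,
                 num_classes=150):  # 150 = ADE20K class count, as an example
        super().__init__()
        # One linear projection (implemented as a 1x1 conv) per encoder stage.
        self.proj = nn.ModuleList(nn.Conv2d(c, embed_dim, 1) for c in in_channels)
        self.fuse = nn.Conv2d(embed_dim * len(in_channels), embed_dim, 1)
        self.classify = nn.Conv2d(embed_dim, num_classes, 1)

    def forward(self, feats):
        # Upsample every projected stage to the resolution of the finest one.
        size = feats[0].shape[-2:]
        ups = [F.interpolate(p(f), size=size, mode="bilinear", align_corners=False)
               for p, f in zip(self.proj, feats)]
        return self.classify(self.fuse(torch.cat(ups, dim=1)))

feats = [torch.randn(1, c, 128 // 2**i, 128 // 2**i)
         for i, c in enumerate((64, 128, 320, 512))]
print(MLPDecoder()(feats).shape)  # torch.Size([1, 150, 128, 128])
```

The design relies on the encoder's features already carrying mixed local and global context, so the decoder itself can stay this simple.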