SegNeXt: Rethinking Convolutional Attention Design for Semantic
Segmentation
- URL: http://arxiv.org/abs/2209.08575v1
- Date: Sun, 18 Sep 2022 14:33:49 GMT
- Title: SegNeXt: Rethinking Convolutional Attention Design for Semantic
Segmentation
- Authors: Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, Zhengning Liu, Ming-Ming Cheng,
Shi-Min Hu
- Abstract summary: We present SegNeXt, a simple convolutional network architecture for semantic segmentation.
We show that convolutional attention is a more efficient and effective way to encode contextual information than the self-attention mechanism in transformers.
- Score: 100.89770978711464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of
semantic segmentation due to the efficiency of self-attention in encoding
spatial information. In this paper, we show that convolutional attention is a
more efficient and effective way to encode contextual information than the
self-attention mechanism in transformers. By re-examining the characteristics
owned by successful segmentation models, we discover several key components
leading to the performance improvement of segmentation models. This motivates
us to design a novel convolutional attention network that uses cheap
convolutional operations. Without bells and whistles, our SegNeXt significantly
improves the performance of previous state-of-the-art methods on popular
benchmarks, including ADE20K, Cityscapes, COCO-Stuff, Pascal VOC, Pascal
Context, and iSAID. Notably, SegNeXt outperforms EfficientNet-L2 w/ NAS-FPN and
achieves 90.6% mIoU on the Pascal VOC 2012 test leaderboard with only 1/10 of
its parameters. On average, SegNeXt achieves about a 2.0% mIoU improvement
over state-of-the-art methods on the ADE20K dataset with the same or less
computation. Code is available at https://github.com/uyzhang/JSeg
(Jittor) and https://github.com/Visual-Attention-Network/SegNeXt (Pytorch).
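The core operation behind the abstract's claim is a multi-scale convolutional attention block: cheap depth-wise and strip convolutions gather context, and a 1x1 convolution turns the result into an attention map that reweights the input. Below is a minimal PyTorch sketch of that idea; the kernel sizes follow the paper's description, but the class and variable names are illustrative assumptions, so consult the linked repositories for the reference implementation.

```python
# A minimal sketch of multi-scale convolutional attention (not the authors' code).
import torch
import torch.nn as nn

class ConvAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Local context via a depth-wise 5x5 convolution.
        self.local = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        # Multi-scale context via cheap depth-wise strip convolutions.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels),
                nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels),
            )
            for k in (7, 11, 21)
        )
        # A 1x1 convolution mixes channels to produce the attention map.
        self.mix = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ctx = self.local(x)
        ctx = ctx + sum(branch(ctx) for branch in self.branches)
        # Attention is applied as an element-wise reweighting of the input,
        # avoiding the quadratic pairwise cost of spatial self-attention.
        return self.mix(ctx) * x

x = torch.randn(1, 64, 32, 32)
print(ConvAttention(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

Because every operation is a depth-wise or 1x1 convolution, the cost grows linearly with image size, which is the efficiency argument the abstract makes against self-attention.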
Related papers
- UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation [26.91063423376469]
Semi-supervised semantic segmentation (SSS) aims at learning rich visual knowledge from cheap unlabeled images.
We present our upgraded and simplified UniMatch V2, inheriting the core spirit of weak-to-strong consistency from V1.
arXiv Detail & Related papers (2024-10-14T17:49:27Z)
- SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers [76.13755422671822]
This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework.
We introduce a novel Attention-to-Mask (ATM) module to design a lightweight decoder effective for plain ViTs.
Our decoder outperforms the popular UPerNet decoder across various ViT backbones while consuming only about 5% of its computational cost.
arXiv Detail & Related papers (2023-06-09T22:29:56Z)
- RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer [63.25665813125223]
We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation.
It achieves a better trade-off between performance and efficiency than CNN-based models.
Experiments on mainstream benchmarks demonstrate the effectiveness of our proposed RTFormer.
arXiv Detail & Related papers (2022-10-13T16:03:53Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups (sketched after this list).
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
- Dynamically pruning segformer for efficient semantic segmentation [8.29672153078638]
We seek to design a lightweight SegFormer for efficient semantic segmentation.
Based on the observation that neurons in SegFormer layers exhibit large variances across different images, we propose a dynamic gated linear layer.
We also introduce two-stage knowledge distillation to transfer knowledge from the original teacher to the pruned student network.
arXiv Detail & Related papers (2021-11-18T03:34:28Z)
- SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers [79.646577541655]
We present SegFormer, a semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders.
SegFormer comprises a novel hierarchically structured encoder which outputs multiscale features.
The proposed decoder aggregates information from different layers, combining local and global attention into powerful representations (a minimal decoder sketch appears after this list).
arXiv Detail & Related papers (2021-05-31T17:59:51Z)
- Unifying Instance and Panoptic Segmentation with Dynamic Rank-1 Convolutions [109.2706837177222]
DR1Mask is the first panoptic segmentation framework that exploits a shared feature map for both instance and semantic segmentation.
As a byproduct, DR1Mask is 10% faster and 1 mAP point more accurate than the previous state-of-the-art instance segmentation network BlendMask.
arXiv Detail & Related papers (2020-11-19T12:42:10Z)
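For the EdgeNeXt entry above, a minimal PyTorch sketch of the split depth-wise transpose attention idea: channels are split into groups refined by cascaded depth-wise convolutions, then attention is computed across the channel dimension (a C x C map) rather than the quadratic spatial one. All names, the group/head counts, and the scaling factor are illustrative assumptions, not the authors' code.

```python
# A hedged sketch of channel-split depth-wise mixing + transposed attention.
import torch
import torch.nn as nn

class SDTABlock(nn.Module):
    def __init__(self, channels: int, groups: int = 4, heads: int = 4):
        super().__init__()
        assert channels % groups == 0 and channels % heads == 0
        g = channels // groups
        # One depth-wise 3x3 conv per channel group, applied in a cascade.
        self.dw = nn.ModuleList(
            nn.Conv2d(g, g, 3, padding=1, groups=g) for _ in range(groups)
        )
        self.heads = heads
        self.qkv = nn.Linear(channels, channels * 3)
        self.proj = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Multi-scale spatial mixing: each group also sees the previous group's output.
        chunks, out, prev = x.chunk(len(self.dw), dim=1), [], 0
        for conv, chunk in zip(self.dw, chunks):
            prev = conv(chunk + prev)
            out.append(prev)
        y = torch.cat(out, dim=1).flatten(2).transpose(1, 2)  # (b, h*w, c)
        # Transposed attention: similarity across channels, linear in h*w.
        q, k, v = self.qkv(y).chunk(3, dim=-1)
        split = lambda t: t.view(b, h * w, self.heads, c // self.heads).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        attn = (q.transpose(-2, -1) @ k / (h * w) ** 0.5).softmax(dim=-1)
        y = (attn @ v.transpose(-2, -1)).transpose(-2, -1)  # (b, heads, h*w, c/heads)
        y = y.transpose(1, 2).reshape(b, h * w, c)
        return self.proj(y).transpose(1, 2).view(b, c, h, w) + x

x = torch.randn(1, 64, 16, 16)
print(SDTABlock(64)(x).shape)  # torch.Size([1, 64, 16, 16])
```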
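For the SegFormer entry, a minimal sketch of a lightweight all-MLP decoder of the kind it describes: each multi-scale encoder feature map is projected to a common width with a linear (1x1) layer, upsampled to a common resolution, concatenated, and fused before classification. The channel widths and names are assumptions for illustration, not the authors' code.

```python
# A hedged sketch of an all-MLP decoder over multiscale features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPDecoder(nn.Module):
    def __init__(self, in_channels=(64, 128, 320, 512), embed_dim=256,
                 num_classes=150):  # 150 = ADE20K class count, as an example
        super().__init__()
        # One linear projection (implemented as a 1x1 conv) per encoder stage.
        self.proj = nn.ModuleList(nn.Conv2d(c, embed_dim, 1) for c in in_channels)
        self.fuse = nn.Conv2d(embed_dim * len(in_channels), embed_dim, 1)
        self.classify = nn.Conv2d(embed_dim, num_classes, 1)

    def forward(self, feats):
        # Upsample every projected stage to the resolution of the finest one.
        size = feats[0].shape[-2:]
        ups = [F.interpolate(p(f), size=size, mode="bilinear", align_corners=False)
               for p, f in zip(self.proj, feats)]
        return self.classify(self.fuse(torch.cat(ups, dim=1)))

feats = [torch.randn(1, c, 128 // 2**i, 128 // 2**i)
         for i, c in enumerate((64, 128, 320, 512))]
print(MLPDecoder()(feats).shape)  # torch.Size([1, 150, 128, 128])
```

The design relies on the encoder's features already carrying mixed local and global context, so the decoder itself can stay this simple.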