Category Feature Transformer for Semantic Segmentation
- URL: http://arxiv.org/abs/2308.05581v1
- Date: Thu, 10 Aug 2023 13:44:54 GMT
- Title: Category Feature Transformer for Semantic Segmentation
- Authors: Quan Tang, Chuanjian Liu, Fagui Liu, Yifan Liu, Jun Jiang, Bowen Zhang, Kai Han, Yunhe Wang
- Abstract summary: CFT learns unified feature embeddings for individual semantic categories from high-level features during each aggregation process.
We conduct extensive experiments on popular semantic segmentation benchmarks.
The proposed CFT obtains a compelling 55.1% mIoU with greatly reduced model parameters and computations on the challenging ADE20K dataset.
- Score: 34.812688388968525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aggregation of multi-stage features has been revealed to play a significant
role in semantic segmentation. Unlike previous methods employing point-wise
summation or concatenation for feature aggregation, this study proposes the
Category Feature Transformer (CFT) that explores the flow of category embedding
and transformation among multi-stage features through the prevalent multi-head
attention mechanism. CFT learns unified feature embeddings for individual
semantic categories from high-level features during each aggregation process
and dynamically broadcasts them to high-resolution features. Integrating the
proposed CFT into a typical feature pyramid structure exhibits superior
performance over a broad range of backbone networks. We conduct extensive
experiments on popular semantic segmentation benchmarks. Specifically, the
proposed CFT obtains a compelling 55.1% mIoU with greatly reduced model
parameters and computations on the challenging ADE20K dataset.
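The two-step flow described in the abstract (learn per-category embeddings from high-level features, then broadcast them to high-resolution features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes single-head scaled dot-product attention with no learned projections, a residual addition for the fusion step, and hypothetical names (`attend`, `category_aggregate`, `category_queries`) and tensor sizes chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, values):
    """Single-head scaled dot-product attention (simplifying assumption)."""
    scale = 1.0 / np.sqrt(queries.shape[-1])
    weights = softmax(queries @ keys.T * scale, axis=-1)
    return weights @ values, weights


def category_aggregate(category_queries, high_level, high_res):
    # Step 1 (assumed form): each category query pools the high-level
    # tokens into one embedding per semantic category.
    category_embed, _ = attend(category_queries, high_level, high_level)
    # Step 2 (assumed form): every high-resolution token reads from the
    # category embeddings; the result is added back residually.
    broadcast, weights = attend(high_res, category_embed, category_embed)
    return high_res + broadcast, weights


rng = np.random.default_rng(0)
num_categories, dim = 4, 8                      # illustrative sizes
category_queries = rng.normal(size=(num_categories, dim))
high_level = rng.normal(size=(16, dim))         # e.g. a 4x4 map, flattened
high_res = rng.normal(size=(64, dim))           # e.g. an 8x8 map, flattened

fused, weights = category_aggregate(category_queries, high_level, high_res)
print(fused.shape, weights.shape)
```

In this sketch the second attention step costs on the order of (number of high-resolution tokens) x (number of categories), rather than scaling with the product of the two feature-map sizes, which is one plausible reading of why aggregating through category embeddings could reduce computation.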
Related papers
- CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection [1.837431956557716]
Feature pyramids have been widely adopted in convolutional neural networks (CNNs) and transformers for tasks like medical image segmentation and object detection.
We propose a novel decoder block that integrates feature pyramids and transformers.
Our model achieves superior performance in detecting small objects compared to existing methods.
(2024-04-23T18:46:07Z)
- A Decoding Scheme with Successive Aggregation of Multi-Level Features for Light-Weight Semantic Segmentation [4.454210876879237]
We propose a novel decoding scheme for semantic segmentation.
It takes multi-level features from the encoder with multi-scale architecture.
It aims to achieve not only reduced computational expense but also higher segmentation accuracy.
(2024-02-17T05:31:10Z)
- Multi-Content Interaction Network for Few-Shot Segmentation [37.80624074068096]
Few-shot segmentation (FSS) is challenging because of limited support images and large intra-class appearance discrepancies.
We propose a Multi-Content Interaction Network (MCINet) to remedy this issue.
MCINet improves FSS by incorporating the low-level structural information from another query branch into the high-level semantic features.
(2023-03-11T04:21:59Z)
- SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
(2022-05-26T17:00:23Z)
- Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
(2022-03-25T01:24:24Z)
- Consistency and Diversity induced Human Motion Segmentation [231.36289425663702]
We propose a novel Consistency and Diversity induced human Motion Segmentation (CDMS) algorithm.
Our model factorizes the source and target data into distinct multi-layer feature spaces.
A multi-mutual learning strategy is carried out to reduce the domain gap between the source and target data.
(2022-02-10T06:23:56Z)
- CFNet: Learning Correlation Functions for One-Stage Panoptic Segmentation [46.252118473248316]
We propose to first predict semantic-level and instance-level correlations among different locations that are utilized to enhance the backbone features.
We then feed the improved discriminative features into the corresponding segmentation heads, respectively.
We achieve state-of-the-art performance on MS COCO with 45.1% PQ and on ADE20K with 32.6% PQ.
(2022-01-13T05:31:14Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
(2021-03-22T20:36:34Z)
- Sequential Hierarchical Learning with Distribution Transformation for Image Super-Resolution [83.70890515772456]
We build a sequential hierarchical learning super-resolution network (SHSR) for effective image SR.
We consider the inter-scale correlations of features, and devise a sequential multi-scale block (SMB) to progressively explore the hierarchical information.
Experiment results show SHSR achieves superior quantitative performance and visual quality to state-of-the-art methods.
(2020-07-19T01:35:53Z)