SemFormer: Semantic Guided Activation Transformer for Weakly Supervised
Semantic Segmentation
- URL: http://arxiv.org/abs/2210.14618v1
- Date: Wed, 26 Oct 2022 10:51:20 GMT
- Authors: Junliang Chen, Xiaodong Zhao, Cheng Luo, Linlin Shen
- Abstract summary: We propose a novel transformer-based framework, named Semantic Guided Activation Transformer (SemFormer) for WSSS.
We design a transformer-based Class-Aware AutoEncoder (CAAE) to extract the class embeddings for the input image and learn class semantics for all classes of the dataset.
Our SemFormer achieves textbf74.3% mIoU and surpasses many recent mainstream WSSS approaches by a large margin on PASCAL VOC 2012 dataset.
- Score: 36.80638177024504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent mainstream weakly supervised semantic segmentation (WSSS) approaches
are mainly based on Class Activation Map (CAM) generated by a CNN
(Convolutional Neural Network) based image classifier. In this paper, we
propose a novel transformer-based framework, named Semantic Guided Activation
Transformer (SemFormer), for WSSS. We design a transformer-based Class-Aware
AutoEncoder (CAAE) to extract the class embeddings for the input image and
learn class semantics for all classes of the dataset. The class embeddings and
learned class semantics are then used to guide the generation of activation
maps with four losses, i.e., class-foreground, class-background, activation
suppression, and activation complementation loss. Experimental results show
that our SemFormer achieves 74.3% mIoU and surpasses many recent mainstream
WSSS approaches by a large margin on the PASCAL VOC 2012 dataset. Code will be
available at https://github.com/JLChen-C/SemFormer.
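The abstract only names the four guidance losses without defining them. As a rough illustration of how such semantic-guidance terms could be wired up (not the paper's actual formulation; all function names, the pooling scheme, and the exact loss forms are assumptions for this sketch), the following pure-Python fragment computes hypothetical versions of the four terms from per-pixel features, foreground/background activation maps, and a class embedding:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) + 1e-8
    nv = math.sqrt(sum(b * b for b in v)) + 1e-8
    return dot / (nu * nv)

def weighted_pool(features, weights):
    """Activation-weighted average of per-pixel feature vectors."""
    total = sum(weights) + 1e-8
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) / total
            for d in range(dim)]

def guidance_losses(features, act_fg, act_bg, class_emb):
    """Hypothetical versions of the four guidance terms (illustrative only)."""
    fg = weighted_pool(features, act_fg)  # pooled foreground feature
    bg = weighted_pool(features, act_bg)  # pooled background feature
    # class-foreground: pull the pooled foreground feature toward the class semantics
    loss_cf = 1 - cosine(fg, class_emb)
    # class-background: push the pooled background feature away from the class semantics
    loss_cb = 1 + cosine(bg, class_emb)
    # activation suppression: penalize overly large activated area
    loss_as = sum(act_fg) / len(act_fg)
    # activation complementation: foreground and background should jointly cover each pixel
    loss_ac = sum((f + b - 1) ** 2 for f, b in zip(act_fg, act_bg)) / len(act_fg)
    return loss_cf, loss_cb, loss_as, loss_ac

# Toy example: two "pixels" match the class embedding, one does not.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
lcf, lcb, las, lac = guidance_losses(feats, [1, 1, 0], [0, 0, 1], [1.0, 0.0])
```

In this toy case the foreground activation exactly selects the pixels matching the class embedding, so the class-foreground loss is near zero and the complementation loss is exactly zero; in a real model these terms would be minimized jointly by gradient descent.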
Related papers
- CoBra: Complementary Branch Fusing Class and Semantic Knowledge for Robust Weakly Supervised Semantic Segmentation [3.4248731707266264]
We propose a novel dual-branch framework consisting of two distinct architectures that provide complementary class knowledge (from a CNN) and semantic knowledge (from a ViT) to each branch.
Our model, through CoBra, fuses CNN and ViT's complementary outputs to create robust pseudo masks that integrate both class and semantic information effectively.
arXiv Detail & Related papers (2024-02-05T12:33:37Z) - Boosting Semantic Segmentation from the Perspective of Explicit Class
Embeddings [19.997929884477628]
We explore the mechanism of class embeddings and observe that more explicit and meaningful class embeddings can be generated purposefully from class masks.
We propose ECENet, a new segmentation paradigm, in which class embeddings are obtained and enhanced explicitly during interacting with multi-stage image features.
Our ECENet outperforms its counterparts on the ADE20K dataset at much lower computational cost and achieves new state-of-the-art results on the PASCAL-Context dataset.
arXiv Detail & Related papers (2023-08-24T16:16:10Z) - MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic
Segmentation [90.73815426893034]
We propose a transformer-based framework that aims to enhance weakly supervised semantic segmentation.
We introduce a Multi-Class Token transformer, which incorporates multiple class tokens to enable class-aware interactions with the patch tokens.
A Contrastive-Class-Token (CCT) module is proposed to enhance the learning of discriminative class tokens.
arXiv Detail & Related papers (2023-08-06T03:30:20Z) - Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided
Relation Alignment and Adaptation [98.51938442785179]
Incremental few-shot semantic segmentation aims to incrementally extend a semantic segmentation model to novel classes.
This task faces a severe semantic-aliasing issue between base and novel classes due to data imbalance.
We propose the Semantic-guided Relation Alignment and Adaptation (SRAA) method that fully considers the guidance of prior semantic information.
arXiv Detail & Related papers (2023-05-18T10:40:52Z) - SLAM: Semantic Learning based Activation Map for Weakly Supervised
Semantic Segmentation [34.996841532954925]
We propose a novel semantic learning based framework for WSSS, named SLAM (Semantic Learning based Activation Map).
We first design a semantic encoder to learn the semantics of each object category and extract category-specific semantic embeddings from an input image.
Four loss functions, i.e., category-foreground, category-background, activation regularization, and consistency loss, are proposed to ensure the correctness, completeness, compactness, and consistency of the activation map.
arXiv Detail & Related papers (2022-10-22T11:17:30Z) - Multi-class Token Transformer for Weakly Supervised Semantic
Segmentation [94.78965643354285]
We propose a new transformer-based framework to learn class-specific object localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS).
Inspired by the fact that the attended regions of the class token in the standard vision transformer can be leveraged to form a class-agnostic localization map, we investigate whether the transformer model can also effectively capture class-specific attention for more discriminative object localization.
The proposed framework is shown to fully complement the Class Activation Mapping (CAM) method, leading to remarkably superior WSSS results on the PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2022-03-06T07:18:23Z) - Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on par with it on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z) - Half-Real Half-Fake Distillation for Class-Incremental Semantic
Segmentation [84.1985497426083]
Convolutional neural networks are ill-equipped for incremental learning.
New classes are available but the initial training data is not retained.
We try to address this issue by "inverting" the trained segmentation network to synthesize input images starting from random noise.
arXiv Detail & Related papers (2021-04-02T03:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.