StructToken: Rethinking Semantic Segmentation with Structural Prior
- URL: http://arxiv.org/abs/2203.12612v6
- Date: Fri, 31 Mar 2023 09:11:53 GMT
- Title: StructToken: Rethinking Semantic Segmentation with Structural Prior
- Authors: Fangjian Lin, Zhanhao Liang, Sitong Wu, Junjun He, Kai Chen, Shengwei Tian
- Abstract summary: We present a new paradigm for semantic segmentation, named structure-aware extraction.
It generates the segmentation results via the interactions between a set of learned structure tokens and the image feature, which aims to progressively extract the structural information of each category from the feature.
Our StructToken outperforms the state-of-the-art on three widely-used benchmarks, including ADE20K, Cityscapes, and COCO-Stuff-10K.
- Score: 14.056789487558731
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In previous deep-learning-based methods, semantic segmentation has been
regarded as a static or dynamic per-pixel classification task, i.e.,
classifying each pixel representation into a specific category. However, these
methods focus only on learning better pixel representations or classification
kernels while ignoring the structural information of objects, which is critical
to the human decision-making mechanism. In this paper, we present a new paradigm
for semantic segmentation, named structure-aware extraction. Specifically, it
generates the segmentation results via interactions between a set of
learned structure tokens and the image feature, aiming to progressively
extract the structural information of each category from the feature. Extensive
experiments show that our StructToken outperforms the state of the art on three
widely used benchmarks: ADE20K, Cityscapes, and COCO-Stuff-10K.
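One plausible reading of the described interaction, shown below as a minimal PyTorch-style sketch, is a set of learned per-class tokens that cross-attend to the image feature map and are then compared against pixel features to produce per-class masks. The module layout, shapes, and single attention step are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class StructureTokenHead(nn.Module):
    """Illustrative decode head: one learned token per category interacts
    with the image feature map and is compared against pixel features to
    produce per-class masks. A sketch of the general idea only."""

    def __init__(self, num_classes: int, dim: int, num_heads: int = 8):
        super().__init__()
        # One learnable "structure token" per category.
        self.tokens = nn.Parameter(torch.randn(num_classes, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape                               # (B, C, H, W)
        pixels = feat.flatten(2).transpose(1, 2)              # (B, H*W, C)
        tokens = self.tokens.unsqueeze(0).expand(b, -1, -1)   # (B, K, C)
        # Tokens query the pixels: each token gathers evidence for its class.
        out, _ = self.attn(query=tokens, key=pixels, value=pixels)
        tokens = self.norm(tokens + out)
        # Per-class logits from token/pixel similarity, reshaped to masks.
        logits = torch.einsum("bkc,bnc->bkn", tokens, pixels)
        return logits.reshape(b, -1, h, w)                    # (B, K, H, W)

head = StructureTokenHead(num_classes=150, dim=256)
masks = head(torch.randn(2, 256, 32, 32))
print(masks.shape)  # torch.Size([2, 150, 32, 32])
```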
Related papers
- Boosting Semantic Segmentation from the Perspective of Explicit Class Embeddings [19.997929884477628]
We explore the mechanism of class embeddings and observe that more explicit and meaningful class embeddings can be generated deliberately from class masks.
We propose ECENet, a new segmentation paradigm in which class embeddings are obtained and enhanced explicitly while interacting with multi-stage image features.
Our ECENet outperforms its counterparts on the ADE20K dataset at much lower computational cost and achieves new state-of-the-art results on the PASCAL-Context dataset.
arXiv Detail & Related papers (2023-08-24T16:16:10Z)
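One way to read "class embeddings generated from class masks" is mask-weighted pooling of image features, so each class embedding is grounded in the regions a coarse prediction assigns to it. A hedged sketch of that reading (the pooling formulation is an assumption, not necessarily ECENet's actual mechanism):

```python
import torch

def class_embeddings_from_masks(feat: torch.Tensor,
                                mask_logits: torch.Tensor) -> torch.Tensor:
    """Pool image features under soft class masks.

    feat:        (B, C, H, W) image features
    mask_logits: (B, K, H, W) per-class mask logits, e.g. a coarse prediction
    returns:     (B, K, C) one embedding per class
    """
    weights = mask_logits.flatten(2).softmax(dim=-1)  # (B, K, H*W), rows sum to 1
    pixels = feat.flatten(2).transpose(1, 2)          # (B, H*W, C)
    return torch.bmm(weights, pixels)                 # (B, K, C)

emb = class_embeddings_from_masks(torch.randn(2, 256, 32, 32),
                                  torch.randn(2, 150, 32, 32))
print(emb.shape)  # torch.Size([2, 150, 256])
```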
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any effort on dense annotation.
On three benchmark datasets, our method directly segments objects of arbitrary categories and outperforms zero-shot segmentation methods that require data labeling.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
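The grouping step of such slot-based approaches can be pictured as softly assigning pixel features to learnable prototypes and pooling them into per-group "slots". A minimal sketch of that step, assuming dot-product assignments (SlotCon additionally contrasts slots across two augmented views, which is omitted here):

```python
import torch
import torch.nn.functional as F

def group_into_slots(pixels: torch.Tensor, prototypes: torch.Tensor,
                     temperature: float = 0.07):
    """Softly assign pixel features to learnable prototypes and pool them
    into per-group "slots".

    pixels:     (N, D) pixel features from one view
    prototypes: (K, D) learnable group prototypes
    """
    sim = F.normalize(pixels, dim=1) @ F.normalize(prototypes, dim=1).t()
    assign = (sim / temperature).softmax(dim=1)        # (N, K) soft grouping
    slots = assign.t() @ pixels                        # (K, D) pooled per group
    slots = slots / assign.sum(0).clamp(min=1e-6)[:, None]
    return slots, assign

slots, assign = group_into_slots(torch.randn(1024, 128), torch.randn(64, 128))
print(slots.shape, assign.shape)  # torch.Size([64, 128]) torch.Size([1024, 64])
```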
- Deep Hierarchical Semantic Segmentation [76.40565872257709]
Hierarchical semantic segmentation (HSS) aims at a structured, pixel-wise description of visual observations in terms of a class hierarchy.
HSSN casts HSS as a pixel-wise multi-label classification task, bringing only minimal architectural changes to current segmentation models.
With hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space so as to generate well-structured pixel representations.
arXiv Detail & Related papers (2022-03-27T15:47:44Z)
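The pixel-wise multi-label formulation can be illustrated concretely: a pixel labeled with a leaf class is also treated as positive for every ancestor in the hierarchy, and scores can additionally be constrained so a child never outscores its parent. The toy sketch below assumes a simple parent-index encoding of the tree and a hinge-style penalty; HSSN's actual margin constraints are more elaborate:

```python
import torch
import torch.nn.functional as F

# Toy hierarchy as a parent-index table (-1 marks a root):
# nodes 0 and 1 are roots; 2 and 3 are children of 0; 4 is a child of 1.
PARENT = torch.tensor([-1, -1, 0, 0, 1])

def multihot_targets(leaf_labels: torch.Tensor, parent: torch.Tensor) -> torch.Tensor:
    """Expand leaf labels (N,) into multi-hot targets (N, num_nodes):
    each pixel is positive for its leaf class and every ancestor."""
    n, k = leaf_labels.numel(), parent.numel()
    targets = torch.zeros(n, k)
    node = leaf_labels.clone()
    while (node >= 0).any():
        valid = node >= 0
        targets[torch.arange(n)[valid], node[valid]] = 1.0
        node = torch.where(valid, parent[node.clamp(min=0)], node)  # climb one level
    return targets

def hss_loss(logits, leaf_labels, parent, margin=0.1):
    bce = F.binary_cross_entropy_with_logits(
        logits, multihot_targets(leaf_labels, parent))
    # Toy hierarchy constraint: penalize a child scoring above its parent.
    children = torch.arange(parent.numel())[parent >= 0]
    penalty = F.relu(logits[:, children] - logits[:, parent[children]] + margin).mean()
    return bce + penalty

logits = torch.randn(6, 5)                  # 6 pixels, 5 hierarchy nodes
labels = torch.tensor([2, 3, 4, 2, 4, 3])   # leaf class of each pixel
print(hss_loss(logits, labels, PARENT))
```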
- 3D Compositional Zero-shot Learning with DeCompositional Consensus [102.7571947144639]
We argue that part knowledge should be composable beyond the observed object classes.
We present 3D Compositional Zero-shot Learning as a problem of part generalization from seen to unseen object classes.
arXiv Detail & Related papers (2021-11-29T16:34:53Z)
- Robust 3D Scene Segmentation through Hierarchical and Learnable Part-Fusion [9.275156524109438]
3D semantic segmentation is a fundamental building block for several scene understanding applications such as autonomous driving, robotics and AR/VR.
Previous methods have used hierarchical, iterative schemes to fuse semantic and instance information, but their context fusion is not learnable.
This paper presents Segment-Fusion, a novel attention-based method for hierarchical fusion of semantic and instance information.
arXiv Detail & Related papers (2021-11-16T13:14:47Z)
- Exploring Cross-Image Pixel Contrast for Semantic Segmentation [130.22216825377618]
We propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.
The core idea is to enforce pixel embeddings belonging to the same semantic class to be more similar than embeddings from different classes.
Our method can be effortlessly incorporated into existing segmentation frameworks without extra overhead during testing.
arXiv Detail & Related papers (2021-01-28T11:35:32Z)
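The core idea lends itself to a compact sketch: a supervised contrastive loss over sampled pixel embeddings, where pixels of the same class are positives. The version below operates on a single batch of sampled pixels for brevity; the paper also mines pairs across images and via a memory bank, which is omitted:

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(emb: torch.Tensor, labels: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss over sampled pixel embeddings.

    emb:    (N, D) pixel embeddings (N kept small; pairs are O(N^2))
    labels: (N,) semantic class of each pixel
    """
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.t() / temperature                       # (N, N)
    off_diag = ~torch.eye(len(emb), dtype=torch.bool)       # exclude self-pairs
    pos = ((labels[:, None] == labels[None, :]) & off_diag).float()
    # Log-probability of each pair against all other pixels.
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(~off_diag, float("-inf")), dim=1, keepdim=True)
    # Average over positives; pixels with no positive contribute zero.
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()

emb = torch.randn(16, 64)
labels = torch.randint(0, 4, (16,))
print(pixel_contrastive_loss(emb, labels))
```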
- From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation [22.88452754438478]
We focus on zero-shot semantic segmentation, which aims to segment unseen objects with only category-level semantic representations.
We propose a novel Context-aware feature Generation Network (CaGNet), which can synthesize context-aware pixel-wise visual features for unseen categories.
Experimental results on Pascal-VOC, Pascal-Context, and COCO-Stuff show that our method significantly outperforms existing zero-shot semantic segmentation methods.
arXiv Detail & Related papers (2020-09-25T13:26:30Z)
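Feature synthesis for unseen categories can be illustrated with a toy generator that maps a class's semantic embedding plus a latent code (standing in for pixel-level context) to a visual feature; the layer sizes and conditioning scheme below are assumptions, not CaGNet's architecture:

```python
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Toy generator: map a class's semantic embedding (e.g. a word vector)
    plus a latent code standing in for pixel-level context to a visual
    feature that a pixel classifier can be trained on."""

    def __init__(self, sem_dim=300, latent_dim=32, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sem_dim + latent_dim, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, feat_dim),
        )

    def forward(self, sem, z):
        # sem: (N, sem_dim) class semantics; z: (N, latent_dim) context codes.
        return self.net(torch.cat([sem, z], dim=1))

gen = FeatureGenerator()
fake = gen(torch.randn(8, 300), torch.randn(8, 32))
print(fake.shape)  # torch.Size([8, 256])
# Train so a frozen pixel classifier labels `fake` as the intended class,
# then fine-tune the classifier on features generated for unseen classes.
```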
- Weakly-Supervised Semantic Segmentation via Sub-category Exploration [73.03956876752868]
We propose a simple yet effective approach to enforce the network to pay attention to other parts of an object.
Specifically, we perform clustering on image features to generate pseudo sub-category labels within each annotated parent class.
We conduct extensive analysis to validate the proposed method and show that our approach performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2020-08-03T20:48:31Z)
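The clustering step is concrete enough to sketch: within each annotated parent class, cluster image features (here with scikit-learn's KMeans, an assumed choice) and use the cluster indices as pseudo sub-category labels for an auxiliary classification task:

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_subcategory_labels(features: np.ndarray, parent_labels: np.ndarray,
                              k: int = 10) -> np.ndarray:
    """Cluster image features within each annotated parent class and use
    cluster indices as pseudo sub-category labels.

    features:      (N, D) one feature vector per image
    parent_labels: (N,) annotated parent class of each image
    returns:       (N,) sub-category id in [0, num_classes * k)
    """
    sub = np.zeros_like(parent_labels)
    for c in np.unique(parent_labels):
        idx = np.where(parent_labels == c)[0]
        n_clusters = min(k, len(idx))  # guard against tiny classes
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(features[idx])
        sub[idx] = c * k + km.labels_  # unique id per (class, cluster) pair
    return sub

feats = np.random.randn(200, 64)
parents = np.random.randint(0, 5, size=200)
print(pseudo_subcategory_labels(feats, parents)[:10])
```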