Rethinking Semantic Segmentation: A Prototype View
- URL: http://arxiv.org/abs/2203.15102v1
- Date: Mon, 28 Mar 2022 21:15:32 GMT
- Title: Rethinking Semantic Segmentation: A Prototype View
- Authors: Tianfei Zhou, Wenguan Wang, Ender Konukoglu, Luc Van Gool
- Abstract summary: We present a nonparametric semantic segmentation model based on non-learnable prototypes.
Our framework yields compelling results over several datasets.
We expect this work will provoke a rethink of the current de facto semantic segmentation model design.
- Score: 126.59244185849838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prevalent semantic segmentation solutions, despite their different network
designs (FCN-based or attention-based) and mask decoding strategies (parametric
softmax-based or pixel-query-based), can be placed in one category by
viewing the softmax weights or query vectors as learnable class prototypes.
In light of this prototype view, this study uncovers several limitations of
such a parametric segmentation regime and proposes a nonparametric alternative
based on non-learnable prototypes. Instead of learning a single
weight/query vector for each class in a fully parametric manner, as prior
methods do, our model represents each class as a set of non-learnable
prototypes, relying solely on the mean features of several training pixels
within that class. Dense prediction is thus achieved by nonparametric
nearest-prototype retrieval. This allows our model to directly shape the
pixel embedding space by optimizing the arrangement between embedded pixels
and anchored prototypes, and it can handle an arbitrary number of classes
with a constant number of learnable parameters. We empirically show that, with
FCN-based and attention-based segmentation models (i.e., HRNet, Swin,
SegFormer) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric
framework yields compelling results on several datasets (i.e., ADE20K,
Cityscapes, COCO-Stuff) and performs well in the large-vocabulary setting.
We expect this work will provoke a rethink of the current de facto semantic
segmentation model design.
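The nearest-prototype prediction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the prototype construction here (mean features over random partitions of each class's pixels) is a simplified stand-in for the paper's assignment scheme, and all names and shapes are assumptions.

```python
import numpy as np

def build_prototypes(features, labels, num_classes, k):
    """Compute k non-learnable prototypes per class as mean features of
    subsets of that class's training pixels (illustrative stand-in for
    the paper's within-class pixel-to-prototype assignment)."""
    protos = np.zeros((num_classes, k, features.shape[1]))
    for c in range(num_classes):
        feats_c = features[labels == c]
        # split the class's pixels into k groups and average each group
        for i, chunk in enumerate(np.array_split(feats_c, k)):
            protos[c, i] = chunk.mean(axis=0)
    return protos

def predict(pixel_embeddings, prototypes):
    """Label each pixel by the class of its nearest prototype (cosine similarity)."""
    C, K, D = prototypes.shape
    p = prototypes.reshape(C * K, D)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    x = pixel_embeddings / np.linalg.norm(pixel_embeddings, axis=1, keepdims=True)
    sim = x @ p.T                   # (num_pixels, C*K) cosine similarities
    return sim.argmax(axis=1) // K  # nearest prototype -> its owning class
```

Note that `predict` involves no learnable classifier weights: adding a class only adds prototype vectors, which is why the parameter count stays constant as the vocabulary grows.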
Related papers
- Multi-Scale Grouped Prototypes for Interpretable Semantic Segmentation [7.372346036256517]
Prototypical part learning is emerging as a promising approach for making semantic segmentation interpretable.
We propose a method for interpretable semantic segmentation that leverages multi-scale image representation for prototypical part learning.
Experiments conducted on Pascal VOC, Cityscapes, and ADE20K demonstrate that the proposed method increases model sparsity, improves interpretability over existing prototype-based methods, and narrows the performance gap with the non-interpretable counterpart models.
arXiv Detail & Related papers (2024-09-14T17:52:59Z)
- Rethinking Few-shot 3D Point Cloud Semantic Segmentation [62.80639841429669]
This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS).
We focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution.
To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built.
arXiv Detail & Related papers (2024-03-01T15:14:47Z)
- Unicom: Universal and Compact Representation Learning for Image Retrieval [65.96296089560421]
We cluster the large-scale LAION400M dataset into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model.
To alleviate the inter-class conflict introduced by this automatic clustering, we randomly select partial inter-class prototypes to construct the margin-based softmax loss.
Our method significantly outperforms state-of-the-art unsupervised and supervised image retrieval approaches on multiple benchmarks.
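The partial-prototype margin softmax mentioned above can be sketched as follows. This is a hedged illustration, not Unicom's released code: the sampling ratio, margin, and scale values are assumptions, and the function names are hypothetical.

```python
import numpy as np

def partial_margin_softmax_loss(embedding, label, prototypes,
                                sample_ratio=0.1, margin=0.3, scale=32.0,
                                rng=None):
    """Margin-based softmax over a random subset of negative class prototypes.
    Sampling only part of the inter-class prototypes limits the influence of
    conflicting pseudo classes on any single update."""
    if rng is None:
        rng = np.random.default_rng()
    num_classes = prototypes.shape[0]
    # randomly pick a fraction of the negative (inter-class) prototypes
    negatives = np.setdiff1d(np.arange(num_classes), [label])
    picked = rng.choice(negatives, size=max(1, int(sample_ratio * num_classes)),
                        replace=False)
    classes = np.concatenate(([label], picked))  # positive class at index 0
    # cosine similarities to the selected prototypes
    p = prototypes[classes]
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    x = embedding / np.linalg.norm(embedding)
    logits = scale * (p @ x)
    logits[0] -= scale * margin  # additive margin on the positive logit
    # cross-entropy with the positive class at index 0
    logits -= logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])
```

The margin forces the embedding to be closer to its own prototype than a plain softmax would require, while the random subsampling means a conflicting pseudo class only occasionally appears as a negative.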
arXiv Detail & Related papers (2023-04-12T14:25:52Z)
- Number-Adaptive Prototype Learning for 3D Point Cloud Semantic Segmentation [46.610620464184926]
We propose to use an adaptive number of prototypes to dynamically describe the different point patterns within a semantic class.
Our method achieves a 2.3% mIoU improvement over the baseline model based on the point-wise classification paradigm.
arXiv Detail & Related papers (2022-10-18T15:57:20Z)
- Few-Shot Segmentation via Rich Prototype Generation and Recurrent Prediction Enhancement [12.614578133091168]
We propose a rich prototype generation module (RPGM) and a recurrent prediction enhancement module (RPEM) to reinforce the prototype learning paradigm.
RPGM combines superpixel and K-means clustering to generate rich prototype features with complementary scale relationships.
RPEM utilizes the recurrent mechanism to design a round-way propagation decoder.
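The K-means branch of prototype generation described above can be sketched with plain Lloyd's iterations over masked support features. This is a generic K-means illustration under assumed names and shapes, not RPGM's actual implementation (which additionally combines superpixel-derived prototypes).

```python
import numpy as np

def kmeans_prototypes(features, k=3, iters=10, rng=None):
    """Cluster support-region features into k centers; each center serves as
    one prototype (illustrative stand-in for RPGM's clustering branch)."""
    if rng is None:
        rng = np.random.default_rng(0)
    # initialize centers from randomly chosen feature vectors
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each feature to its nearest center
        assign = np.argmin(((features[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned features
        for j in range(k):
            pts = features[assign == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers
```

Using several clustered prototypes per class, instead of a single masked average, lets the prototypes cover distinct appearance modes within the support region.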
arXiv Detail & Related papers (2022-10-03T08:46:52Z)
- Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models [54.49581189337848]
We propose a method to enable the end-to-end pre-training for image segmentation models based on classification datasets.
The proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse.
Experiment results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models.
arXiv Detail & Related papers (2022-07-04T13:02:32Z)
- Dual Prototypical Contrastive Learning for Few-shot Semantic Segmentation [55.339405417090084]
We propose a dual prototypical contrastive learning approach tailored to the few-shot semantic segmentation (FSS) task.
The main idea is to make the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in the prototype feature space.
We demonstrate that the proposed dual contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets.
arXiv Detail & Related papers (2021-11-09T08:14:50Z)
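The inter-/intra-class objective described in the entry above can be sketched as an InfoNCE-style loss over prototypes. This is a minimal sketch under assumed names, temperature, and inputs, not the paper's exact formulation (which defines two complementary contrastive terms).

```python
import numpy as np

def prototype_contrastive_loss(prototypes, class_ids, temperature=0.1):
    """InfoNCE-style loss over class prototypes: same-class prototypes act as
    positives (shrinking intra-class distance), different-class prototypes as
    negatives (growing inter-class distance)."""
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = (p @ p.T) / temperature  # scaled cosine similarities
    n = len(class_ids)
    loss, count = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and class_ids[j] == class_ids[i]]
        if not pos:
            continue  # no positive partner for this prototype
        logits = np.delete(sim[i], i)          # drop self-similarity
        logits = logits - logits.max()         # numerical stability
        denom = np.exp(logits).sum()
        # remap positive indices after deleting entry i
        idx = [j if j < i else j - 1 for j in pos]
        loss += -np.log(np.exp(logits[idx]).sum() / denom)
        count += 1
    return loss / max(count, 1)
```

A configuration where same-class prototypes sit close together and different classes sit far apart yields a lower loss, which is exactly the geometry the entry above describes.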
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all listed content) and is not responsible for any consequences of its use.