Unsupervised Semantic Segmentation by Distilling Feature Correspondences
- URL: http://arxiv.org/abs/2203.08414v1
- Date: Wed, 16 Mar 2022 06:08:47 GMT
- Title: Unsupervised Semantic Segmentation by Distilling Feature Correspondences
- Authors: Mark Hamilton, Zhoutong Zhang, Bharath Hariharan, Noah Snavely,
William T. Freeman
- Abstract summary: Unsupervised semantic segmentation aims to discover and localize semantically meaningful categories within image corpora without any form of annotation.
We present STEGO, a novel framework that distills unsupervised features into high-quality discrete semantic labels.
STEGO yields a significant improvement over the prior state of the art, on both the CocoStuff and Cityscapes challenges.
- Score: 94.73675308961944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised semantic segmentation aims to discover and localize semantically
meaningful categories within image corpora without any form of annotation. To
solve this task, algorithms must produce features for every pixel that are both
semantically meaningful and compact enough to form distinct clusters. Unlike
previous works which achieve this with a single end-to-end framework, we
propose to separate feature learning from cluster compactification.
Empirically, we show that current unsupervised feature learning frameworks
already generate dense features whose correlations are semantically consistent.
This observation motivates us to design STEGO ($\textbf{S}$elf-supervised
$\textbf{T}$ransformer with $\textbf{E}$nergy-based $\textbf{G}$raph
$\textbf{O}$ptimization), a novel framework that distills unsupervised features
into high-quality discrete semantic labels. At the core of STEGO is a novel
contrastive loss function that encourages features to form compact clusters
while preserving their relationships across the corpora. STEGO yields a
significant improvement over the prior state of the art, on both the CocoStuff
($\textbf{+14 mIoU}$) and Cityscapes ($\textbf{+9 mIoU}$) semantic segmentation
challenges.
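The contrastive loss described above can be illustrated with a toy example. The sketch below is a minimal, pure-Python rendering of the correspondence-distillation idea, not the authors' implementation (which operates on dense feature maps with within-image, KNN, and random-pair terms): the learned segmentation codes are pushed so that their cosine similarities agree with the frozen backbone features' similarities, shifted by a bias `b` that decides when a pair attracts versus repels. The function names and the value of `b` here are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

def correspondence_distillation_loss(feats, codes, b=0.2):
    """Simplified correspondence-distillation loss.

    feats: frozen backbone feature vectors, one per pixel (the teacher signal).
    codes: learned segmentation-head vectors, one per pixel (the student).
    b:     shift hyperparameter; pairs whose feature similarity exceeds b
           are attracted in code space, all others are repelled.
    """
    n = len(feats)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            f_sim = cosine(feats[i], feats[j])            # fixed target correlation
            s_sim = max(cosine(codes[i], codes[j]), 0.0)  # clamp, as negative code
                                                          # similarity need not be pushed lower
            loss -= (f_sim - b) * s_sim                   # attract if f_sim > b, else repel
    return loss / (n * n)
```

Codes that cluster the same way the backbone features correlate yield a lower loss than mismatched codes; this is the pressure that forms compact clusters while preserving relationships across the corpora.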
Related papers
- HDC: Hierarchical Semantic Decoding with Counting Assistance for Generalized Referring Expression Segmentation [33.40691116355158]
Generalized Referring Expression Segmentation (GRES) extends the formulation of classic RES by involving multiple-target and non-target scenarios.
We propose a $\textbf{H}$ierarchical Semantic $\textbf{D}$ecoding with $\textbf{C}$ounting Assistance framework (HDC).
We endow HDC with explicit counting capability to facilitate comprehensive object perception in multiple/single/non-target settings.
arXiv Detail & Related papers (2024-05-24T15:53:59Z)
- Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer [57.37893387775829]
We introduce a fast and balanced clustering method named $\textbf{S}$emantic $\textbf{E}$quitable $\textbf{C}$lustering (SEC).
SEC clusters tokens based on their global semantic relevance in an efficient, straightforward manner.
We propose a versatile vision backbone, SecViT, which attains an impressive $\textbf{84.2\%}$ image classification accuracy with only $\textbf{27M}$ parameters and $\textbf{4.4G}$ FLOPs.
arXiv Detail & Related papers (2024-05-22T04:49:00Z)
- Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S$^2$RM to achieve high-quality cross-modality fusion.
It follows a three-stage working strategy: distributing language features, spatial semantic recurrent coparsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z)
- Unsupervised Universal Image Segmentation [59.0383635597103]
We propose an Unsupervised Universal model (U2Seg) adept at performing various image segmentation tasks.
U2Seg generates pseudo semantic labels for these segmentation tasks by leveraging self-supervised models.
We then self-train the model on these pseudo semantic labels, yielding substantial performance gains.
arXiv Detail & Related papers (2023-12-28T18:59:04Z)
- A Lightweight Clustering Framework for Unsupervised Semantic Segmentation [28.907274978550493]
Unsupervised semantic segmentation aims to categorize each pixel in an image into a corresponding class without the use of annotated data.
We propose a lightweight clustering framework for unsupervised semantic segmentation.
Our framework achieves state-of-the-art results on PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2023-11-30T15:33:42Z)
- SmooSeg: Smoothness Prior for Unsupervised Semantic Segmentation [27.367986520072147]
Unsupervised semantic segmentation is a challenging task that segments images into semantic groups without manual annotation.
We propose a novel approach called SmooSeg that harnesses self-supervised learning methods to model the closeness relationships among observations as smoothness signals.
Our SmooSeg significantly outperforms STEGO in terms of pixel accuracy on three datasets.
arXiv Detail & Related papers (2023-10-27T03:29:25Z)
- Fully Self-Supervised Learning for Semantic Segmentation [46.6602159197283]
We present a fully self-supervised framework for semantic segmentation (FS4).
We propose a bootstrapped training scheme for semantic segmentation that fully leverages global semantic knowledge for self-supervision.
We evaluate our method on the large-scale COCO-Stuff dataset and achieve a 7.19 mIoU improvement on both thing and stuff classes.
arXiv Detail & Related papers (2022-02-24T09:38:22Z)
- Affinity Attention Graph Neural Network for Weakly Supervised Semantic Segmentation [86.44301443789763]
We propose the Affinity Attention Graph Neural Network ($A^2$GNN) for weakly supervised semantic segmentation.
Our approach achieves new state-of-the-art performance on the Pascal VOC 2012 dataset.
arXiv Detail & Related papers (2021-06-08T02:19:21Z)
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either improve an object's inner consistency by modeling the global context, or refine object details along boundaries through multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires $\textit{explicitly}$ modeling the object $\textit{body}$ and $\textit{edge}$, which correspond to the low- and high-frequency components of the image, respectively.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.