TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation
- URL: http://arxiv.org/abs/2112.01515v1
- Date: Thu, 2 Dec 2021 18:59:03 GMT
- Title: TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation
- Authors: Zhaoyuan Yin, Pichao Wang, Fan Wang, Xianzhe Xu, Hanling Zhang, Hao Li, Rong Jin
- Abstract summary: Unsupervised semantic segmentation aims to obtain high-level semantic representation on low-level visual features without manual annotations.
We propose the first top-down unsupervised semantic segmentation framework for fine-grained segmentation in extremely complicated scenarios.
Our results show that our top-down unsupervised segmentation is robust to both object-centric and scene-centric datasets.
- Score: 44.75300205362518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised semantic segmentation aims to obtain high-level semantic
representation on low-level visual features without manual annotations. Most
existing methods are bottom-up approaches that try to group pixels into regions
based on their visual cues or certain predefined rules. As a result, it is
difficult for these bottom-up approaches to generate fine-grained semantic
segmentation in complicated scenes that contain multiple objects, some of which
share a similar visual appearance. In contrast, we propose the first
top-down unsupervised semantic segmentation framework for fine-grained
segmentation in extremely complicated scenarios. Specifically, we first obtain
rich high-level structured semantic concept information from large-scale vision
data in a self-supervised learning manner, and use such information as a prior
to discover potential semantic categories present in the target datasets.
Secondly, the discovered high-level semantic categories are mapped to low-level
pixel features by calculating the class activation map (CAM) with respect to
each discovered semantic representation. Lastly, the obtained CAMs serve as
pseudo labels to train the segmentation module and produce final semantic
segmentation. Experimental results on multiple semantic segmentation benchmarks
show that our top-down unsupervised segmentation is robust to both
object-centric and scene-centric datasets under different semantic granularity
levels, and outperforms all the current state-of-the-art bottom-up methods. Our
code is available at https://github.com/damo-cv/TransFGU.
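The three stages described in the abstract (discover categories from self-supervised features, turn each discovered category into an activation map over pixels, and use those maps as pseudo labels) can be illustrated with a deliberately simplified sketch. It assumes per-patch feature vectors have already been extracted by some self-supervised backbone, and it swaps in plain spherical k-means for category discovery and cosine-similarity maps for CAMs. The function names are hypothetical; this is not the TransFGU implementation, which is in the repository linked above.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def discover_prototypes(features, k, iters=20, seed=0):
    """Spherical k-means, a simplified stand-in for category discovery.

    features: feature vectors (assumed: patch embeddings from a
    self-supervised backbone). Returns k unit-norm category prototypes.
    """
    rng = random.Random(seed)
    feats = [normalize(f) for f in features]
    protos = [list(p) for p in rng.sample(feats, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in feats:
            # Assign each feature to its most similar prototype.
            best = max(range(k), key=lambda c: cosine(f, protos[c]))
            clusters[best].append(f)
        for c, members in enumerate(clusters):
            if members:  # keep the old prototype if a cluster becomes empty
                mean = [sum(col) / len(members) for col in zip(*members)]
                protos[c] = normalize(mean)
    return protos


def cam_maps(patch_features, prototypes):
    """CAM-like map per category: cosine(patch, prototype) for every patch."""
    return [[cosine(f, p) for f in patch_features] for p in prototypes]


def pseudo_labels(maps, threshold=0.0, ignore=-1):
    """Label each patch with its highest-activation category.

    Patches whose best activation is below `threshold` get `ignore`.
    """
    labels = []
    for i in range(len(maps[0])):
        scores = [m[i] for m in maps]
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append(best if scores[best] >= threshold else ignore)
    return labels


# Toy usage: two obvious groups of 2-D "patch features".
patches = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
protos = discover_prototypes(patches, k=2)
labels = pseudo_labels(cam_maps(patches, protos))
# Patches 0-1 share one label and patches 2-3 share the other.
```

Which integer id each group receives depends on initialization, so only the grouping is meaningful. In a real pipeline these pseudo labels would then supervise a segmentation network, as the abstract describes.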
Related papers
- SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images [17.98848062686217]
We introduce the first hierarchical semantic segmentation dataset with subpart annotations for natural images.
We also introduce two novel evaluation metrics to evaluate how well algorithms capture spatial and semantic relationships across hierarchical levels.
Date: 2024-07-12T21:08:00Z
- A Lightweight Clustering Framework for Unsupervised Semantic Segmentation [28.907274978550493]
Unsupervised semantic segmentation aims to categorize each pixel in an image into a corresponding class without the use of annotated data.
We propose a lightweight clustering framework for unsupervised semantic segmentation.
Our framework achieves state-of-the-art results on PASCAL VOC and MS COCO datasets.
Date: 2023-11-30T15:33:42Z
- Learning Semantic Segmentation with Query Points Supervision on Aerial Images [57.09251327650334]
We present a weakly supervised learning algorithm to train semantic segmentation algorithms.
Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation.
Date: 2023-09-11T14:32:04Z
- Hierarchical Open-vocabulary Universal Image Segmentation [48.008887320870244]
Open-vocabulary image segmentation aims to partition an image into semantic regions according to arbitrary text descriptions.
We propose a decoupled text-image fusion mechanism and representation learning modules for both "things" and "stuff".
Our resulting model, named HIPIE, tackles HIerarchical, oPen-vocabulary, and unIvErsal segmentation tasks within a unified framework.
Date: 2023-07-03T06:02:15Z
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any dense annotation effort.
Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
Date: 2022-07-18T09:20:04Z
- Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings [81.09026586111811]
We propose an approach to semantic segmentation that achieves state-of-the-art supervised performance when applied in a zero-shot setting.
This is achieved by replacing each class label with a vector-valued embedding of a short paragraph that describes the class.
The resulting merged semantic segmentation dataset of over 2 million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets.
Date: 2022-02-04T07:19:09Z
- Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation [90.87105131054419]
We present a framework for semi-supervised semantic segmentation, which is enhanced by self-supervised monocular depth estimation from unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset, where all three modules demonstrate significant performance gains.
Date: 2020-12-19T21:18:03Z
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine object details along their boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the low- and high-frequency components of the image, respectively.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
Date: 2020-07-20T12:11:22Z
This list is automatically generated from the titles and abstracts of the papers on this site.