Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals
- URL: http://arxiv.org/abs/2102.06191v1
- Date: Thu, 11 Feb 2021 18:54:47 GMT
- Title: Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals
- Authors: Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Luc Van Gool
- Abstract summary: We introduce a novel two-step framework that adopts a predetermined prior in a contrastive optimization objective to learn pixel embeddings.
This marks a large deviation from existing works that relied on proxy tasks or end-to-end clustering.
In particular, when fine-tuning the learned representations using just 1% of labeled examples on PASCAL, we outperform supervised ImageNet pre-training by 7.1% mIoU.
- Score: 78.12377360145078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Being able to learn dense semantic representations of images without
supervision is an important problem in computer vision. However, despite its
significance, this problem remains rather unexplored, with a few exceptions
that considered unsupervised semantic segmentation on small-scale datasets with
a narrow visual domain. In this paper, we make a first attempt to tackle the
problem on datasets that have been traditionally utilized for the supervised
case. To achieve this, we introduce a novel two-step framework that adopts a
predetermined prior in a contrastive optimization objective to learn pixel
embeddings. This marks a large deviation from existing works that relied on
proxy tasks or end-to-end clustering. Additionally, we argue for the importance
of having a prior that contains information about objects or their parts, and
discuss several possibilities for obtaining such a prior in an unsupervised
manner.
Extensive experimental evaluation shows that the proposed method comes with
key advantages over existing works. First, the learned pixel embeddings can be
directly clustered into semantic groups using K-Means. Second, the method can
serve as an effective unsupervised pre-training for the semantic segmentation
task. In particular, when fine-tuning the learned representations using just 1%
of labeled examples on PASCAL, we outperform supervised ImageNet pre-training
by 7.1% mIoU. The code is available at
https://github.com/wvangansbeke/Unsupervised-Semantic-Segmentation.
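
The abstract's two-step idea, an unsupervised object-mask prior combined with a contrastive objective over pixel embeddings, can be sketched roughly as follows. This is a hedged illustration, not the authors' released code: the tensor shapes, the mean-pooling of pixels into per-mask prototypes, the single-view InfoNCE-style loss, and the temperature value are all simplifying assumptions; the official implementation is at the GitHub link above.

```python
import torch
import torch.nn.functional as F


def mask_pooled_prototypes(pixel_emb, masks):
    """Average-pool pixel embeddings inside each binary mask proposal.

    pixel_emb: (B, D, H, W) dense embeddings from a segmentation backbone.
    masks:     (B, K, H, W) binary object mask proposals (the unsupervised prior).
    Returns:   (B, K, D), one "object" prototype per proposal.
    """
    masks = masks.float()
    area = masks.sum(dim=(2, 3)).clamp(min=1.0)                # (B, K)
    pooled = torch.einsum('bdhw,bkhw->bkd', pixel_emb, masks)  # sum over each mask
    return pooled / area.unsqueeze(-1)


def mask_contrast_loss(pixel_emb, masks, temperature=0.5):
    """Attract each pixel to the prototype of its own mask proposal and repel it
    from the prototypes of the other proposals in the batch (InfoNCE-style)."""
    B, D, H, W = pixel_emb.shape
    K = masks.shape[1]
    masks = masks.float()

    protos = mask_pooled_prototypes(pixel_emb, masks)          # (B, K, D)
    protos = F.normalize(protos.reshape(B * K, D), dim=1)      # (B*K, D)
    pix = F.normalize(pixel_emb, dim=1)                        # (B, D, H, W)

    # Similarity of every pixel to every prototype in the batch.
    logits = torch.einsum('bdhw,nd->bnhw', pix, protos) / temperature

    # Positive for each pixel = prototype of the proposal covering it;
    # pixels covered by no proposal are ignored.
    owner = masks.argmax(dim=1)                                 # (B, H, W), local mask index
    offset = torch.arange(B, device=masks.device).view(B, 1, 1) * K
    target = owner + offset                                     # global prototype index
    valid = (masks.sum(dim=1) > 0).float()                      # (B, H, W)

    loss = F.cross_entropy(logits, target, reduction='none')    # per-pixel loss
    return (loss * valid).sum() / valid.sum().clamp(min=1.0)
```

After training, dense embeddings gathered over a dataset can be clustered directly, e.g. with scikit-learn's KMeans using one cluster per expected class, which corresponds to the K-Means evaluation described in the abstract; fine-tuning the backbone with a small labeled subset corresponds to the semi-supervised setting reported there.
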
Related papers
- CrOC: Cross-View Online Clustering for Dense Visual Representation
Learning [39.12950211289954]
We propose a Cross-view consistency objective with an Online Clustering mechanism (CrOC) to discover and segment the semantics of the views.
In the absence of hand-crafted priors, the resulting method is more generalizable and does not require a cumbersome pre-processing step.
We demonstrate excellent performance on linear and unsupervised segmentation transfer tasks on various datasets.
arXiv Detail & Related papers (2023-03-23T13:24:16Z) - Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features that are visible to the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z) - Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z) - TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic
Segmentation [44.75300205362518]
Unsupervised semantic segmentation aims to obtain high-level semantic representation on low-level visual features without manual annotations.
We propose the first top-down unsupervised semantic segmentation framework for fine-grained segmentation in extremely complicated scenarios.
Our results show that our top-down unsupervised segmentation is robust to both object-centric and scene-centric datasets.
arXiv Detail & Related papers (2021-12-02T18:59:03Z) - Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised
Semantic Segmentation [88.49669148290306]
We propose a novel weakly supervised multi-task framework called AuxSegNet to leverage saliency detection and multi-label image classification as auxiliary tasks.
Inspired by their similar structured semantics, we also propose to learn a cross-task global pixel-level affinity map from the saliency and segmentation representations.
The learned cross-task affinity can be used to refine saliency predictions and propagate CAM maps to provide improved pseudo labels for both tasks.
arXiv Detail & Related papers (2021-07-25T11:39:58Z) - Unsupervised Image Segmentation by Mutual Information Maximization and
Adversarial Regularization [7.165364364478119]
We propose a novel, fully unsupervised semantic segmentation method, called Information Maximization and Adversarial Regularization (InMARS).
Inspired by human perception, which parses a scene into perceptual groups, our approach first partitions an input image into meaningful regions (also known as superpixels).
Next, it utilizes Mutual-Information-Maximization followed by an adversarial training strategy to cluster these regions into semantically meaningful classes.
Our experiments demonstrate that our method achieves the state-of-the-art performance on two commonly used unsupervised semantic segmentation datasets.
arXiv Detail & Related papers (2021-07-01T18:36:27Z) - Revisiting Contrastive Methods for Unsupervised Learning of Visual
Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z) - Find it if You Can: End-to-End Adversarial Erasing for Weakly-Supervised
Semantic Segmentation [6.326017213490535]
We propose a novel formulation of adversarial erasing of the attention maps.
The proposed solution does not require saliency masks, instead it uses a regularization loss to prevent the attention maps from spreading to less discriminative object regions.
Our experiments on the Pascal VOC dataset demonstrate that our adversarial approach increases segmentation performance by 2.1 mIoU compared to our baseline and by 1.0 mIoU compared to previous adversarial erasing approaches.
arXiv Detail & Related papers (2020-11-09T18:35:35Z) - Unsupervised Part Discovery by Unsupervised Disentanglement [10.664434993386525]
Part segmentations provide information about part localizations on the level of individual pixels.
Large annotation costs limit the scalability of supervised algorithms to other object categories.
Our work demonstrates the feasibility to discover semantic part segmentations without supervision.
arXiv Detail & Related papers (2020-09-09T12:34:37Z) - RGB-based Semantic Segmentation Using Self-Supervised Depth Pre-Training [77.62171090230986]
We propose an easily scalable and self-supervised technique that can be used to pre-train any semantic RGB segmentation method.
In particular, our pre-training approach makes use of automatically generated labels that can be obtained using depth sensors.
We show how our proposed self-supervised pre-training with HN-labels can be used to replace ImageNet pre-training.
arXiv Detail & Related papers (2020-02-06T11:16:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.