Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation
- URL: http://arxiv.org/abs/2206.06363v1
- Date: Mon, 13 Jun 2022 17:59:43 GMT
- Title: Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation
- Authors: Wouter Van Gansbeke, Simon Vandenhende, Luc Van Gool
- Abstract summary: MaskDistill is a novel framework for unsupervised semantic segmentation.
Our framework does not latch onto low-level image cues and is not limited to object-centric datasets.
- Score: 75.00151934315967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of unsupervised semantic segmentation aims to cluster pixels into
semantically meaningful groups. Specifically, pixels assigned to the same
cluster should share high-level semantic properties like their object or part
category. This paper presents MaskDistill: a novel framework for unsupervised
semantic segmentation based on three key ideas. First, we advocate a
data-driven strategy to generate object masks that serve as a pixel grouping
prior for semantic segmentation. This approach omits handcrafted priors, which
are often designed for specific scene compositions and limit the applicability
of competing frameworks. Second, MaskDistill clusters the object masks to
obtain pseudo-ground-truth for training an initial object segmentation model.
Third, we leverage this model to filter out low-quality object masks. This
strategy mitigates the noise in our pixel grouping prior and results in a clean
collection of masks which we use to train a final segmentation model. By
combining these components, we can considerably outperform previous works for
unsupervised semantic segmentation on PASCAL (+11% mIoU) and COCO (+4% mask
AP50). Interestingly, as opposed to existing approaches, our framework does not
latch onto low-level image cues and is not limited to object-centric datasets.
The code and models will be made available.
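Read as pseudocode, the three ideas above compose into a simple pipeline: generate candidate object masks, cluster them into pseudo classes, and use an initial model to keep only the cleanest masks for the final training round. The sketch below is an illustrative reading of that control flow on dummy data, not the authors' released implementation; generate_object_masks, cluster_masks, filter_masks, and the confidence score (a stand-in for agreement with the initial segmentation model) are all hypothetical.

```python
"""Illustrative sketch of the MaskDistill recipe from the abstract.

NOT the authors' code: mask generation and model training are stubbed out,
and only the clustering / filtering logic is shown on random dummy data.
"""
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Step 1: data-driven object-mask candidates (stubbed).
# In the paper these come from a transformer; here we fake one binary
# candidate mask plus a feature embedding per image.
def generate_object_masks(num_images, h=32, w=32, dim=64):
    masks = rng.random((num_images, h, w)) > 0.5          # candidate masks
    feats = rng.standard_normal((num_images, dim))        # mask embeddings
    return masks, feats

# Step 2: cluster mask embeddings into pseudo semantic classes,
# yielding pseudo-ground-truth for an initial segmentation model.
def cluster_masks(feats, num_classes=20):
    return KMeans(n_clusters=num_classes, n_init=10, random_state=0).fit_predict(feats)

# Step 3: filter out low-quality masks. The confidence score here is random;
# in practice it would measure agreement (e.g. overlap) between a candidate
# mask and the initial model's prediction.
def filter_masks(masks, labels, confidence, keep_ratio=0.5):
    threshold = np.quantile(confidence, 1.0 - keep_ratio)
    keep = confidence >= threshold
    return masks[keep], labels[keep]

masks, feats = generate_object_masks(num_images=100)
pseudo_labels = cluster_masks(feats)                 # pseudo-ground-truth classes
confidence = rng.random(len(masks))                  # stand-in for model agreement
clean_masks, clean_labels = filter_masks(masks, pseudo_labels, confidence)
print(f"kept {len(clean_masks)} / {len(masks)} masks for training the final model")
```

In the actual framework the candidate masks come from a transformer and a final segmentation model is trained on the filtered, pseudo-labelled masks; the sketch only mirrors the data flow.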
Related papers
- Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals [15.258631373740686]
Unsupervised semantic segmentation aims to automatically partition images into semantically meaningful regions by identifying global semantic categories within an image corpus without any form of annotation.
We present PriMaPs (Principal Mask Proposals), which decompose images into semantically meaningful masks based on their feature representation.
This allows us to realize unsupervised semantic segmentation by fitting class prototypes to PriMaPs with an expectation-maximization algorithm, PriMaPs-EM (a minimal sketch of the prototype-fitting idea appears after this list).
arXiv Detail & Related papers (2024-04-25T17:58:09Z)
- Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision [49.905448429974804]
We consider the problem of open-vocabulary semantic segmentation (OVS), which aims to segment objects of arbitrary classes instead of pre-defined, closed-set categories.
We propose a transformer-based model for OVS, termed as OVSegmentor, which exploits web-crawled image-text pairs for pre-training.
Our model achieves superior segmentation results over the state-of-the-art method while using only 3% of the pre-training data (4M vs. 134M).
arXiv Detail & Related papers (2023-01-22T13:10:05Z)
- Few-shot semantic segmentation via mask aggregation [5.886986014593717]
Few-shot semantic segmentation aims to recognize novel classes with only very few labelled data.
Previous works have typically regarded it as a pixel-wise classification problem.
We introduce a mask-based classification method for addressing this problem.
arXiv Detail & Related papers (2022-02-15T07:13:09Z)
- Scaling up instance annotation via label propagation [69.8001043244044]
We propose a highly efficient annotation scheme for building large datasets with object segmentation masks.
We exploit visual similarities between objects by using hierarchical clustering on mask predictions made by a segmentation model.
We show that we obtain 1M object segmentation masks with a total annotation time of only 290 hours.
arXiv Detail & Related papers (2021-10-05T18:29:34Z)
- Per-Pixel Classification is Not All You Need for Semantic Segmentation [184.2905747595058]
Mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks.
We propose MaskFormer, a simple mask classification model which predicts a set of binary masks.
Our method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
arXiv Detail & Related papers (2021-07-13T17:59:50Z)
- SOLO: A Simple Framework for Instance Segmentation [84.00519148562606]
"instance categories" assigns categories to each pixel within an instance according to the instance's location.
"SOLO" is a simple, direct, and fast framework for instance segmentation with strong performance.
Our approach achieves state-of-the-art results for instance segmentation in terms of both speed and accuracy.
arXiv Detail & Related papers (2021-06-30T09:56:54Z)
- Locate then Segment: A Strong Pipeline for Referring Image Segmentation [73.19139431806853]
Referring image segmentation aims to segment the objects referred by a natural language expression.
Previous methods usually focus on designing an implicit and recurrent interaction mechanism that fuses visual-linguistic features to directly generate the final segmentation mask.
We present a "Locate-Then-Segment" scheme to tackle this problem.
Our framework is simple but surprisingly effective.
arXiv Detail & Related papers (2021-03-30T12:25:27Z)
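Referenced from the PriMaPs-EM entry above: a minimal, generic sketch of fitting class prototypes to mask features with an expectation-maximization loop. This is a plain soft-assignment EM over L2-normalized random features, offered only as an illustration of the prototype-fitting idea, not the PriMaPs-EM algorithm; all names and hyperparameters are assumptions.

```python
"""Generic EM-style prototype fitting over random stand-in mask features."""
import numpy as np

rng = np.random.default_rng(0)

def fit_prototypes(feats, num_classes=10, iters=20, temperature=0.1):
    # L2-normalize features so dot products act as cosine similarities.
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    # Initialize prototypes from randomly chosen features.
    protos = feats[rng.choice(len(feats), num_classes, replace=False)]
    for _ in range(iters):
        # E-step: soft-assign each feature to the class prototypes.
        logits = feats @ protos.T / temperature
        logits -= logits.max(axis=1, keepdims=True)
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate prototypes as responsibility-weighted means.
        protos = resp.T @ feats
        protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    return protos, resp.argmax(axis=1)

feats = rng.standard_normal((500, 64))   # stand-in for mask feature embeddings
protos, labels = fit_prototypes(feats)
print(protos.shape, np.bincount(labels, minlength=10))
```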
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.