DatUS^2: Data-driven Unsupervised Semantic Segmentation with Pre-trained
Self-supervised Vision Transformer
- URL: http://arxiv.org/abs/2401.12820v1
- Date: Tue, 23 Jan 2024 14:53:32 GMT
- Title: DatUS^2: Data-driven Unsupervised Semantic Segmentation with Pre-trained
Self-supervised Vision Transformer
- Authors: Sonal Kumar, Arijit Sur and Rashmi Dutta Baruah
- Abstract summary: Unsupervised dense semantic segmentation has not been explored as a downstream task.
This paper proposes a novel data-driven approach for unsupervised semantic segmentation as a downstream task.
The best version of DatUS^2 outperforms the existing state-of-the-art method for the unsupervised dense semantic segmentation task.
- Score: 6.898332152137321
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised training schemes continue to be proposed in quick succession, each taking a step closer to a universal foundation model. In this process, unsupervised downstream tasks are recognized as one way to validate the quality of the visual features learned with a self-supervised training scheme. However, unsupervised dense semantic segmentation has not been explored as a downstream task, even though it could exploit and evaluate the semantic information captured in patch-level feature representations during self-supervised training of a vision transformer. Therefore, this paper proposes a novel data-driven approach to unsupervised semantic segmentation (DatUS^2) as a downstream task. DatUS^2 generates semantically consistent, dense, pseudo-annotated segmentation masks for an unlabeled image dataset without using any visual priors or synchronized data. We compare these pseudo-annotated segmentation masks with ground-truth masks to evaluate how well recent self-supervised training schemes learn shared semantic properties at the patch level and discriminative semantic properties at the segment level. Finally, we evaluate existing state-of-the-art self-supervised training schemes with our proposed downstream task, i.e., DatUS^2. The best version of DatUS^2 outperforms the existing state-of-the-art method for the unsupervised dense semantic segmentation task with 15.02% mIoU and 21.47% pixel accuracy on the SUIM dataset. It also achieves a competitive level of accuracy on a large-scale and complex dataset, i.e., COCO.
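The abstract gives no implementation details, but the core idea of mining dense pseudo-labels from patch-level self-supervised features can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the authors' pipeline: it uses a DINO ViT-S/16 backbone (one of the self-supervised schemes such a study could evaluate), extracts the final-layer patch tokens for a single unlabeled image, clusters them with k-means, and upsamples the cluster map to pixel resolution to form a dense pseudo-mask. The image path and the number of clusters are placeholders.

```python
# Minimal sketch (not the authors' pipeline): cluster patch-level features of a
# pre-trained self-supervised ViT (DINO ViT-S/16, an assumption) into a dense
# pseudo-segmentation mask for one unlabeled image.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms
from sklearn.cluster import KMeans

model = torch.hub.load("facebookresearch/dino:main", "dino_vits16").eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
img = preprocess(Image.open("image.jpg").convert("RGB")).unsqueeze(0)  # (1, 3, 224, 224)

with torch.no_grad():
    # Patch tokens from the last block; index 0 is the CLS token, so drop it.
    tokens = model.get_intermediate_layers(img, n=1)[0][:, 1:, :]  # (1, 196, 384)

feats = tokens.squeeze(0).numpy()                    # 14x14 patches, 384-d each
labels = KMeans(n_clusters=5, n_init=10).fit_predict(feats)

# Reshape cluster ids to the patch grid and upsample to pixel resolution to
# obtain a dense pseudo-mask with unnamed (cluster-id) categories.
patch_mask = torch.tensor(labels).reshape(1, 1, 14, 14).float()
pseudo_mask = F.interpolate(patch_mask, size=(224, 224), mode="nearest").long().squeeze()
```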
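The reported numbers (mIoU and pixel accuracy against ground-truth masks) are typically computed for unsupervised segmentation by first matching predicted cluster ids to ground-truth classes. The sketch below shows one standard way to do this with Hungarian matching; it is a generic evaluation routine, not necessarily the paper's exact protocol, and it assumes the number of pseudo clusters equals the number of classes.

```python
# Minimal evaluation sketch (generic, not necessarily the paper's protocol):
# Hungarian-match cluster ids to classes, then compute pixel accuracy and mIoU.
import numpy as np
from scipy.optimize import linear_sum_assignment

def evaluate(pseudo, gt, num_classes):
    """pseudo, gt: integer label maps of identical shape.

    Assumes the number of pseudo clusters equals num_classes."""
    # Confusion matrix: rows = pseudo cluster ids, cols = ground-truth classes.
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (pseudo.ravel(), gt.ravel()), 1)

    # Hungarian matching: map each cluster id to the class maximizing overlap.
    row, col = linear_sum_assignment(conf, maximize=True)
    lut = np.zeros(num_classes, dtype=np.int64)
    lut[row] = col
    remapped = lut[pseudo]

    pixel_acc = float((remapped == gt).mean())

    ious = []
    for c in range(num_classes):
        inter = np.logical_and(remapped == c, gt == c).sum()
        union = np.logical_or(remapped == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return pixel_acc, float(np.mean(ious))
```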