DatUS^2: Data-driven Unsupervised Semantic Segmentation with Pre-trained
Self-supervised Vision Transformer
- URL: http://arxiv.org/abs/2401.12820v1
- Date: Tue, 23 Jan 2024 14:53:32 GMT
- Title: DatUS^2: Data-driven Unsupervised Semantic Segmentation with Pre-trained
Self-supervised Vision Transformer
- Authors: Sonal Kumar, Arijit Sur and Rashmi Dutta Baruah
- Abstract summary: Unsupervised dense semantic segmentation has not been explored as a downstream task.
This paper proposes a novel data-driven approach for unsupervised semantic segmentation as a downstream task.
Best version of DatUS2 outperforms the existing state-of-the-art method for the unsupervised dense semantic segmentation task.
- Score: 6.898332152137321
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Successive proposals of several self-supervised training schemes continue to
emerge, taking one step closer to developing a universal foundation model. In
this process, the unsupervised downstream tasks are recognized as one of the
evaluation methods to validate the quality of visual features learned with a
self-supervised training scheme. However, unsupervised dense semantic
segmentation has not been explored as a downstream task, which can utilize and
evaluate the quality of semantic information introduced in patch-level feature
representations during self-supervised training of a vision transformer.
Therefore, this paper proposes a novel data-driven approach for unsupervised
semantic segmentation (DatUS^2) as a downstream task. DatUS^2 generates
semantically consistent and dense pseudo annotate segmentation masks for the
unlabeled image dataset without using any visual-prior or synchronized data. We
compare these pseudo-annotated segmentation masks with ground truth masks for
evaluating recent self-supervised training schemes to learn shared semantic
properties at the patch level and discriminative semantic properties at the
segment level. Finally, we evaluate existing state-of-the-art self-supervised
training schemes with our proposed downstream task, i.e., DatUS^2. Also, the
best version of DatUS^2 outperforms the existing state-of-the-art method for
the unsupervised dense semantic segmentation task with 15.02% MiOU and 21.47%
Pixel accuracy on the SUIM dataset. It also achieves a competitive level of
accuracy for a large-scale and complex dataset, i.e., the COCO dataset.
Related papers
- Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals [15.258631373740686]
Unsupervised semantic segmentation aims to automatically partition images into semantically meaningful regions by identifying global semantic categories within an image corpus without any form of annotation.
We present PriMaPs - Principal Mask Proposals - decomposing images into semantically meaningful masks based on their feature representation.
This allows us to realize unsupervised semantic segmentation by fitting class prototypes to PriMaPs with a expectation-maximization algorithm, PriMaPs-EM.
arXiv Detail & Related papers (2024-04-25T17:58:09Z) - A Lightweight Clustering Framework for Unsupervised Semantic
Segmentation [28.907274978550493]
Unsupervised semantic segmentation aims to categorize each pixel in an image into a corresponding class without the use of annotated data.
We propose a lightweight clustering framework for unsupervised semantic segmentation.
Our framework achieves state-of-the-art results on PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2023-11-30T15:33:42Z) - INoD: Injected Noise Discriminator for Self-Supervised Representation
Learning in Agricultural Fields [6.891600948991265]
We propose an Injected Noise Discriminator (INoD) which exploits principles of feature replacement and dataset discrimination for self-supervised representation learning.
INoD interleaves feature maps from two disjoint datasets during their convolutional encoding and predicts the dataset affiliation of the resultant feature map as a pretext task.
Our approach enables the network to learn unequivocal representations of objects seen in one dataset while observing them in conjunction with similar features from the disjoint dataset.
arXiv Detail & Related papers (2023-03-31T14:46:31Z) - Semantic Segmentation with Active Semi-Supervised Representation
Learning [23.79742108127707]
We train an effective semantic segmentation algorithm with significantly lesser labeled data.
We extend the prior state-of-the-art S4AL algorithm by replacing its mean teacher approach for semi-supervised learning with a self-training approach.
We evaluate our method on CamVid and CityScapes datasets, the de-facto standards for active learning for semantic segmentation.
arXiv Detail & Related papers (2022-10-16T00:21:43Z) - Adversarial Dual-Student with Differentiable Spatial Warping for
Semi-Supervised Semantic Segmentation [70.2166826794421]
We propose a differentiable geometric warping to conduct unsupervised data augmentation.
We also propose a novel adversarial dual-student framework to improve the Mean-Teacher.
Our solution significantly improves the performance and state-of-the-art results are achieved on both datasets.
arXiv Detail & Related papers (2022-03-05T17:36:17Z) - Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with
Self-Supervised Depth Estimation [94.16816278191477]
We present a framework for semi-adaptive and domain-supervised semantic segmentation.
It is enhanced by self-supervised monocular depth estimation trained only on unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset.
arXiv Detail & Related papers (2021-08-28T01:33:38Z) - Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals [78.12377360145078]
We introduce a novel two-step framework that adopts a predetermined prior in a contrastive optimization objective to learn pixel embeddings.
This marks a large deviation from existing works that relied on proxy tasks or end-to-end clustering.
In particular, when fine-tuning the learned representations using just 1% of labeled examples on PASCAL, we outperform supervised ImageNet pre-training by 7.1% mIoU.
arXiv Detail & Related papers (2021-02-11T18:54:47Z) - Three Ways to Improve Semantic Segmentation with Self-Supervised Depth
Estimation [90.87105131054419]
We present a framework for semi-supervised semantic segmentation, which is enhanced by self-supervised monocular depth estimation from unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset, where all three modules demonstrate significant performance gains.
arXiv Detail & Related papers (2020-12-19T21:18:03Z) - Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive
Person Re-Identification [64.37745443119942]
This paper jointly enforces visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification.
Experimental results on three large-scale ReID datasets demonstrate the superiority of proposed method in both unsupervised and unsupervised domain adaptive ReID tasks.
arXiv Detail & Related papers (2020-07-21T14:31:27Z) - Improving Semantic Segmentation via Self-Training [75.07114899941095]
We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm.
We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data.
Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performances on Cityscapes, CamVid and KITTI datasets.
arXiv Detail & Related papers (2020-04-30T17:09:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.