Uncovering the Inner Workings of STEGO for Safe Unsupervised Semantic
Segmentation
- URL: http://arxiv.org/abs/2304.07314v1
- Date: Fri, 14 Apr 2023 15:30:26 GMT
- Title: Uncovering the Inner Workings of STEGO for Safe Unsupervised Semantic
Segmentation
- Authors: Alexander Koenig, Maximilian Schambach, Johannes Otterbach
- Abstract summary: Self-supervised pre-training strategies have recently shown impressive results for training general-purpose feature extraction backbones in computer vision.
The DINO self-distillation technique has interesting emerging properties, such as unsupervised clustering in the latent space and semantic correspondences of the produced features without using explicit human-annotated labels.
The STEGO method for unsupervised semantic segmentation contrastively distills feature correspondences of a DINO-pre-trained Vision Transformer and recently set a new state of the art.
- Score: 68.8204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised pre-training strategies have recently shown impressive
results for training general-purpose feature extraction backbones in computer
vision. In combination with the Vision Transformer architecture, the DINO
self-distillation technique has interesting emerging properties, such as
unsupervised clustering in the latent space and semantic correspondences of the
produced features without using explicit human-annotated labels. The STEGO
method for unsupervised semantic segmentation contrastively distills feature
correspondences of a DINO-pre-trained Vision Transformer and recently set a new
state of the art. However, the detailed workings of STEGO have yet to be
disentangled, preventing its usage in safety-critical applications. This paper
provides a deeper understanding of the STEGO architecture and training strategy
by conducting studies that uncover the working mechanisms behind STEGO,
reproduce and extend its experimental validation, and investigate the ability
of STEGO to transfer to different datasets. Results demonstrate that the STEGO
architecture can be interpreted as a semantics-preserving dimensionality
reduction technique.
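As a rough intuition for the contrastive correspondence distillation described above, the following sketch shows how a segmentation head that preserves the backbone's feature correspondences scores better than one that collapses all locations together. This is a simplified illustration, not STEGO's exact loss: the shift value, the toy features, and the collapsed-head comparison are assumptions made for the example.

```python
import numpy as np

def cosine_corr(feats):
    """Pairwise cosine similarities between spatial feature vectors.

    feats: (N, D) array of N spatial locations with D-dim features.
    Returns an (N, N) correspondence matrix."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return normed @ normed.T

def correspondence_distillation_loss(backbone_feats, head_feats, shift=0.3):
    """Simplified STEGO-style correlation loss (illustrative only).

    Pairs the frozen backbone deems similar (correlation above the
    shift) are pulled together in head space; dissimilar pairs are
    pushed apart."""
    f_corr = cosine_corr(backbone_feats)   # frozen DINO correspondences
    s_corr = cosine_corr(head_feats)       # trainable head correspondences
    return float(-np.mean((f_corr - shift) * s_corr))

# toy example: backbone features form two well-separated clusters
rng = np.random.default_rng(0)
backbone = np.vstack([rng.normal(0, 0.1, (4, 8)) + 1.0,
                      rng.normal(0, 0.1, (4, 8)) - 1.0])
good_head = np.vstack([np.tile([1.0, 0.0], (4, 1)),
                       np.tile([0.0, 1.0], (4, 1))])  # preserves the clusters
collapsed_head = np.ones((8, 2))                      # maps everything to one point
print(correspondence_distillation_loss(backbone, good_head) <
      correspondence_distillation_loss(backbone, collapsed_head))  # → True
```

The comparison illustrates why the shift term matters: without it, collapsing all features to a single point would trivially maximize every correlation.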
Related papers
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- Unsupervised Meta-Learning via In-Context Learning [3.4165401459803335]
We propose a novel approach to unsupervised meta-learning that leverages the generalization abilities of in-context learning.
Our method reframes meta-learning as a sequence modeling problem, enabling the transformer encoder to learn task context from support images.
arXiv Detail & Related papers (2024-05-25T08:29:46Z)
- EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation [5.476136494434766]
We introduce EiCue, a technique providing semantic and structural cues through an eigenbasis derived from the semantic similarity matrix.
We guide our model to learn object-level representations with intra- and inter-image object-feature consistency.
Experiments on COCO-Stuff, Cityscapes, and Potsdam-3 datasets demonstrate the state-of-the-art USS results.
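One way to read the eigenbasis idea is through a generic spectral-clustering sketch: build a semantic similarity (affinity) matrix from features and use its leading eigenvectors as soft segmentation cues. The details below (cosine affinity, symmetric normalization, toy two-group features) are assumptions for illustration, not the authors' exact EiCue construction.

```python
import numpy as np

def eigen_cues(features, k=2):
    """Derive spectral cues from a semantic similarity matrix
    (in the spirit of EiCue; illustrative, not the exact method).

    features: (N, D) per-patch feature vectors.
    Returns the k eigenvectors of the normalized affinity matrix
    with the largest eigenvalues."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    affinity = np.clip(normed @ normed.T, 0, None)  # non-negative similarity
    d = affinity.sum(axis=1)
    norm_aff = affinity / np.sqrt(np.outer(d, d))   # symmetric normalization
    eigvals, eigvecs = np.linalg.eigh(norm_aff)     # ascending eigenvalues
    return eigvecs[:, -k:]                          # top-k eigenbasis

# toy example: two semantic groups separate along the second eigenvector
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 0.05, (5, 6)) + [1, 0, 0, 0, 0, 0],
                   rng.normal(0, 0.05, (5, 6)) + [0, 1, 0, 0, 0, 0]])
cues = eigen_cues(feats, k=2)
labels = (cues[:, 0] > np.median(cues[:, 0])).astype(int)
print(labels)  # one group vs. the other, e.g. [1 1 1 1 1 0 0 0 0 0] or its inverse
```

The largest eigenvector is a near-constant Perron mode; the second one changes sign between the two groups, which is what makes it usable as an object-level cue.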
arXiv Detail & Related papers (2024-03-03T11:24:16Z)
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
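For concreteness, here is a minimal InfoNCE-style loss, one common member of the contrastive SSL families mentioned. The toy data, temperature value, and in-batch negative scheme are illustrative assumptions, not this paper's model.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Minimal InfoNCE contrastive loss (illustrative sketch).

    Each anchor is matched to the positive at the same row index;
    all other positives in the batch act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature              # (N, N) similarity logits
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # cross-entropy on the diagonal

# aligned views of each sample give a much lower loss than mismatched ones
rng = np.random.default_rng(2)
x = rng.normal(size=(8, 16))
aligned = info_nce(x, x + 0.01 * rng.normal(size=(8, 16)))
shuffled = info_nce(x, np.roll(x, 1, axis=0))
print(aligned < shuffled)  # → True
```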
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- Segment Any Building [8.12405696290333]
This manuscript accentuates the potency of harnessing diversified datasets in tandem with cutting-edge representation learning paradigms for building segmentation.
Our avant-garde joint training regimen underscores the merit of our approach, bearing significant implications in pivotal domains such as urban infrastructural development, disaster mitigation strategies, and ecological surveillance.
The outcomes of this research both fortify the foundations for ensuing scholarly pursuits and presage a horizon replete with innovative applications in the discipline of building segmentation.
arXiv Detail & Related papers (2023-10-02T12:49:20Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
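A minimal sketch of PCA-based localization under assumed details: project spatial features onto their first principal component to get a coarse foreground map. The paper's exact procedure may differ; the toy feature map and the smaller-region-is-object sign heuristic are assumptions for illustration.

```python
import numpy as np

def pca_localize(feature_map):
    """Project spatial features onto the first principal component
    to obtain a coarse object saliency map (illustrative sketch).

    feature_map: (H, W, D). Returns an (H, W) saliency map."""
    h, w, d = feature_map.shape
    flat = feature_map.reshape(-1, d)
    flat = flat - flat.mean(axis=0)                # center the features
    # first right-singular vector = first principal component
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    saliency = (flat @ vt[0]).reshape(h, w)
    # fix the sign so the smaller region (assumed: the object) is positive
    if (saliency > 0).sum() > saliency.size / 2:
        saliency = -saliency
    return saliency

# toy 8x8 feature map: a 3x3 object patch with distinct features
rng = np.random.default_rng(3)
fmap = rng.normal(0, 0.1, (8, 8, 4))
fmap[2:5, 2:5] += np.array([2.0, -1.0, 0.5, 0.0])  # object offset
mask = pca_localize(fmap) > 0
print(mask[3, 3], mask[0, 0])  # → True False (object pixel vs. background)
```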
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Semi-supervised learning made simple with self-supervised clustering [65.98152950607707]
Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations.
We propose a conceptually simple yet empirically powerful approach to turn clustering-based self-supervised methods into semi-supervised learners.
arXiv Detail & Related papers (2023-06-13T01:09:18Z)
- OCTAve: 2D en face Optical Coherence Tomography Angiography Vessel Segmentation in Weakly-Supervised Learning with Locality Augmentation [14.322349196837209]
We propose the application of a scribble-based weakly-supervised learning method to automate pixel-level annotation.
The proposed method, called OCTAve, combines weakly-supervised learning on scribble-annotated ground truth with an adversarial and a novel self-supervised deep supervision.
arXiv Detail & Related papers (2022-07-25T14:40:56Z)
- Evaluation of Self-taught Learning-based Representations for Facial Emotion Recognition [62.30451764345482]
This work describes different strategies to generate unsupervised representations obtained through the concept of self-taught learning for facial emotion recognition.
The idea is to create complementary representations promoting diversity by varying the autoencoders' initialization, architecture, and training data.
Experimental results on Jaffe and Cohn-Kanade datasets using a leave-one-subject-out protocol show that FER methods based on the proposed diverse representations compare favorably against state-of-the-art approaches.
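The leave-one-subject-out protocol mentioned above can be sketched generically: each fold holds out every sample of one subject, so the model is always evaluated on an unseen person. The toy subject IDs below are illustrative.

```python
import numpy as np

def leave_one_subject_out(subjects):
    """Yield (subject, train_idx, test_idx) splits for a
    leave-one-subject-out protocol.

    subjects: per-sample subject identifiers, length N."""
    subjects = np.asarray(subjects)
    for subj in np.unique(subjects):
        test = np.where(subjects == subj)[0]   # all samples of this subject
        train = np.where(subjects != subj)[0]  # everything else
        yield subj, train, test

# toy example: 6 samples from 3 subjects
subject_ids = ["a", "a", "b", "b", "c", "c"]
for subj, train, test in leave_one_subject_out(subject_ids):
    print(subj, train.tolist(), test.tolist())
# a [2, 3, 4, 5] [0, 1]
# b [0, 1, 4, 5] [2, 3]
# c [0, 1, 2, 3] [4, 5]
```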
arXiv Detail & Related papers (2022-04-26T22:48:15Z)
- Self-supervised learning for joint SAR and multispectral land cover classification [38.8529535887097]
We present a framework and specific tasks for self-supervised training of multichannel models.
We show that the proposed self-supervised approach is highly effective at learning features that correlate with the labels for land cover classification.
arXiv Detail & Related papers (2021-08-20T09:02:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.