Self-Supervised Visual Representation Learning with Semantic Grouping
- URL: http://arxiv.org/abs/2205.15288v1
- Date: Mon, 30 May 2022 17:50:59 GMT
- Title: Self-Supervised Visual Representation Learning with Semantic Grouping
- Authors: Xin Wen, Bingchen Zhao, Anlin Zheng, Xiangyu Zhang, Xiaojuan Qi
- Abstract summary: We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
- Score: 50.14703605659837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we tackle the problem of learning visual representations from
unlabeled scene-centric data. Existing works have demonstrated the potential of
utilizing the underlying complex structure within scene-centric data; still,
they commonly rely on hand-crafted objectness priors or specialized pretext
tasks to build a learning framework, which may harm generalizability. Instead,
we propose contrastive learning from data-driven semantic slots, namely
SlotCon, for joint semantic grouping and representation learning. The semantic
grouping is performed by assigning pixels to a set of learnable prototypes,
which can adapt to each sample by attentive pooling over the feature and form
new slots. Based on the learned data-dependent slots, a contrastive objective
is employed for representation learning, which enhances the discriminability of
features, and conversely facilitates grouping semantically coherent pixels
together. Compared with previous efforts, by simultaneously optimizing the two
coupled objectives of semantic grouping and contrastive learning, our approach
bypasses the disadvantages of hand-crafted priors and is able to learn
object/group-level representations from scene-centric images. Experiments show
our approach effectively decomposes complex scenes into semantic groups for
feature learning and significantly benefits downstream tasks, including object
detection, instance segmentation, and semantic segmentation. The code will be
made publicly available.
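The abstract describes two coupled steps: pixels are softly assigned to a set of learnable prototypes and attentively pooled into per-image slots, and a contrastive objective is then applied over those slots. Below is a minimal PyTorch-style sketch of that idea; the tensor shapes, function names, and the simplified slot-level InfoNCE loss are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of pixel-to-prototype grouping plus slot-level contrast.
# Shapes, names, and the simplified loss are assumptions for illustration only.
import torch
import torch.nn.functional as F


def semantic_grouping(feats, prototypes, tau=0.1):
    """Assign pixels to learnable prototypes and pool them into slots.

    feats:      (B, HW, D) L2-normalized dense features of one augmented view
    prototypes: (K, D)     L2-normalized learnable prototype vectors
    returns:    slots (B, K, D), soft assignments (B, HW, K)
    """
    logits = feats @ prototypes.t() / tau      # pixel-to-prototype similarity
    probs = logits.softmax(dim=-1)             # soft assignment per pixel
    # Attentive pooling: each slot is a probability-weighted average of pixel features.
    slots = probs.transpose(1, 2) @ feats      # (B, K, D)
    return F.normalize(slots, dim=-1), probs


def slot_contrastive_loss(slots_a, slots_b, temperature=0.2):
    """Contrast slots of the same prototype across two views of the same image.

    Positives: same image, same prototype index; negatives: all other slots in the batch.
    """
    B, K, D = slots_a.shape
    za = F.normalize(slots_a.reshape(B * K, D), dim=-1)
    zb = F.normalize(slots_b.reshape(B * K, D), dim=-1)
    logits = za @ zb.t() / temperature         # (B*K, B*K) similarity matrix
    targets = torch.arange(B * K, device=za.device)
    return F.cross_entropy(logits, targets)


# Usage sketch with random stand-ins for encoder features and prototypes.
feats_a = F.normalize(torch.randn(4, 196, 256), dim=-1)
feats_b = F.normalize(torch.randn(4, 196, 256), dim=-1)
prototypes = F.normalize(torch.randn(64, 256), dim=-1)

slots_a, _ = semantic_grouping(feats_a, prototypes)
slots_b, _ = semantic_grouping(feats_b, prototypes)
loss = slot_contrastive_loss(slots_a, slots_b)
```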
Related papers
- GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding [66.5538429726564]
Self-supervised 3D representation learning aims to learn effective representations from large-scale unlabeled point clouds.
We propose GroupContrast, a novel approach that combines segment grouping and semantic-aware contrastive learning.
arXiv Detail & Related papers (2024-03-14T17:59:59Z)
- Perceptual Grouping in Contrastive Vision-Language Models [59.1542019031645]
We show how vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
We propose a minimal set of modifications that results in models that uniquely learn both semantic and spatial information.
arXiv Detail & Related papers (2022-10-18T17:01:35Z)
- Matching Multiple Perspectives for Efficient Representation Learning [0.0]
We present an approach that combines self-supervised learning with a multi-perspective matching technique.
We show that the availability of multiple views of the same object combined with a variety of self-supervised pretraining algorithms can lead to improved object classification performance.
arXiv Detail & Related papers (2022-08-16T10:33:13Z)
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without relying on dense annotations.
Our method can directly segment objects of arbitrary categories and, on three benchmark datasets, outperforms zero-shot segmentation methods that require data labeling.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- Cross-modal Representation Learning for Zero-shot Action Recognition [67.57406812235767]
We present a cross-modal Transformer-based framework that jointly encodes video data and text labels for zero-shot action recognition (ZSAR).
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experiment results show our model considerably improves upon the state of the art in ZSAR, reaching encouraging top-1 accuracy on the UCF101, HMDB51, and ActivityNet benchmark datasets.
arXiv Detail & Related papers (2022-05-03T17:39:27Z)
- Self-Supervised Learning of Object Parts for Semantic Segmentation [7.99536002595393]
We argue that self-supervised learning of object parts is a solution to this issue.
Our method surpasses the state-of-the-art on three semantic segmentation benchmarks by 3%-17%.
arXiv Detail & Related papers (2022-04-27T17:55:17Z)
- Object Pursuit: Building a Space of Objects via Discriminative Weight Generation [23.85039747700698]
We propose a framework to continuously learn object-centric representations for visual learning and understanding.
We leverage interactions to sample diverse variations of an object and the corresponding training signals while learning the object-centric representations.
We perform an extensive study of the key features of the proposed framework and analyze the characteristics of the learned representations.
arXiv Detail & Related papers (2021-12-15T08:25:30Z)
- Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning [91.58529629419135]
We consider how to characterise visual groupings discovered automatically by deep neural networks.
We introduce two concepts, visual learnability and describability, that can be used to quantify the interpretability of arbitrary image groupings.
arXiv Detail & Related papers (2020-10-27T18:41:49Z)
- Unsupervised Image Classification for Deep Representation Learning [42.09716669386924]
We propose an unsupervised image classification framework without using embedding clustering.
Experiments on the ImageNet dataset have been conducted to demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2020-06-20T02:57:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.