Self-Supervised Visual Representation Learning from Hierarchical
Grouping
- URL: http://arxiv.org/abs/2012.03044v1
- Date: Sat, 5 Dec 2020 14:54:08 GMT
- Title: Self-Supervised Visual Representation Learning from Hierarchical
Grouping
- Authors: Xiao Zhang, Michael Maire
- Abstract summary: We bootstrapping visual representation learning from a primitive visual grouping capability.
A small supervised dataset suffices for training this grouping primitive.
Across a large unlabeled dataset, we apply this learned primitive to automatically predict hierarchical region structure.
- Score: 29.51194352981303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We create a framework for bootstrapping visual representation learning from a
primitive visual grouping capability. We operationalize grouping via a contour
detector that partitions an image into regions, followed by merging of those
regions into a tree hierarchy. A small supervised dataset suffices for training
this grouping primitive. Across a large unlabeled dataset, we apply this
learned primitive to automatically predict hierarchical region structure. These
predictions serve as guidance for self-supervised contrastive feature learning:
we task a deep network with producing per-pixel embeddings whose pairwise
distances respect the region hierarchy. Experiments demonstrate that our
approach can serve as state-of-the-art generic pre-training, benefiting
downstream tasks. We additionally explore applications to semantic region
search and video-based object instance tracking.
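As a concrete illustration of the training signal described above, the following is a minimal sketch, assuming PyTorch, of region-guided per-pixel contrastive learning: pixels sampled from the same predicted region are treated as positives, and pixels from other regions as negatives. The function name, the `region_ids` map (taken here to be the finest-level partition of the predicted hierarchy), and the temperature value are illustrative assumptions rather than the authors' released implementation; the paper's full objective respects the entire region tree, not only the finest partition.

```python
import torch
import torch.nn.functional as F

def hierarchy_contrastive_loss(embeddings, region_ids, num_samples=256, temperature=0.1):
    """Hypothetical region-guided contrastive loss (not the authors' code).

    embeddings: (C, H, W) per-pixel features from the backbone.
    region_ids: (H, W) integer labels of the finest regions in the predicted hierarchy.
    Pixels sharing a region are pulled together; pixels from other regions are pushed apart.
    """
    C, H, W = embeddings.shape
    feats = F.normalize(embeddings.reshape(C, -1).t(), dim=1)   # (H*W, C), unit-norm rows
    labels = region_ids.reshape(-1)                             # (H*W,)

    # Randomly sample pixel locations to keep the pairwise similarity matrix small.
    idx = torch.randperm(feats.shape[0])[:num_samples]
    f, l = feats[idx], labels[idx]

    sim = f @ f.t() / temperature                               # pairwise cosine similarities
    same_region = (l[:, None] == l[None, :]).float()
    self_mask = torch.eye(len(idx), device=sim.device)
    pos_mask = same_region - same_region * self_mask            # positives: same region, not self

    # InfoNCE-style objective: log-softmax over all non-self pairs,
    # averaged over each anchor's positives.
    logits = sim - 1e9 * self_mask
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(pos_mask * log_prob).sum(dim=1) / pos_count
    return loss.mean()
```

Sampling a few hundred pixels per image keeps the pairwise matrix small while still exposing the network to many region-level contrasts; extending the mask to weight pairs by their distance in the region tree would bring the sketch closer to the hierarchical objective described in the abstract.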
Related papers
- Mapping High-level Semantic Regions in Indoor Environments without
Object Recognition [50.624970503498226]
The present work proposes a method for semantic region mapping via embodied navigation in indoor environments.
To enable region identification, the method uses a vision-to-language model to provide scene information for mapping.
By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location.
arXiv Detail & Related papers (2024-03-11T18:09:50Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention [7.237370981736913]
We propose a framework to teach any existing convolutional neural network to generate text descriptions about its own latent representations at the filter level.
We show that our method can generate novel descriptions for learned filters beyond the set of categories defined in the training dataset.
We also demonstrate a novel application of our method for unsupervised dataset bias analysis.
arXiv Detail & Related papers (2022-04-10T04:57:56Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [70.64030011999981]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
This is done in a completely label-free manner by exploiting the correspondence between geo-tagged audio recordings and remote sensing imagery.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- Exploiting the relationship between visual and textual features in social networks for image classification with zero-shot deep learning [0.0]
In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture.
Our experiments, based on image classification tasks according to the labels of the Places dataset, are performed by first considering only the visual part.
Considering the texts associated with the images can help to improve the accuracy, depending on the goal.
arXiv Detail & Related papers (2021-07-08T10:54:59Z)
- Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera.
We exploit a self-supervised loss function to train a proposal-based segmentation network.
We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)
- Unsupervised Image Classification for Deep Representation Learning [42.09716669386924]
We propose an unsupervised image classification framework without using embedding clustering.
Experiments on the ImageNet dataset have been conducted to prove the effectiveness of our method.
arXiv Detail & Related papers (2020-06-20T02:57:06Z)
- Semantically-Guided Representation Learning for Self-Supervised Monocular Depth [40.49380547487908]
We propose a new architecture leveraging fixed pretrained semantic segmentation networks to guide self-supervised representation learning.
Our method improves upon the state of the art for self-supervised monocular depth prediction over all pixels, fine-grained details, and per semantic categories.
arXiv Detail & Related papers (2020-02-27T18:40:10Z)
- Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning [86.45526827323954]
Weakly-supervised semantic segmentation is a challenging task as no pixel-wise label information is provided for training.
We propose an iterative algorithm to learn pairwise affinities between pixels.
We show that the proposed algorithm performs favorably against the state-of-the-art methods.
arXiv Detail & Related papers (2020-02-19T10:32:03Z)