Unsupervised Learning of Dense Visual Representations
- URL: http://arxiv.org/abs/2011.05499v2
- Date: Mon, 7 Dec 2020 20:16:40 GMT
- Title: Unsupervised Learning of Dense Visual Representations
- Authors: Pedro O. Pinheiro, Amjad Almahairi, Ryan Y. Benmalek, Florian Golemo,
Aaron Courville
- Abstract summary: We propose View-Agnostic Dense Representation (VADeR) for unsupervised learning of dense representations.
VADeR learns pixelwise representations by forcing local features to remain constant over different viewing conditions.
Our method outperforms ImageNet supervised pretraining in multiple dense prediction tasks.
- Score: 14.329781842154281
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive self-supervised learning has emerged as a promising approach to
unsupervised visual representation learning. In general, these methods learn
global (image-level) representations that are invariant to different views
(i.e., compositions of data augmentation) of the same image. However, many
visual understanding tasks require dense (pixel-level) representations. In this
paper, we propose View-Agnostic Dense Representation (VADeR) for unsupervised
learning of dense representations. VADeR learns pixelwise representations by
forcing local features to remain constant over different viewing conditions.
Specifically, this is achieved through pixel-level contrastive learning:
matching features (that is, features that describe the same location of the
scene on different views) should be close in an embedding space, while
non-matching features should be apart. VADeR provides a natural representation
for dense prediction tasks and transfers well to downstream tasks. Our method
outperforms ImageNet supervised pretraining (and strong unsupervised baselines)
in multiple dense prediction tasks.
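The pixel-level contrastive objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an InfoNCE-style loss with cosine similarity and a temperature, none of which the abstract specifies. The function name `pixel_info_nce`, the `temperature` value, and the random toy features are hypothetical and chosen only for illustration. Row i of each input is taken to describe the same scene location in two views, so diagonal entries of the similarity matrix are matching pairs and all other entries are non-matching pairs.

```python
import numpy as np


def pixel_info_nce(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style loss over matched pixel features (assumed formulation).

    feats_a, feats_b: arrays of shape (N, D). Row i of each array is the
    feature of the same scene location seen in two different views.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)

    # (N, N) similarity matrix: diagonal = matching, off-diagonal = non-matching.
    logits = a @ b.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Maximize the probability assigned to the matching feature.
    return -np.mean(np.diag(log_prob))


# Toy usage with random features (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))
view_b_matched = view_a + 0.01 * rng.normal(size=(8, 16))  # near-identical features
view_b_random = rng.normal(size=(8, 16))                   # unrelated features

loss_matched = pixel_info_nce(view_a, view_b_matched)
loss_random = pixel_info_nce(view_a, view_b_random)
```

In this toy setup the loss is lower when the second view's features agree with the first view's at each location, which is the behavior the abstract describes: matching features are pulled together and non-matching features are pushed apart.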
Related papers
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Self-supervised Contrastive Learning for Cross-domain Hyperspectral Image Representation [26.610588734000316]
This paper introduces a self-supervised learning framework suitable for hyperspectral images that are inherently challenging to annotate.
The proposed framework architecture leverages cross-domain CNN, allowing for learning representations from different hyperspectral images.
The experimental results demonstrate the advantage of the proposed self-supervised representation over models trained from scratch or other transfer learning methods.
arXiv Detail & Related papers (2022-02-08T16:16:45Z)
- Dense Semantic Contrast for Self-Supervised Visual Representation Learning [12.636783522731392]
We present Dense Semantic Contrast (DSC) for modeling semantic category decision boundaries at a dense level.
We propose a dense cross-image semantic contrastive learning framework for multi-granularity representation learning.
Experimental results show that our DSC model outperforms state-of-the-art methods when transferring to downstream dense prediction tasks.
arXiv Detail & Related papers (2021-09-16T07:04:05Z)
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent the image in low dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
- Exploring Cross-Image Pixel Contrast for Semantic Segmentation [130.22216825377618]
We propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.
The core idea is to enforce pixel embeddings belonging to the same semantic class to be more similar than embeddings from different classes.
Our method can be effortlessly incorporated into existing segmentation frameworks without extra overhead during testing.
arXiv Detail & Related papers (2021-01-28T11:35:32Z)
- Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning [116.91819311885166]
We propose a hierarchical semantic alignment strategy via expanding the views generated by a single image to Cross-samples and Multi-level representation.
Our method, termed as CsMl, has the ability to integrate multi-level visual representations across samples in a robust way.
arXiv Detail & Related papers (2020-12-04T17:26:24Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning targets to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)