Unsupervised Object-Level Representation Learning from Scene Images
- URL: http://arxiv.org/abs/2106.11952v1
- Date: Tue, 22 Jun 2021 17:51:24 GMT
- Title: Unsupervised Object-Level Representation Learning from Scene Images
- Authors: Jiahao Xie, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy
- Abstract summary: Object-level Representation Learning (ORL) is a new self-supervised learning framework for scene images.
Our key insight is to leverage image-level self-supervised pre-training as the prior to discover object-level semantic correspondence.
ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks.
- Score: 97.07686358706397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive self-supervised learning has largely narrowed the gap to
supervised pre-training on ImageNet. However, its success highly relies on the
object-centric priors of ImageNet, i.e., different augmented views of the same
image correspond to the same object. Such a heavily curated constraint becomes
immediately infeasible when pre-trained on more complex scene images with many
objects. To overcome this limitation, we introduce Object-level Representation
Learning (ORL), a new self-supervised learning framework towards scene images.
Our key insight is to leverage image-level self-supervised pre-training as the
prior to discover object-level semantic correspondence, thus realizing
object-level representation learning from scene images. Extensive experiments
on COCO show that ORL significantly improves the performance of self-supervised
learning on scene images, even surpassing supervised ImageNet pre-training on
several downstream tasks. Furthermore, ORL improves the downstream performance
when more unlabeled scene images are available, demonstrating its great
potential of harnessing unlabeled data in the wild. We hope our approach can
motivate future research on more general-purpose unsupervised representation
learning from scene data. Project page: https://www.mmlab-ntu.com/project/orl/.
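As a concrete illustration (not from the paper itself), the image-level contrastive objective the abstract refers to is typically an NT-Xent loss: embeddings of two augmented views of the same image are pulled together while all other images in the batch serve as negatives. Treating row i of both view batches as a positive pair is exactly the "object-centric prior" the abstract criticizes, since it assumes both views depict the same object. Function name and shapes below are illustrative:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    Each row of z1 and z2 is the embedding of one augmented view of an
    image; row i of z1 and row i of z2 are treated as a positive pair,
    and every other row in the batch as a negative.
    """
    n = z1.shape[0]
    # L2-normalize so the dot product is cosine similarity.
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature           # (2n, 2n) similarity matrix
    np.fill_diagonal(sim, -np.inf)        # exclude self-similarity
    # The positive for index i is index i + n (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Cross-entropy: log-softmax over each row, picked at the positive.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

On scene images with many objects, two random views often depict different objects, which is why this positive-pair assumption breaks down and motivates object-level correspondence instead.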
Related papers
- Saliency Guided Contrastive Learning on Scene Images [71.07412958621052]
We leverage the saliency map derived from the model's output during learning to highlight discriminative regions and guide the whole contrastive learning.
Our method significantly improves self-supervised learning on scene images, by +1.1, +4.3, and +2.2 Top-1 accuracy on ImageNet linear evaluation and on semi-supervised learning with 1% and 10% of ImageNet labels, respectively.
arXiv Detail & Related papers (2023-02-22T15:54:07Z) - Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach that leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z) - UniVIP: A Unified Framework for Self-Supervised Visual Pre-training [50.87603616476038]
We propose a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic datasets.
Massive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance.
Our method can also exploit single-centric-object datasets such as ImageNet, outperforming BYOL by 2.5% in linear probing with the same number of pre-training epochs.
arXiv Detail & Related papers (2022-03-14T10:04:04Z) - Object-Aware Cropping for Self-Supervised Learning [21.79324121283122]
We show that self-supervised learning based on the usual random cropping performs poorly on datasets whose images contain many objects.
We propose replacing one or both of the random crops with crops obtained from an object proposal algorithm.
This approach, which we call object-aware cropping, yields significant improvements over scene cropping on classification and object detection benchmarks.
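The cropping swap described above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the proposal boxes are assumed to come from some external proposal algorithm (e.g. selective search), and the box format and fallback behavior are my own choices:

```python
import random

def random_crop(img_h, img_w, crop, rng):
    """Standard random crop location within an img_h x img_w image."""
    x0 = rng.randrange(0, img_w - crop + 1)
    y0 = rng.randrange(0, img_h - crop + 1)
    return x0, y0, x0 + crop, y0 + crop

def object_aware_crop(proposals, crop, img_h, img_w, rng):
    """Crop centered on a randomly chosen object proposal box.

    `proposals` is a list of (x0, y0, x1, y1) boxes produced by any
    object proposal algorithm; falls back to a plain random crop when
    no proposals are available.
    """
    if not proposals:
        return random_crop(img_h, img_w, crop, rng)
    x0, y0, x1, y1 = rng.choice(proposals)
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    # Clamp the crop window so it stays inside the image bounds.
    left = min(max(cx - crop // 2, 0), img_w - crop)
    top = min(max(cy - crop // 2, 0), img_h - crop)
    return left, top, left + crop, top + crop
```

Replacing one or both random crops with such proposal-centered crops makes it far more likely that the two contrastive views depict the same object, which is the effect the summary describes.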
arXiv Detail & Related papers (2021-12-01T07:23:37Z) - Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning [12.697842097171119]
We present a curriculum learning mechanism that adaptively augments the generated regions, which allows the model to consistently acquire a useful learning signal.
Our experiments show that our approach improves on the MoCo v2 baseline by a large margin on multiple object-level tasks when pre-training on multi-object scene image datasets.
arXiv Detail & Related papers (2021-11-26T18:29:57Z) - When Does Contrastive Visual Representation Learning Work? [13.247759411409936]
We study contrastive self-supervised learning on four diverse large-scale datasets.
Our key findings include: (i) the benefit of additional pretraining data beyond 500k images is modest, (ii) adding pretraining images from another domain does not lead to more general representations, and (iii) corrupted pretraining images have a disparate impact on supervised and self-supervised pretraining.
arXiv Detail & Related papers (2021-05-12T17:52:42Z) - Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding [0.0]
We conduct an analysis of the applicability of self-supervised learning in remote sensing image classification.
We show that, for the downstream task of remote sensing image classification, using self-supervised pre-training can give better results than using supervised pre-training on images of natural scenes.
arXiv Detail & Related papers (2021-04-14T18:25:43Z) - Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z) - Self-Supervised Viewpoint Learning From Image Collections [116.56304441362994]
We propose a novel learning framework that incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint-aware manner.
We show that our approach performs competitively to fully-supervised approaches for several object categories like human faces, cars, buses, and trains.
arXiv Detail & Related papers (2020-04-03T22:01:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information listed and is not responsible for any consequences of its use.