Object-Aware Cropping for Self-Supervised Learning
- URL: http://arxiv.org/abs/2112.00319v2
- Date: Thu, 6 Apr 2023 20:05:35 GMT
- Title: Object-Aware Cropping for Self-Supervised Learning
- Authors: Shlok Mishra, Anshul Shah, Ankan Bansal, Abhyuday Jagannatha, Janit
Anjaria, Abhishek Sharma, David Jacobs, Dilip Krishnan
- Abstract summary: We show that self-supervised learning based on the usual random cropping performs poorly on datasets with multiple small objects per image, such as OpenImages and COCO.
We propose replacing one or both of the random crops with crops obtained from an object proposal algorithm.
This approach, which we call object-aware cropping, yields significant improvements over scene cropping on classification and object detection benchmarks.
- Score: 21.79324121283122
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: A core component of the recent success of self-supervised learning is
cropping data augmentation, which selects sub-regions of an image to be used as
positive views in the self-supervised loss. The underlying assumption is that
randomly cropped and resized regions of a given image share information about
the objects of interest, which the learned representation will capture. This
assumption is mostly satisfied in datasets such as ImageNet where there is a
large, centered object, which is highly likely to be present in random crops of
the full image. However, in other datasets such as OpenImages or COCO, which
are more representative of real world uncurated data, there are typically
multiple small objects in an image. In this work, we show that self-supervised
learning based on the usual random cropping performs poorly on such datasets.
We propose replacing one or both of the random crops with crops obtained from
an object proposal algorithm. This encourages the model to learn both object
and scene level semantic representations. Using this approach, which we call
object-aware cropping, results in significant improvements over scene cropping
on classification and object detection benchmarks. For example, on OpenImages,
our approach achieves an improvement of 8.8% mAP over random scene-level
cropping using MoCo-v2 based pre-training. We also show significant
improvements on COCO and PASCAL-VOC object detection and segmentation tasks
over the state-of-the-art self-supervised learning approaches. Our approach is
efficient, simple and general, and can be used in most existing contrastive and
non-contrastive self-supervised learning frameworks.
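As a concrete illustration, below is a minimal sketch of the object-scene variant of this augmentation in Python with torchvision. The propose_boxes helper is a hypothetical placeholder for whatever object proposal algorithm is plugged in, and the crop size, scale range, and fallback behavior are illustrative assumptions rather than the paper's reference implementation.

```python
import random

from PIL import Image
import torchvision.transforms as T

# Hypothetical placeholder: any object proposal algorithm works here
# (e.g. selective search). Expected to return a list of boxes as
# (left, upper, right, lower) pixel tuples.
def propose_boxes(image: Image.Image):
    raise NotImplementedError("plug in an object proposal method")

scene_crop = T.RandomResizedCrop(224, scale=(0.2, 1.0))   # usual SSL crop
to_view = T.Compose([T.RandomHorizontalFlip(), T.ToTensor()])

def object_aware_views(image: Image.Image):
    """Two positive views: an object-proposal crop paired with the
    usual random scene-level crop."""
    boxes = propose_boxes(image)
    if boxes:
        box = random.choice(boxes)
        obj = image.crop(box).resize((224, 224))   # object-level view
    else:
        obj = scene_crop(image)   # fall back to a plain random crop
    return to_view(obj), to_view(scene_crop(image))
```

In a MoCo-v2-style pipeline, these two views simply replace the two random crops fed to the query and key encoders; no change to the loss or architecture is needed, which is why the approach ports to most contrastive and non-contrastive frameworks.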
Related papers
- Object-wise Masked Autoencoders for Fast Pre-training [13.757095663704858]
We show that current masked image encoding models learn the underlying relationship between all objects in the whole scene, instead of a single object representation.
We introduce a novel object selection and division strategy that drops non-object patches, learning object-wise representations by selective reconstruction with region-of-interest masks (sketched after this entry).
Experiments on four commonly-used datasets demonstrate the effectiveness of our model in reducing the compute cost by 72% while achieving competitive performance.
arXiv Detail & Related papers (2022-05-28T05:13:45Z)
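The patch-dropping idea can be made concrete with a small sketch. This is not the paper's code: the regular 16-pixel patch grid, the single-box interface, and the overlap rule are all assumptions made for illustration.

```python
import torch

def object_patch_indices(box, img_size=224, patch=16):
    """Indices of ViT patches whose grid cells overlap an object box.

    box is (left, upper, right, lower) in pixels; the regular grid and
    the single box are illustrative assumptions.
    """
    n = img_size // patch                      # patches per side
    left, upper, right, lower = box
    keep = []
    for row in range(n):
        for col in range(n):
            x0, y0 = col * patch, row * patch
            # keep the patch if its cell intersects the object box
            if x0 < right and x0 + patch > left and y0 < lower and y0 + patch > upper:
                keep.append(row * n + col)
    return torch.tensor(keep)

# Only these patches would be encoded and reconstructed; the remaining
# non-object patches are dropped, which is what shrinks the compute cost.
object_indices = object_patch_indices((60, 40, 180, 200))
```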
- Learning to Detect Every Thing in an Open World [139.78830329914135]
We propose a simple yet surprisingly powerful data augmentation and training scheme we call Learning to Detect Every Thing (LDET).
To avoid suppressing hidden objects (background objects that are visible but unlabeled), we paste annotated objects onto a background image sampled from a small region of the original image (sketched after this entry).
LDET leads to significant improvements on many datasets in the open world instance segmentation task.
arXiv Detail & Related papers (2021-12-03T03:56:06Z)
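A rough sketch of that pasting scheme, under stated assumptions: the region fraction, the sampling, and the paste placement below are illustrative choices, not LDET's actual parameters.

```python
import random
from PIL import Image

def ldet_style_paste(image: Image.Image, object_boxes, region_frac=0.25):
    """Synthesize a background from a small region of the image, then
    paste the annotated objects back onto it."""
    w, h = image.size
    rw, rh = max(1, int(w * region_frac)), max(1, int(h * region_frac))
    # A small region is unlikely to contain unlabeled objects, so
    # upsampling it gives an (approximately) object-free background.
    x0 = random.randint(0, w - rw)
    y0 = random.randint(0, h - rh)
    background = image.crop((x0, y0, x0 + rw, y0 + rh)).resize((w, h))
    # Paste each annotated object back at its original location.
    for (left, upper, right, lower) in object_boxes:
        background.paste(image.crop((left, upper, right, lower)), (left, upper))
    return background
```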
- Object-aware Contrastive Learning for Debiased Scene Representation [74.30741492814327]
We develop a novel object-aware contrastive learning framework that localizes objects in a self-supervised manner.
We also introduce two data augmentations based on ContraCAM, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning (background mixup is sketched after this entry).
arXiv Detail & Related papers (2021-07-30T19:24:07Z)
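A schematic reading of the background mixup augmentation is sketched below. How the object-free backgrounds are obtained (the paper derives them from its ContraCAM localization) is outside the sketch, and the mixing-coefficient range is an assumption, not the paper's hyperparameter.

```python
import torch

def background_mixup(images: torch.Tensor, backgrounds: torch.Tensor,
                     lam_range=(0.5, 1.0)) -> torch.Tensor:
    """Blend training images with object-free background images to
    weaken background shortcuts.

    images, backgrounds: (B, C, H, W) tensors in [0, 1].
    """
    # One mixing coefficient per image, broadcast over C, H, W.
    lam = torch.empty(images.size(0), 1, 1, 1).uniform_(*lam_range)
    return lam * images + (1 - lam) * backgrounds
```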
- Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel few-shot learning framework that automatically figures out foreground objects at both the pretraining and evaluation stages.
arXiv Detail & Related papers (2021-07-16T07:46:41Z)
- Unsupervised Object-Level Representation Learning from Scene Images [97.07686358706397]
Object-level Representation Learning (ORL) is a new self-supervised learning framework for scene images.
Our key insight is to leverage image-level self-supervised pre-training as the prior to discover object-level semantic correspondence.
ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks.
arXiv Detail & Related papers (2021-06-22T17:51:24Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds (one possible instantiation is sketched after this entry).
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
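One plausible instantiation of background invariance, sketched under heavy assumptions: a soft foreground mask (this summary does not specify where it comes from; a saliency model is one option) is used to composite the foreground onto unrelated backgrounds, so positive pairs share the object but differ in background. The mask source and compositing details are illustrative, not the paper's method.

```python
import torch

def composite(foreground: torch.Tensor, mask: torch.Tensor,
              background: torch.Tensor) -> torch.Tensor:
    """Paste a masked foreground onto a new background.

    foreground, background: (C, H, W) tensors in [0, 1];
    mask: (1, H, W) soft foreground mask in [0, 1], source unspecified
    here (e.g. a saliency model). Illustrative only.
    """
    return mask * foreground + (1 - mask) * background

# Positive pairs built this way share the foreground object but differ
# in background, pushing the representation toward background invariance.
```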