Dense Contrastive Learning for Self-Supervised Visual Pre-Training
- URL: http://arxiv.org/abs/2011.09157v2
- Date: Sun, 4 Apr 2021 11:41:26 GMT
- Title: Dense Contrastive Learning for Self-Supervised Visual Pre-Training
- Authors: Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei Li
- Abstract summary: We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower)
- Score: 102.15325936477362
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: To date, most existing self-supervised learning methods are designed and
optimized for image classification. These pre-trained models can be sub-optimal
for dense prediction tasks due to the discrepancy between image-level
prediction and pixel-level prediction. To fill this gap, we aim to design an
effective, dense self-supervised learning method that directly works at the
level of pixels (or local features) by taking into account the correspondence
between local features. We present dense contrastive learning, which implements
self-supervised learning by optimizing a pairwise contrastive (dis)similarity
loss at the pixel level between two views of input images. Compared to the
baseline method MoCo-v2, our method introduces negligible computation overhead
(only <1% slower), but demonstrates consistently superior performance when
transferring to downstream dense prediction tasks including object detection,
semantic segmentation and instance segmentation; and outperforms the
state-of-the-art methods by a large margin. Specifically, over the strong
MoCo-v2 baseline, our method achieves significant improvements of 2.0% AP on
PASCAL VOC object detection, 1.1% AP on COCO object detection, 0.9% AP on COCO
instance segmentation, 3.0% mIoU on PASCAL VOC semantic segmentation and 1.8%
mIoU on Cityscapes semantic segmentation. Code is available at:
https://git.io/AdelaiDet
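The pixel-level objective described in the abstract can be sketched as an InfoNCE-style loss over corresponding local features of two views. Below is a minimal illustration, assuming pre-extracted, L2-normalized (N, D) feature arrays whose rows correspond across the two views; the function name, shapes, and temperature are illustrative assumptions, not the paper's exact DenseCL formulation (which also involves a momentum encoder and learned cross-view correspondence).

```python
import numpy as np

def dense_contrastive_loss(f1, f2, tau=0.2):
    """Illustrative dense (pixel-level) InfoNCE loss between two views.

    f1, f2: (N, D) arrays of L2-normalized local features from two views,
    where row i of f1 corresponds to row i of f2 (its positive pair) and
    all other rows of f2 serve as negatives. A simplified sketch of the
    pairwise pixel-level contrastive idea, not the paper's exact loss.
    """
    # similarity matrix between all local features of the two views
    sim = f1 @ f2.T / tau                        # (N, N) scaled cosine sims
    # softmax cross-entropy with the diagonal as each row's positive
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

For each location, the matched location in the other view is pulled closer while all other locations are pushed away, which is what lets the learned features transfer to dense prediction tasks.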
Related papers
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the data-hungry training requirements of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- IDEAL: Improved DEnse locAL Contrastive Learning for Semi-Supervised Medical Image Segmentation [3.6748639131154315]
We extend the concept of metric learning to the segmentation task.
We propose a simple convolutional projection head for obtaining dense pixel-level features.
A bidirectional regularization mechanism involving two-stream regularization training is devised for the downstream task.
arXiv Detail & Related papers (2022-10-26T23:11:02Z)
- Semantic Segmentation with Active Semi-Supervised Representation Learning [23.79742108127707]
We train an effective semantic segmentation algorithm with significantly less labeled data.
We extend the prior state-of-the-art S4AL algorithm by replacing its mean teacher approach for semi-supervised learning with a self-training approach.
We evaluate our method on the CamVid and Cityscapes datasets, the de facto standards for active learning for semantic segmentation.
arXiv Detail & Related papers (2022-10-16T00:21:43Z)
- Self-supervised Learning with Local Contrastive Loss for Detection and Semantic Segmentation [9.711659088922838]
We present a self-supervised learning (SSL) method suitable for semi-global tasks such as object detection and semantic segmentation.
We enforce local consistency between self-learned features at corresponding image locations across transformed versions of the same image. The LC-loss can be added to existing self-supervised learning methods with minimal overhead.
Our method outperforms the existing state-of-the-art SSL approaches by 1.9% on COCO object detection, 1.4% on PASCAL VOC detection, and 0.6% on CityScapes segmentation.
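The local-consistency idea above can be illustrated with a simple agreement term between features at corresponding locations. This is a hypothetical sketch under assumed inputs (the function name and exact form are not the paper's LC-loss definition):

```python
import numpy as np

def local_consistency_loss(feat_a, feat_b):
    """Hypothetical sketch of a local-consistency-style loss.

    feat_a, feat_b: (N, D) arrays of local features extracted at the same
    N image locations from two transformed views of one image. Disagreement
    is penalized via mean (1 - cosine similarity); the name and exact form
    are illustrative assumptions, not the paper's LC-loss.
    """
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    cos = np.sum(a * b, axis=1)          # per-location cosine similarity
    return float(np.mean(1.0 - cos))
```

Unlike a contrastive loss, a term like this uses no negatives, which is what makes it cheap to bolt onto an existing self-supervised objective.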
arXiv Detail & Related papers (2022-07-10T06:53:15Z)
- Mix-up Self-Supervised Learning for Contrast-agnostic Applications [33.807005669824136]
We present the first mix-up self-supervised learning framework for contrast-agnostic applications.
We address the low variance across images based on cross-domain mix-up and build the pretext task based on image reconstruction and transparency prediction.
arXiv Detail & Related papers (2022-04-02T16:58:36Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Attention-Guided Supervised Contrastive Learning for Semantic Segmentation [16.729068267453897]
In a per-pixel prediction task, more than one label can exist in a single image for segmentation.
We propose an attention-guided supervised contrastive learning approach to highlight a single semantic object every time as the target.
arXiv Detail & Related papers (2021-06-03T05:01:11Z)
- DetCo: Unsupervised Contrastive Learning for Object Detection [64.22416613061888]
Unsupervised contrastive learning achieves great success in learning image representations with CNN.
We present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches.
DetCo consistently outperforms the supervised counterpart by 1.6/1.2/1.0 AP on Mask R-CNN-C4/FPN/RetinaNet with the 1x schedule.
arXiv Detail & Related papers (2021-02-09T12:47:20Z)
- SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
arXiv Detail & Related papers (2020-05-25T18:12:33Z)
- Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization [53.99850033746663]
We study the problem of learning localization model on target classes with weakly supervised image labels.
In this work, we argue that learning only an objectness function is a weak form of knowledge transfer.
Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of the pairwise similarity function.
arXiv Detail & Related papers (2020-03-18T17:53:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.