DetCo: Unsupervised Contrastive Learning for Object Detection
- URL: http://arxiv.org/abs/2102.04803v1
- Date: Tue, 9 Feb 2021 12:47:20 GMT
- Title: DetCo: Unsupervised Contrastive Learning for Object Detection
- Authors: Enze Xie, Jian Ding, Wenhai Wang, Xiaohang Zhan, Hang Xu, Zhenguo Li,
Ping Luo
- Abstract summary: Unsupervised contrastive learning achieves great success in learning image representations with CNN.
We present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches.
DetCo consistently outperforms supervised method by 1.6/1.2/1.0 AP on Mask RCNN-C4/FPN/RetinaNet with 1x schedule.
- Score: 64.22416613061888
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised contrastive learning achieves great success in learning image
representations with CNN. Unlike most recent methods that focused on improving
accuracy of image classification, we present a novel contrastive learning
approach, named DetCo, which fully explores the contrasts between global image
and local image patches to learn discriminative representations for object
detection. DetCo has several appealing benefits. (1) It is carefully designed
by investigating the weaknesses of current self-supervised methods, which
discard important representations for object detection. (2) DetCo builds
hierarchical intermediate contrastive losses between global image and local
patches to improve object detection, while maintaining global representations
for image recognition. Theoretical analysis shows that the local patches
actually remove the contextual information of an image, improving the lower
bound of mutual information for better contrastive learning. (3) Extensive
experiments on PASCAL VOC, COCO and Cityscapes demonstrate that DetCo not only
outperforms state-of-the-art methods on object detection, but also on
segmentation, pose estimation, and 3D shape prediction, while it is still
competitive on image classification. For example, on PASCAL VOC, DetCo-100ep
achieves 57.4 mAP, which is on par with the result of MoCov2-800ep. Moreover,
DetCo consistently outperforms supervised method by 1.6/1.2/1.0 AP on Mask
RCNN-C4/FPN/RetinaNet with 1x schedule. Code will be released at
\href{https://github.com/xieenze/DetCo}{\color{blue}{\tt
github.com/xieenze/DetCo}} and
\href{https://github.com/open-mmlab/OpenSelfSup}{\color{blue}{\tt
github.com/open-mmlab/OpenSelfSup}}.
Related papers
- Rethinking Image Forgery Detection via Contrastive Learning and
Unsupervised Clustering [26.923409536155166]
We propose FOrensic ContrAstive cLustering (FOCAL) method for image forgery detection.
FOCAL is based on contrastive learning and unsupervised clustering.
Results show FOCAL significantly outperforms state-of-the-art competing algorithms.
arXiv Detail & Related papers (2023-08-18T05:05:30Z) - Mix-up Self-Supervised Learning for Contrast-agnostic Applications [33.807005669824136]
We present the first mix-up self-supervised learning framework for contrast-agnostic applications.
We address the low variance across images based on cross-domain mix-up and build the pretext task based on image reconstruction and transparency prediction.
arXiv Detail & Related papers (2022-04-02T16:58:36Z) - Object-Aware Cropping for Self-Supervised Learning [21.79324121283122]
We show that self-supervised learning based on the usual random cropping performs poorly on such datasets.
We propose replacing one or both of the random crops with crops obtained from an object proposal algorithm.
Using this approach, which we call object-aware cropping, results in significant improvements over scene cropping on classification and object detection benchmarks.
arXiv Detail & Related papers (2021-12-01T07:23:37Z) - Unsupervised Pretraining for Object Detection by Patch Reidentification [72.75287435882798]
Unsupervised representation learning achieves promising performances in pre-training representations for object detectors.
This work proposes a simple yet effective representation learning method for object detection, named patch re-identification (Re-ID)
Our method significantly outperforms its counterparts on COCO in all settings, such as different training iterations and data percentages.
arXiv Detail & Related papers (2021-03-08T15:13:59Z) - Dense Contrastive Learning for Self-Supervised Visual Pre-Training [102.15325936477362]
We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only 1% slower)
arXiv Detail & Related papers (2020-11-18T08:42:32Z) - Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves the Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z) - SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
arXiv Detail & Related papers (2020-05-25T18:12:33Z) - Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation
Learning [108.999497144296]
Recently advanced unsupervised learning approaches use the siamese-like framework to compare two "views" from the same image for learning representations.
This work aims to involve the distance concept on label space in the unsupervised learning and let the model be aware of the soft degree of similarity between positive or negative pairs.
Despite its conceptual simplicity, we show empirically that with the solution -- Unsupervised image mixtures (Un-Mix), we can learn subtler, more robust and generalized representations from the transformed input and corresponding new label space.
arXiv Detail & Related papers (2020-03-11T17:59:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.