Unsupervised Pretraining for Object Detection by Patch Reidentification
- URL: http://arxiv.org/abs/2103.04814v1
- Date: Mon, 8 Mar 2021 15:13:59 GMT
- Title: Unsupervised Pretraining for Object Detection by Patch Reidentification
- Authors: Jian Ding, Enze Xie, Hang Xu, Chenhan Jiang, Zhenguo Li, Ping Luo,
Gui-Song Xia
- Abstract summary: Unsupervised representation learning achieves promising performances in pre-training representations for object detectors.
This work proposes a simple yet effective representation learning method for object detection, named patch re-identification (Re-ID).
Our method significantly outperforms its counterparts on COCO in all settings, such as different training iterations and data percentages.
- Score: 72.75287435882798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised representation learning achieves promising performances in
pre-training representations for object detectors. However, previous approaches
are mainly designed for image-level classification, leading to suboptimal
detection performance. To bridge the performance gap, this work proposes a
simple yet effective representation learning method for object detection, named
patch re-identification (Re-ID), which can be treated as a contrastive pretext
task to learn location-discriminative representation unsupervisedly, possessing
appealing advantages compared to its counterparts. Firstly, unlike
fully-supervised person Re-ID that matches a human identity in different camera
views, patch Re-ID treats an important patch as a pseudo identity and
contrastively learns its correspondence in two different image views, where the
pseudo identity has different translations and transformations, enabling the model to
learn discriminative features for object detection. Secondly, patch Re-ID is
performed in a Deeply Unsupervised manner to learn multi-level representations,
appealing to object detection. Thirdly, extensive experiments show that our
method significantly outperforms its counterparts on COCO in all settings, such
as different training iterations and data percentages. For example, Mask R-CNN
initialized with our representation surpasses MoCo v2 and even its
fully-supervised counterparts in all setups of training iterations (e.g. 2.1
and 1.1 mAP improvement compared to MoCo v2 in 12k and 90k iterations
respectively). Code will be released at https://github.com/dingjiansw101/DUPR.
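The contrastive pretext task described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an InfoNCE-style loss in which a patch feature from one view is matched against the same patch in the other view (the pseudo identity) and contrasted with features of other patches. The function names, the temperature value of 0.2, and the simple averaging over feature levels are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def patch_info_nce(query, positive, negatives, temperature=0.2):
    """InfoNCE loss for a single patch (illustrative, not the paper's exact loss).

    query:     feature of the patch in view 1
    positive:  feature of the same patch (pseudo identity) in view 2
    negatives: features of other patches
    temperature: assumed value, not taken from the paper
    """
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive)) / temperature]
    logits += [dot(q, l2_normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]


def multi_level_loss(levels, temperature=0.2):
    """Average the patch loss over several feature levels.

    levels: list of (query, positive, negatives) tuples, one per level.
    Plain averaging is an assumption made for this sketch.
    """
    losses = [patch_info_nce(q, p, negs, temperature) for q, p, negs in levels]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    aligned = patch_info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
    misaligned = patch_info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
    print(aligned < misaligned)
```

In this sketch the loss is small when the query is closer to its positive than to any negative, and large when a negative matches better, which is the behaviour a re-identification objective needs.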
Related papers
- From Global to Local: Multi-scale Out-of-distribution Detection [129.37607313927458]
Out-of-distribution (OOD) detection aims to detect "unknown" data whose labels have not been seen during the in-distribution (ID) training process.
Recent progress in representation learning gives rise to distance-based OOD detection.
We propose Multi-scale OOD DEtection (MODE), a first framework leveraging both global visual information and local region details.
arXiv Detail & Related papers (2023-08-20T11:56:25Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Deepfake Detection via Joint Unsupervised Reconstruction and Supervised Classification [25.84902508816679]
We introduce a novel approach for deepfake detection, which considers the reconstruction and classification tasks simultaneously.
This method shares the information learned by one task with the other, an aspect that existing works rarely consider.
Our method achieves state-of-the-art performance on three commonly-used datasets.
arXiv Detail & Related papers (2022-11-24T05:44:26Z)
- Pseudo-Pair based Self-Similarity Learning for Unsupervised Person Re-identification [47.44945334929426]
We present a pseudo-pair based self-similarity learning approach for unsupervised person re-ID without human annotations.
We propose to assign pseudo labels to images through the pairwise-guided similarity separation.
It learns local discriminative features from individual images via intra-similarity, and discovers the patch correspondence across images via inter-similarity.
arXiv Detail & Related papers (2022-07-09T04:05:06Z)
- InsCon: Instance Consistency Feature Representation via Self-Supervised Learning [9.416267640069297]
We propose a new end-to-end self-supervised framework called InsCon, which is devoted to capturing multi-instance information.
InsCon builds a targeted learning paradigm that applies multi-instance images as input, aligning the learned feature between corresponding instance views.
On the other hand, InsCon introduces the pull and push of cell-instance, which utilizes cell consistency to enhance fine-grained feature representation.
arXiv Detail & Related papers (2022-03-15T07:09:00Z)
- DetCo: Unsupervised Contrastive Learning for Object Detection [64.22416613061888]
Unsupervised contrastive learning achieves great success in learning image representations with CNN.
We present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches.
DetCo consistently outperforms supervised method by 1.6/1.2/1.0 AP on Mask RCNN-C4/FPN/RetinaNet with 1x schedule.
arXiv Detail & Related papers (2021-02-09T12:47:20Z)
- Camera-aware Proxies for Unsupervised Person Re-Identification [60.26031011794513]
This paper tackles the purely unsupervised person re-identification (Re-ID) problem that requires no annotations.
We propose to split each single cluster into multiple proxies and each proxy represents the instances coming from the same camera.
Based on the camera-aware proxies, we design both intra- and inter-camera contrastive learning components for our Re-ID model.
arXiv Detail & Related papers (2020-12-19T12:37:04Z)
- Exploit Clues from Views: Self-Supervised and Regularized Learning for Multiview Object Recognition [66.87417785210772]
This work investigates the problem of multiview self-supervised learning (MV-SSL).
A novel surrogate task for self-supervised learning is proposed by pursuing "object invariant" representation.
Experiments show that the recognition and retrieval results using view invariant prototype embedding (VISPE) outperform other self-supervised learning methods.
arXiv Detail & Related papers (2020-03-28T07:06:06Z)