Aligned Unsupervised Pretraining of Object Detectors with Self-training
- URL: http://arxiv.org/abs/2307.15697v2
- Date: Sun, 7 Jul 2024 10:46:52 GMT
- Title: Aligned Unsupervised Pretraining of Object Detectors with Self-training
- Authors: Ioannis Maniadis Metaxas, Adrian Bulat, Ioannis Patras, Brais Martinez, Georgios Tzimiropoulos
- Abstract summary: Unsupervised pretraining of object detectors has recently become a key component of object detector training.
We propose a framework that mitigates the task gap between pretraining and the downstream detection task, and consists of three simple yet key ingredients.
We show that our strategy is also capable of pretraining from scratch (including the backbone) and works on complex images like COCO.
- Score: 41.03780087924593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The unsupervised pretraining of object detectors has recently become a key component of object detector training, as it leads to improved performance and faster convergence during the supervised fine-tuning stage. Existing unsupervised pretraining methods, however, typically rely on low-level information to define proposals that are used to train the detector. Furthermore, in the absence of class labels for these proposals, an auxiliary loss is used to add high-level semantics. This results in complex pipelines and a task gap between the pretraining and the downstream task. We propose a framework that mitigates this issue and consists of three simple yet key ingredients: (i) richer initial proposals that do encode high-level semantics, (ii) class pseudo-labeling through clustering, which enables pretraining using a standard object detection training pipeline, (iii) self-training to iteratively improve and enrich the object proposals. Once the pretraining and downstream tasks are aligned, a simple detection pipeline without further bells and whistles can be directly used for pretraining and, in fact, results in state-of-the-art performance in both the full and low data regimes, across detector architectures and datasets, by significant margins. We further show that our pretraining strategy is also capable of pretraining from scratch (including the backbone) and works on complex images like COCO, paving the way for unsupervised representation learning using object detection directly as a pretext task.
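The three ingredients above map naturally onto a short pseudo-labeling and self-training loop. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: propose_fn (a semantics-aware proposal generator), embed_fn (a frozen crop encoder), and the detector's fit/predict API are hypothetical stand-ins, and the cluster count and score threshold are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_label(proposal_feats: np.ndarray, num_classes: int = 256) -> np.ndarray:
    """Ingredient (ii): cluster proposal embeddings into pseudo-class ids."""
    return KMeans(n_clusters=num_classes, n_init=10).fit_predict(proposal_feats)

def pretrain_with_self_training(images, propose_fn, embed_fn, detector, rounds=3):
    """Ingredients (i) and (iii): semantic proposals, then iterative self-training."""
    # (i) Richer initial proposals that already encode high-level semantics.
    proposals = [propose_fn(img) for img in images]
    for _ in range(rounds):
        # (ii) Pool crop embeddings over the dataset and cluster them, so every
        # box gets a pseudo-class and a *standard* detection pipeline applies.
        feats = np.concatenate([embed_fn(img, b) for img, b in zip(images, proposals)])
        labels = np.split(pseudo_label(feats),
                          np.cumsum([len(b) for b in proposals])[:-1])
        detector.fit(images, proposals, labels)  # hypothetical standard training API
        # (iii) Self-training: confident detections become the enriched proposal
        # set for the next round.
        proposals = [detector.predict(img, score_threshold=0.5) for img in images]
    return detector
```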
Related papers
- AlignDet: Aligning Pre-training and Fine-tuning in Object Detection [38.256555424079664]
AlignDet is a unified pre-training framework that can be adapted to various existing detectors to alleviate the discrepancies between pre-training and fine-tuning.
It can achieve significant improvements across diverse protocols, such as detection algorithm, model backbone, data setting, and training schedule.
arXiv Detail & Related papers (2023-07-20T17:55:14Z)
- Focusing on what to decode and what to train: Efficient Training with HOI Split Decoders and Specific Target Guided DeNoising [17.268302302974607]
Recent one-stage transformer-based methods achieve notable gains in Human-Object Interaction (HOI) detection by building on DETR's detection framework.
We propose a novel one-stage framework (SOV) which consists of a subject decoder, an object decoder, and a verb decoder; a minimal sketch of this split-decoder layout follows this entry.
We also propose a novel Specific Target Guided (STG) denoising training strategy, which leverages learnable object and verb label embeddings to guide training and accelerate convergence.
arXiv Detail & Related papers (2023-07-05T13:42:31Z)
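A minimal PyTorch sketch of the split-decoder layout described in the SOV entry above. The dimensions, the shared query set, and feeding the summed subject/object queries into the verb decoder are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SplitHOIDecoders(nn.Module):
    """Separate subject, object and verb decoders over shared encoder memory."""
    def __init__(self, d_model=256, nhead=8, num_layers=3, num_queries=100,
                 num_obj_classes=80, num_verb_classes=117):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        # nn.TransformerDecoder deep-copies the layer, so no weights are shared.
        self.subject_dec = nn.TransformerDecoder(layer, num_layers)
        self.object_dec = nn.TransformerDecoder(layer, num_layers)
        self.verb_dec = nn.TransformerDecoder(layer, num_layers)
        self.queries = nn.Embedding(num_queries, d_model)
        self.sub_box = nn.Linear(d_model, 4)
        self.obj_box = nn.Linear(d_model, 4)
        self.obj_cls = nn.Linear(d_model, num_obj_classes)
        self.verb_cls = nn.Linear(d_model, num_verb_classes)

    def forward(self, memory):                    # memory: (B, HW, d_model)
        q = self.queries.weight.unsqueeze(0).expand(memory.size(0), -1, -1)
        sub = self.subject_dec(q, memory)         # human (subject) branch
        obj = self.object_dec(q, memory)          # object branch
        verb = self.verb_dec(sub + obj, memory)   # verb branch sees both (assumption)
        return {"sub_boxes": self.sub_box(sub).sigmoid(),
                "obj_boxes": self.obj_box(obj).sigmoid(),
                "obj_logits": self.obj_cls(obj),
                "verb_logits": self.verb_cls(verb)}
```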
- Label-Efficient Object Detection via Region Proposal Network Pre-Training [58.50615557874024]
We propose a simple pretext task that provides effective pre-training for the region proposal network (RPN).
In comparison with multi-stage detectors without RPN pre-training, our approach is able to consistently improve downstream task performance.
arXiv Detail & Related papers (2022-11-16T16:28:18Z)
- Self-supervised Pretraining with Classification Labels for Temporal Activity Detection [54.366236719520565]
Temporal Activity Detection aims to predict activity classes per frame.
Due to the expensive frame-level annotations required for detection, the scale of detection datasets is limited.
This work proposes a novel self-supervised pretraining method for detection leveraging classification labels.
arXiv Detail & Related papers (2021-11-26T18:59:28Z)
- DETReg: Unsupervised Pretraining with Region Priors for Object Detection [103.93533951746612]
DETReg is a new self-supervised method that pretrains the entire object detection network.
During pretraining, DETReg predicts object localizations to match the localizations from an unsupervised region proposal generator.
It simultaneously aligns the corresponding feature embeddings with embeddings from a self-supervised image encoder; a simplified sketch of this dual objective follows this entry.
arXiv Detail & Related papers (2021-06-08T17:39:14Z)
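A simplified sketch of DETReg's two pretraining signals as summarized above, assuming predictions have already been matched one-to-one to the unsupervised proposals (Hungarian matching omitted). The plain L1 terms are a simplification: the paper's full objective also includes GIoU and objectness terms.

```python
import torch
import torch.nn.functional as F

def detreg_style_loss(pred_boxes, pred_embed, proposal_boxes, ssl_crop_embed,
                      lambda_emb=1.0):
    """pred_boxes/proposal_boxes: (N, 4); pred_embed/ssl_crop_embed: (N, D)."""
    # Localization: regress toward boxes from an unsupervised region proposal
    # generator such as Selective Search.
    loc_loss = F.l1_loss(pred_boxes, proposal_boxes)
    # Embedding alignment: match detector features to embeddings produced by a
    # frozen self-supervised image encoder (e.g. SwAV) on the box crops.
    emb_loss = F.l1_loss(pred_embed, ssl_crop_embed)
    return loc_loss + lambda_emb * emb_loss
```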
- Aligning Pretraining for Detection via Object-Level Contrastive Learning [57.845286545603415]
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task.
Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection.
arXiv Detail & Related papers (2021-06-04T17:59:52Z)
- DAP: Detection-Aware Pre-training with Weak Supervision [37.336674323981285]
This paper presents a detection-aware pre-training (DAP) approach for object detection tasks.
We transform a classification dataset into a detection dataset through a weakly supervised object localization method based on Class Activation Maps; a minimal CAM-to-box sketch follows this entry.
We show that DAP can outperform the traditional classification pre-training in terms of both sample efficiency and convergence speed in downstream detection tasks including VOC and COCO.
arXiv Detail & Related papers (2021-03-30T19:48:30Z)
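A minimal sketch of the CAM-to-box step referenced in the DAP entry above: threshold a class activation map and take the tightest box around the surviving region. The normalization and threshold value are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def cam_to_box(cam: np.ndarray, threshold: float = 0.4):
    """cam: (H, W) activation map for the image-level class label."""
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
    ys, xs = np.where(cam >= threshold)       # pixels the class activates on
    if len(xs) == 0:
        return None                           # no confident region found
    # The tightest box around the activated region becomes the pseudo ground truth.
    return (xs.min(), ys.min(), xs.max(), ys.max())
```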
- Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias [59.788358876316295]
We propose a pipeline solution to improve speaker verification on a small real-world forensic field dataset.
By leveraging large-scale out-of-domain datasets, a knowledge-distillation-based objective function is proposed for teacher-student learning (a generic distillation loss is sketched after this entry).
We show that the proposed objective function can effectively improve the performance of teacher-student learning on short utterances.
arXiv Detail & Related papers (2020-09-21T00:58:40Z)
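For context on the teacher-student objective mentioned in the last entry, below is the generic temperature-scaled distillation loss (Hinton et al.). The paper proposes its own variant, so this is only the standard baseline formulation, not the proposed objective.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # The t**2 factor rescales gradients to the unsoftened cross-entropy scale.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t
```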