A Study on Self-Supervised Object Detection Pretraining
- URL: http://arxiv.org/abs/2207.04186v1
- Date: Sat, 9 Jul 2022 03:30:44 GMT
- Title: A Study on Self-Supervised Object Detection Pretraining
- Authors: Trung Dang, Simon Kornblith, Huy Thong Nguyen, Peter Chin, Maryam Khademi
- Abstract summary: We study different approaches to self-supervised pretraining of object detection models.
We first design a general framework to learn a spatially consistent dense representation from an image.
We study existing design choices in the literature, such as box generation, feature extraction strategies, and using multiple views.
- Score: 14.38896715041483
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we study different approaches to self-supervised pretraining of
object detection models. We first design a general framework to learn a
spatially consistent dense representation from an image, by randomly sampling
and projecting boxes to each augmented view and maximizing the similarity
between corresponding box features. We study existing design choices in the
literature, such as box generation, feature extraction strategies, and the use
of multiple views, inspired by the success of the latter in instance-level
image representation learning. Our results suggest that the method is robust to
different choices of hyperparameters, and that using multiple views is not as
effective as it is for instance-level image representation learning. We also design two
auxiliary tasks to predict boxes in one view from their features in the other
view, by (1) predicting boxes from the sampled set by using a contrastive loss,
and (2) predicting box coordinates using a transformer, which potentially
benefits downstream object detection tasks. We found that these tasks do not
lead to better object detection performance when finetuning the pretrained
model on labeled data.
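The framework described above (randomly sample boxes, project them into each augmented view, and maximize the similarity between corresponding box features) can be sketched roughly as follows. This is a minimal illustration of the idea, not the authors' implementation: the function names and the simplified axis-aligned crop model are assumptions, and in a real pipeline the box features would come from pooling a backbone's feature map over each projected box (e.g. with `torchvision.ops.roi_align`).

```python
import torch
import torch.nn.functional as F

def project_boxes(boxes, crop_xyxy, out_size):
    """Map boxes from original-image coordinates into a crop's frame.

    boxes:     (N, 4) tensor of (x1, y1, x2, y2) in original-image coords.
    crop_xyxy: (4,) crop region of the augmented view, in original-image coords.
    out_size:  side length the crop is resized to.
    """
    x1, y1, x2, y2 = crop_xyxy
    scale_x = out_size / (x2 - x1)
    scale_y = out_size / (y2 - y1)
    shift = torch.stack([x1, y1, x1, y1])
    scale = torch.stack([scale_x, scale_y, scale_x, scale_y])
    # Translate into the crop's frame, then rescale to the resized view.
    return (boxes - shift) * scale

def box_consistency_loss(feats_a, feats_b):
    """Negative cosine similarity between corresponding box features
    extracted from two augmented views of the same image."""
    feats_a = F.normalize(feats_a, dim=-1)
    feats_b = F.normalize(feats_b, dim=-1)
    return -(feats_a * feats_b).sum(dim=-1).mean()
```

Sampled boxes live in the original image's coordinate system, so each view only needs its own crop parameters to recover where a shared box lands; the loss then pulls the two views' features for the same box together, which is what makes the learned dense representation spatially consistent.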
Related papers
- Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery [17.156864650143678]
We develop a few-shot object detector based on a traditional two-stage architecture.
A large-scale pre-trained model is used to build class-reference embeddings or prototypes.
We perform evaluations on two remote sensing datasets containing challenging and rare objects.
arXiv Detail & Related papers (2024-03-08T15:20:27Z)
- UniST: Towards Unifying Saliency Transformer for Video Saliency Prediction and Detection [9.063895463649414]
We introduce the Unified Saliency Transformer (UniST) framework, which comprehensively utilizes the essential attributes of video saliency prediction and video salient object detection.
To the best of our knowledge, this is the first work that explores designing a transformer structure for both saliency modeling tasks.
arXiv Detail & Related papers (2023-09-15T07:39:53Z)
- Matching Multiple Perspectives for Efficient Representation Learning [0.0]
We present an approach that combines self-supervised learning with a multi-perspective matching technique.
We show that the availability of multiple views of the same object combined with a variety of self-supervised pretraining algorithms can lead to improved object classification performance.
arXiv Detail & Related papers (2022-08-16T10:33:13Z)
- Object-aware Contrastive Learning for Debiased Scene Representation [74.30741492814327]
We develop a novel object-aware contrastive learning framework that localizes objects in a self-supervised manner.
We also introduce two data augmentations based on ContraCAM, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning.
arXiv Detail & Related papers (2021-07-30T19:24:07Z)
- Aligning Pretraining for Detection via Object-Level Contrastive Learning [57.845286545603415]
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task.
Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection.
arXiv Detail & Related papers (2021-06-04T17:59:52Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
- Ensembling object detectors for image and video data analysis [98.26061123111647]
We propose a method for ensembling the outputs of multiple object detectors for improving detection performance and precision of bounding boxes on image data.
We extend it to video data by proposing a two-stage tracking-based scheme for detection refinement.
arXiv Detail & Related papers (2021-02-09T12:38:16Z)
- Adaptive Object Detection with Dual Multi-Label Prediction [78.69064917947624]
We propose a novel end-to-end unsupervised deep domain adaptation model for adaptive object detection.
The model exploits multi-label prediction to reveal the object category information in each image.
We introduce a prediction consistency regularization mechanism to assist object detection.
arXiv Detail & Related papers (2020-03-29T04:23:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.