Towards Open Vocabulary Object Detection without Human-provided Bounding
Boxes
- URL: http://arxiv.org/abs/2111.09452v1
- Date: Thu, 18 Nov 2021 00:05:52 GMT
- Title: Towards Open Vocabulary Object Detection without Human-provided Bounding
Boxes
- Authors: Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, Ran Xu, Wenhao
Liu, Caiming Xiong
- Abstract summary: We propose an open vocabulary detection framework that can be trained without manually provided bounding-box annotations.
Our method achieves this by leveraging the localization ability of pre-trained vision-language models.
- Score: 74.24276505126932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite great progress in object detection, most existing methods are limited
to a small set of object categories, due to the tremendous human effort needed
for instance-level bounding-box annotation. To alleviate the problem, recent
open vocabulary and zero-shot detection methods attempt to detect object
categories not seen during training. However, these approaches still rely on
manually provided bounding-box annotations on a set of base classes. We propose
an open vocabulary detection framework that can be trained without manually
provided bounding-box annotations. Our method achieves this by leveraging the
localization ability of pre-trained vision-language models and generating
pseudo bounding-box labels that can be used directly for training object
detectors. Experimental results on COCO, PASCAL VOC, Objects365 and LVIS
demonstrate the effectiveness of our method. Specifically, our method
outperforms the state of the art (SOTA) trained with human-annotated
bounding boxes by 3% AP on COCO novel categories, even though our training
source is not equipped with manual bounding-box labels. When utilizing the
manual bounding-box labels as our baselines do, our method surpasses the
SOTA by a large margin of 8% AP.
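The abstract describes generating pseudo bounding-box labels by matching region proposals against class names with a pre-trained vision-language model. The paper's exact procedure is not given here; the following is only a minimal sketch of that general idea, assuming proposal visual features and class-name text embeddings have already been extracted (e.g., by a CLIP-style model) and that a single best box per class is kept:

```python
import numpy as np

def pseudo_box_labels(proposal_feats, boxes, text_embeds, class_names, thresh=0.5):
    """Assign each class a pseudo bounding box: the proposal whose visual
    feature is most cosine-similar to the class-name text embedding.

    proposal_feats: (N, D) proposal features; boxes: (N, 4) proposal boxes;
    text_embeds: (C, D) class-name embeddings; class_names: list of C names.
    """
    # L2-normalize so the dot product is cosine similarity.
    v = proposal_feats / np.linalg.norm(proposal_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    sim = v @ t.T  # (N, C) similarity between every proposal and every class

    labels = []
    for c, name in enumerate(class_names):
        best = int(np.argmax(sim[:, c]))
        if sim[best, c] >= thresh:  # keep only confident matches as pseudo labels
            labels.append((name, boxes[best]))
    return labels
```

The resulting `(class, box)` pairs would then be used as ordinary training targets for a detector; the threshold and the one-box-per-class choice are illustrative assumptions, not details from the paper.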
Related papers
- Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation [58.37525311718006]
We put forth a novel formulation of the aerial object detection problem, namely open-vocabulary aerial object detection (OVAD)
We propose CastDet, a CLIP-activated student-teacher detection framework that serves as the first OVAD detector specifically designed for the challenging aerial scenario.
Our framework integrates a robust localization teacher along with several box selection strategies to generate high-quality proposals for novel objects.
arXiv Detail & Related papers (2024-11-04T12:59:13Z)
- LP-OVOD: Open-Vocabulary Object Detection by Linear Probing [8.202076059391315]
An object detector must identify both seen and unseen classes in test images without labeled examples of the unseen classes in training.
A typical approach for OVOD is to use joint text-image embeddings of CLIP to assign box proposals to their closest text label.
This method has a critical issue: many low-quality boxes, such as over- and under-covered-object boxes, have the same similarity score as high-quality boxes since CLIP is not trained on exact object location information.
We propose a novel method, LP-OVOD, that discards low-quality boxes by training a
arXiv Detail & Related papers (2023-10-26T02:37:08Z)
- DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection [72.25697820290502]
This work introduces a straightforward and efficient strategy to identify potential novel classes through zero-shot classification.
We refer to this approach as the self-training strategy, which enhances recall and accuracy for novel classes without requiring extra annotations, datasets, or re-training.
Empirical evaluations on three datasets, including LVIS, V3Det, and COCO, demonstrate significant improvements over the baseline performance.
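The DST-Det summary above describes mining novel classes via zero-shot classification of proposals during training. As a rough illustration of that idea only (the paper's actual selection rules are not given here), a sketch that keeps proposals confidently classified into a novel class as pseudo ground truth might look like:

```python
import numpy as np

def mine_novel_pseudo_labels(sim, novel_ids, score_thresh=0.8):
    """Select pseudo labels for novel classes from zero-shot scores.

    sim: (N, C) zero-shot classification scores for N proposals over all
    class names (base + novel). novel_ids: set of novel-class indices.
    Returns (proposal index, class id) pairs to treat as pseudo ground truth.
    """
    pseudo = []
    for i in range(sim.shape[0]):
        c = int(np.argmax(sim[i]))  # top-scoring class for this proposal
        if c in novel_ids and sim[i, c] >= score_thresh:
            pseudo.append((i, c))
    return pseudo
```

The score threshold and argmax rule are illustrative assumptions; the point is simply that no extra annotation pass is needed, since the labels come from the model's own zero-shot predictions.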
arXiv Detail & Related papers (2023-10-02T17:52:24Z)
- Three ways to improve feature alignment for open vocabulary detection [88.65076922242184]
A key problem in zero-shot open vocabulary detection is how to align visual and text features so that the detector performs well on unseen classes.
Previous approaches train the feature pyramid and detection head from scratch, which breaks the vision-text feature alignment established during pretraining.
We propose three methods to alleviate these issues. Firstly, a simple scheme is used to augment the text embeddings which prevents overfitting to a small number of classes seen during training.
Secondly, the feature pyramid network and the detection head are modified to include trainable shortcuts.
Finally, a self-training approach is used to leverage a larger corpus of
arXiv Detail & Related papers (2023-03-23T17:59:53Z)
- Label, Verify, Correct: A Simple Few Shot Object Detection Method [93.84801062680786]
We introduce a simple pseudo-labelling method to source high-quality pseudo-annotations from a training set.
We present two novel methods to improve the precision of the pseudo-labelling process.
Our method achieves state-of-the-art or second-best performance compared to existing approaches.
arXiv Detail & Related papers (2021-12-10T18:59:06Z)
- Boosting Weakly Supervised Object Detection via Learning Bounding Box Adjusters [76.36104006511684]
Weakly-supervised object detection (WSOD) has emerged as an inspiring recent topic to avoid expensive instance-level object annotations.
We address the problem setting of improving localization performance by leveraging bounding box regression knowledge from a well-annotated auxiliary dataset.
Our method performs favorably against state-of-the-art WSOD methods and knowledge transfer models with a similar problem setting.
arXiv Detail & Related papers (2021-08-03T13:38:20Z)
- Iterative Bounding Box Annotation for Object Detection [0.456877715768796]
We propose a semi-automatic method for efficient bounding box annotation.
The method trains the object detector iteratively on small batches of labeled images.
It learns to propose bounding boxes for the next batch, after which the human annotator only needs to correct possible errors.
arXiv Detail & Related papers (2020-07-02T08:40:12Z)
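The iterative annotation entry above describes a loop: train on the labelled batches so far, propose boxes for the next batch, and have the human correct only the errors. A minimal sketch of that loop, where `train`, `propose`, and `correct` are hypothetical caller-supplied callables standing in for the detector trainer, the box proposer, and the human-correction step:

```python
def iterative_annotation(batches, train, propose, correct):
    """Sketch of semi-automatic iterative bounding-box annotation.

    batches: list of image batches; train(labelled) returns a detector;
    propose(detector, batch) returns candidate boxes per image;
    correct(batch, proposals) returns human-verified (image, boxes) pairs.
    """
    labelled = []
    detector = None
    for batch in batches:
        # No detector yet on the first batch: the human labels from scratch.
        proposals = propose(detector, batch) if detector else [[] for _ in batch]
        labelled.extend(correct(batch, proposals))  # human fixes errors only
        detector = train(labelled)  # retrain on all labels collected so far
    return detector, labelled
```

The function names and signatures are assumptions for illustration; the recoverable idea from the abstract is simply that annotation cost shrinks over iterations because later batches arrive pre-labelled by the improving detector.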
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.