PromptDet: Expand Your Detector Vocabulary with Uncurated Images
- URL: http://arxiv.org/abs/2203.16513v1
- Date: Wed, 30 Mar 2022 17:50:21 GMT
- Title: PromptDet: Expand Your Detector Vocabulary with Uncurated Images
- Authors: Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren,
Xiaolin Wei, Weidi Xie, Lin Ma
- Abstract summary: The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.
We propose a two-stage open-vocabulary object detector that categorises each box proposal by a classifier generated from the text encoder of a pre-trained visual-language model.
To scale up the learning procedure towards detecting a wider spectrum of objects, we exploit available online resources, iteratively updating the prompts and later self-training the proposed detector with pseudo labels generated on a large corpus of noisy, uncurated web images.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of this work is to establish a scalable pipeline for expanding an
object detector towards novel/unseen categories, using zero manual annotations.
To achieve that, we make the following four contributions: (i) in pursuit of
generalisation, we propose a two-stage open-vocabulary object detector that
categorises each box proposal with a classifier generated from the text encoder
of a pre-trained visual-language model; (ii) to align the visual latent space
(of the RPN box proposals) with that of the pre-trained text encoder, we propose
regional prompt learning, which optimises a set of learnable prompt vectors so
that the textual embedding space fits the visually object-centric images;
(iii) to scale up the learning procedure towards detecting a wider spectrum of
objects, we exploit available online resources, iteratively updating the
prompts and later self-training the proposed detector with pseudo labels
generated on a large corpus of noisy, uncurated web images. The self-trained
detector, termed PromptDet, significantly improves detection performance on
categories for which manual annotations are unavailable or hard to obtain,
e.g. rare categories. Finally, (iv) to validate the necessity of the proposed
components, we conduct extensive experiments on the challenging LVIS and
MS-COCO datasets, showing superior performance over existing approaches with
fewer additional training images and zero manual annotations whatsoever.
Project page with code: https://fcjian.github.io/promptdet.
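The classification idea in contributions (i) to (iii) can be sketched in a few lines: a classifier is built from text embeddings of prompted category names, box-proposal features are matched against it by cosine similarity, and confident predictions on uncurated images become pseudo labels for self-training. The sketch below is a minimal illustration using random placeholder features; `text_encoder`, the prompt vectors, and the confidence threshold are hypothetical stand-ins, not PromptDet's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 8

def text_encoder(prompt_vectors, class_name):
    """Stand-in for a pre-trained VLM text encoder: maps learnable prompt
    context plus a class name to an embedding (random placeholder)."""
    name_vec = rng.standard_normal(EMBED_DIM)  # placeholder "tokenisation"
    return prompt_vectors.mean(axis=0) + name_vec

def build_classifier(class_names, prompt_vectors):
    """One embedding per category; rows are L2-normalised so that a dot
    product with normalised region features is a cosine similarity."""
    weights = np.stack([text_encoder(prompt_vectors, c) for c in class_names])
    return weights / np.linalg.norm(weights, axis=1, keepdims=True)

def classify_proposals(region_feats, classifier):
    """Assign each box-proposal feature to the most similar class embedding."""
    feats = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    return (feats @ classifier.T).argmax(axis=1)

def pseudo_label(region_feats, classifier, threshold=0.3):
    """Self-training step (simplified): keep only confident predictions on
    uncurated images as (proposal index, class index) pseudo labels."""
    feats = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    scores = feats @ classifier.T
    conf = scores.max(axis=1)
    return [(i, int(scores[i].argmax()))
            for i in range(len(conf)) if conf[i] > threshold]

prompts = rng.standard_normal((4, EMBED_DIM))   # learnable prompt vectors
clf = build_classifier(["cat", "unicycle", "toaster"], prompts)
proposals = rng.standard_normal((5, EMBED_DIM)) # placeholder RPN features
labels = classify_proposals(proposals, clf)
pseudo = pseudo_label(proposals, clf)
```

In the paper, the prompt vectors would be optimised against object-centric images (regional prompt learning) and the prompt update and self-training steps iterated; here they are frozen random values purely to show the data flow.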
Related papers
- Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery [17.156864650143678]
We develop a few-shot object detector based on a traditional two-stage architecture.
A large-scale pre-trained model is used to build class-reference embeddings or prototypes.
We perform evaluations on two remote sensing datasets containing challenging and rare objects.
arXiv Detail & Related papers (2024-03-08T15:20:27Z)
- Text as Image: Learning Transferable Adapter for Multi-Label Classification [13.11583340598517]
We introduce an effective approach to employ large language models for multi-label instruction-following text generation.
In this way, a fully automated pipeline for visual label recognition is developed without relying on any manual data.
arXiv Detail & Related papers (2023-12-07T09:22:20Z)
- Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning [13.667326007851674]
We propose CastDet, a CLIP-activated student-teacher open-vocabulary object detection framework.
Our approach boosts not only novel object proposals but also classification.
Experimental results demonstrate our CastDet achieving superior open-vocabulary detection performance.
arXiv Detail & Related papers (2023-11-20T10:26:04Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- Multi-Modal Classifiers for Open-Vocabulary Object Detection [104.77331131447541]
The goal of this paper is open-vocabulary object detection (OVOD).
We adopt a standard two-stage object detector architecture.
We explore three ways of specifying novel classes: language descriptions, image exemplars, or a combination of the two.
arXiv Detail & Related papers (2023-06-08T18:31:56Z)
- Semantic Prompt for Few-Shot Image Recognition [76.68959583129335]
We propose a novel Semantic Prompt (SP) approach for few-shot learning.
The proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
- Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection [87.39089806069707]
We propose a fine-grained Visual-Text Prompt-driven self-training paradigm for Open-Vocabulary Detection (VTP-OVD).
During the adapting stage, we enable VLM to obtain fine-grained alignment by using learnable text prompts to resolve an auxiliary dense pixel-wise prediction task.
Experiments show that our method achieves the state-of-the-art performance for open-vocabulary object detection, e.g., 31.5% mAP on unseen classes of COCO.
arXiv Detail & Related papers (2022-11-02T03:38:02Z)
- Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model [34.85604521903056]
We introduce a novel method, detection prompt (DetPro), to learn continuous prompt representations for open-vocabulary object detection.
We assemble DetPro with ViLD, a recent state-of-the-art open-world object detector.
Experimental results show that our DetPro outperforms the baseline ViLD in all settings.
arXiv Detail & Related papers (2022-03-28T17:50:26Z)
- Open-Vocabulary DETR with Conditional Matching [86.1530128487077]
OV-DETR is an open-vocabulary detector based on DETR.
It can detect any object given its class name or an exemplar image.
It achieves non-trivial improvements over the current state of the art.
arXiv Detail & Related papers (2022-03-22T16:54:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.