Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization
- URL: http://arxiv.org/abs/2206.11134v2
- Date: Fri, 24 Jun 2022 08:58:58 GMT
- Title: Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization
- Authors: Peixian Chen, Kekai Sheng, Mengdan Zhang, Yunhang Shen, Ke Li, Chunhua Shen
- Abstract summary: Open-vocabulary object detection (OVD) aims to scale up vocabulary size to detect objects of novel categories beyond the training vocabulary.
Recent work draws on the rich knowledge in pre-trained vision-language models.
We present MEDet, a novel OVD framework with proposal mining and prediction equalization.
- Score: 73.14053674836838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-vocabulary object detection (OVD) aims to scale up vocabulary size to
detect objects of novel categories beyond the training vocabulary. Recent work
draws on the rich knowledge in pre-trained vision-language models. However,
existing methods are ineffective in proposal-level vision-language alignment.
Meanwhile, the models usually suffer from confidence bias toward base
categories and perform worse on novel ones. To overcome the challenges, we
present MEDet, a novel and effective OVD framework with proposal mining and
prediction equalization. First, we design an online proposal mining scheme to
refine the inherited vision-semantic knowledge from coarse to fine, allowing for
proposal-level detection-oriented feature alignment. Second, based on causal
inference theory, we introduce a class-wise backdoor adjustment to reinforce
the predictions on novel categories to improve the overall OVD performance.
Extensive experiments on COCO and LVIS benchmarks verify the superiority of
MEDet over the competing approaches in detecting objects of novel categories,
e.g., 32.6% AP50 on COCO and 22.4% mask mAP on LVIS.
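The abstract names the class-wise backdoor adjustment without spelling it out. As a hedged illustration (not the paper's exact formulation), the standard backdoor adjustment from causal inference replaces the observational prediction P(Y | X) with an interventional one that stratifies over a confounder Z, here plausibly the class context of the training data:
    P(Y \mid do(X)) = \sum_{z} P(Y \mid X, Z = z) \, P(Z = z)
Weighting each class stratum by its prior P(Z = z), rather than by its co-occurrence with the input X, is what would counteract the confidence bias toward base categories described above.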
Related papers
- MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection [107.15164718585666]
We investigate the root cause of VLMs' biased predictions in the open-vocabulary detection context.
Our observations lead to a simple yet effective paradigm, named MarvelOVD, that generates significantly better training targets.
Our method outperforms other state-of-the-art approaches by significant margins.
arXiv Detail & Related papers (2024-07-31T09:23:57Z)
- LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction [63.668635390907575]
Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs).
We propose the Language Model Instruction (LaMI) strategy, which leverages the relationships between visual concepts and applies them within a simple yet effective DETR-like detector.
arXiv Detail & Related papers (2024-07-16T02:58:33Z)
- Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection [101.15777242546649]
Open vocabulary object detection (OVD) aims at seeking an optimal object detector capable of recognizing objects from both base and novel categories.
Recent advances leverage knowledge distillation to transfer insightful knowledge from pre-trained large-scale vision-language models to the task of object detection.
We present a novel OVD framework, termed LBP, that learns background prompts to harness implicit background knowledge.
arXiv Detail & Related papers (2024-06-01T17:32:26Z)
- Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation [3.0899016152680754]
Open-vocabulary object detection (OVOD) aims at localizing and recognizing visual objects from novel classes unseen at training time.
This paper systematically investigates this problem under the commonly adopted two-stage OVOD paradigm.
To alleviate it, the paper introduces two measures that adjust confidence scores and recover erroneously dismissed objects.
arXiv Detail & Related papers (2024-04-12T17:02:56Z)
- Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization [63.66349334291372]
We propose a framework with Meta prompt and Instance Contrastive learning (MIC) schemes.
First, we simulate a novel-class-emerging scenario to help the learned class and background prompts generalize to novel classes.
Second, we design an instance-level contrastive strategy to promote intra-class compactness and inter-class separation, which benefits the detector's generalization to novel-class objects (see the sketch after this list).
arXiv Detail & Related papers (2024-03-14T14:25:10Z)
- EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment [28.983503845298824]
We propose Early Dense Alignment (EDA) to bridge the gap between generalizable local semantics and object-level prediction.
In EDA, we use object-level supervision to learn dense-level rather than object-level alignment, preserving local fine-grained semantics.
arXiv Detail & Related papers (2023-09-03T12:04:14Z)
- How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection [25.506346503624894]
We propose a new benchmark named OVDEval, which includes 9 sub-tasks and introduces evaluations on commonsense knowledge.
The dataset is meticulously created to provide hard negatives that challenge models' true understanding of visual and linguistic input.
arXiv Detail & Related papers (2023-08-25T04:54:32Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
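As referenced in the MIC entry above, here is a minimal PyTorch sketch of an instance-level supervised contrastive loss that promotes intra-class compactness and inter-class separation. The function name, tensor shapes, and temperature value are illustrative assumptions, not the paper's exact loss.
    # Minimal sketch: supervised instance-level contrastive loss (InfoNCE-style).
    # Shapes, names, and the temperature are assumptions for illustration only.
    import torch
    import torch.nn.functional as F

    def instance_contrastive_loss(embeddings: torch.Tensor,
                                  labels: torch.Tensor,
                                  temperature: float = 0.1) -> torch.Tensor:
        # embeddings: (N, D) instance/region features; labels: (N,) class ids.
        z = F.normalize(embeddings, dim=1)              # compare in cosine space
        sim = (z @ z.t()) / temperature                 # (N, N) scaled similarities
        n = z.size(0)
        self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
        pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        sim = sim.masked_fill(self_mask, float('-inf'))  # exclude self-pairs
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)  # keep positives only
        pos_count = pos_mask.sum(dim=1).clamp(min=1)    # avoid divide-by-zero
        loss = -(pos_log_prob.sum(dim=1) / pos_count)   # per-anchor loss
        has_pos = pos_mask.any(dim=1)                   # skip anchors w/o positives
        return loss[has_pos].mean() if has_pos.any() else sim.new_zeros(())

    # Usage: feats = torch.randn(8, 256); labs = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
    #        loss = instance_contrastive_loss(feats, labs)
Minimizing this loss pulls same-class instance embeddings together and pushes different-class ones apart, which is the intra-class compactness / inter-class separation property the MIC summary describes.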