Learning to Detect and Segment for Open Vocabulary Object Detection
- URL: http://arxiv.org/abs/2212.12130v6
- Date: Thu, 29 Aug 2024 13:08:14 GMT
- Title: Learning to Detect and Segment for Open Vocabulary Object Detection
- Authors: Tao Wang, Nan Li
- Abstract summary: We propose a principled dynamic network design to better generalize box regression and mask segmentation for the open vocabulary setting.
CondHead is composed of two streams of network heads: the dynamically aggregated head and the dynamically generated head.
Our method brings significant improvement to the state-of-the-art open vocabulary object detection methods with very minor overhead.
- Score: 6.678101044494558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open vocabulary object detection has been greatly advanced by the recent development of vision-language pretrained models, which help recognize novel objects from semantic categories alone. Prior works mainly focus on transferring knowledge to object proposal classification and employ class-agnostic box and mask prediction. In this work, we propose CondHead, a principled dynamic network design that better generalizes box regression and mask segmentation to the open vocabulary setting. The core idea is to conditionally parameterize the network heads on semantic embeddings, so that the model is guided by class-specific knowledge to better detect novel categories. Specifically, CondHead is composed of two streams of network heads: the dynamically aggregated head and the dynamically generated head. The former is instantiated as a set of static heads that are conditionally aggregated; these heads are optimized as experts and are expected to learn sophisticated prediction. The latter is instantiated with dynamically generated parameters and encodes general class-specific information. With this conditional design, the detection model is bridged to the semantic embeddings and offers strongly generalizable class-wise box and mask prediction. Our method brings significant improvement to state-of-the-art open vocabulary object detection methods with very minor overhead; for example, it surpasses a RegionCLIP model by 3.0 detection AP on novel categories with only 1.1% more computation.
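The abstract alone suggests a concrete structure, sketched below in PyTorch for the box-regression case: a bank of static expert heads mixed by weights predicted from the class's semantic embedding, plus a second head whose parameters are generated from that embedding. The class name CondBoxHead, the dimensions, the expert count, and the additive fusion of the two streams are illustrative assumptions, not the authors' implementation.
```python
# A minimal sketch of the two-stream conditional head described in the
# abstract (not the authors' code). Shapes and fusion are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondBoxHead(nn.Module):
    def __init__(self, feat_dim=256, sem_dim=512, num_experts=4):
        super().__init__()
        # Stream 1: static "expert" box-regression heads whose outputs are
        # mixed with weights predicted from the class embedding.
        self.experts = nn.ModuleList(
            [nn.Linear(feat_dim, 4) for _ in range(num_experts)]
        )
        self.router = nn.Linear(sem_dim, num_experts)
        # Stream 2: a generator mapping the class embedding to the
        # parameters of a lightweight per-class linear head.
        self.param_gen = nn.Linear(sem_dim, feat_dim * 4 + 4)
        self.feat_dim = feat_dim

    def forward(self, roi_feat, sem_emb):
        # roi_feat: (N, feat_dim) pooled region features
        # sem_emb:  (N, sem_dim) text embedding of each region's class
        weights = F.softmax(self.router(sem_emb), dim=-1)             # (N, K)
        expert_out = torch.stack([e(roi_feat) for e in self.experts], dim=1)
        aggregated = (weights.unsqueeze(-1) * expert_out).sum(dim=1)  # (N, 4)

        params = self.param_gen(sem_emb)                              # (N, D*4+4)
        w = params[:, : self.feat_dim * 4].view(-1, 4, self.feat_dim)
        b = params[:, self.feat_dim * 4 :]                            # (N, 4)
        generated = torch.einsum("nd,nkd->nk", roi_feat, w) + b       # (N, 4)

        # Combine the two streams; the paper's exact fusion may differ.
        return aggregated + generated

# Usage: deltas = CondBoxHead()(torch.randn(8, 256), torch.randn(8, 512))
```
If the generated head stays this small, the extra cost is a few matrix products per region, which would be consistent with the reported 1.1% computation overhead.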
Related papers
- Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation [58.37525311718006]
We put forth a novel formulation of the aerial object detection problem, namely open-vocabulary aerial object detection (OVAD).
We propose CastDet, a CLIP-activated student-teacher detection framework that serves as the first OVAD detector specifically designed for the challenging aerial scenario.
Our framework integrates a robust localization teacher along with several box selection strategies to generate high-quality proposals for novel objects.
arXiv Detail & Related papers (2024-11-04T12:59:13Z)
- SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection [31.464227593768324]
We introduce Semantic Hierarchy Nexus (SHiNe), a novel classifier that uses semantic knowledge from class hierarchies.
SHiNe enhances robustness across diverse vocabulary granularities, achieving up to +31.9% mAP50 with ground truth hierarchies.
SHiNe is training-free and can be seamlessly integrated with any off-the-shelf OvOD detector.
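As a rough illustration of a training-free, hierarchy-aware classifier in this spirit (the text encoder, the mean aggregation, and the hierarchy mapping are assumptions, not SHiNe's exact nexus construction):
```python
# Hedged sketch: each class weight aggregates text embeddings of the
# class and its hierarchy neighbors (e.g., parents and children).
import torch
import torch.nn.functional as F

def hierarchy_classifier(encode_text, hierarchy):
    # encode_text: callable str -> (D,) text embedding
    # hierarchy: dict mapping class name -> list of related names
    weights = []
    for cls, related in hierarchy.items():
        embs = torch.stack([encode_text(n) for n in [cls, *related]])
        weights.append(F.normalize(embs.mean(dim=0), dim=-1))
    return torch.stack(weights)  # (num_classes, D): drop-in classifier

# An off-the-shelf OvOD detector could score region features against
# these weights instead of its plain per-class text embeddings.
```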
arXiv Detail & Related papers (2024-05-16T12:42:06Z)
- Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge.
We argue that the scarce semantic information conveyed by the one-hot labels hampers the effective knowledge transfer across tasks.
Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals.
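A minimal sketch of this kind of language-guided supervision, assuming frozen PLM class embeddings and a cosine alignment loss (both illustrative; the paper's exact objective may differ):
```python
# Sketch: class names are encoded once by a pretrained language model,
# frozen, and used as alignment targets in place of one-hot labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LanguageGuidedLoss(nn.Module):
    def __init__(self, class_embeddings, feat_dim):
        super().__init__()
        # Frozen semantic targets, one per class, from a PLM.
        self.register_buffer("targets", F.normalize(class_embeddings, dim=-1))
        self.proj = nn.Linear(feat_dim, class_embeddings.shape[1])

    def forward(self, features, labels):
        z = F.normalize(self.proj(features), dim=-1)  # (N, D) projections
        t = self.targets[labels]                      # (N, D) class targets
        return (1 - (z * t).sum(dim=-1)).mean()       # mean cosine distance
```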
arXiv Detail & Related papers (2024-03-24T12:41:58Z)
- Activate and Reject: Towards Safe Domain Generalization under Category Shift [71.95548187205736]
We study the practical problem of Domain Generalization under Category Shift (DGCS).
It aims to simultaneously detect unknown-class samples and classify known-class samples in the target domains.
Compared to prior DG works, we face two new challenges: 1) how to learn the concept of "unknown" during training with only source known-class samples, and 2) how to adapt the source-trained model to unseen environments.
arXiv Detail & Related papers (2023-10-07T07:53:12Z)
- Meta-ZSDETR: Zero-shot DETR with Meta-learning [29.58827207505671]
We present the first method that combines DETR and meta-learning to perform zero-shot object detection, named Meta-ZSDETR.
The model is optimized with meta-contrastive learning and includes a regression head that generates the coordinates of class-specific boxes.
Experimental results show that our method outperforms the existing ZSD methods by a large margin.
arXiv Detail & Related papers (2023-08-18T13:17:07Z)
- Global Knowledge Calibration for Fast Open-Vocabulary Segmentation [124.74256749281625]
We introduce a text diversification strategy that generates a set of synonyms for each training category.
We also employ a text-guided knowledge distillation method to preserve the generalizable knowledge of CLIP.
Our proposed model achieves robust generalization performance across various datasets.
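The text diversification step might look roughly like the following, assuming a generic text encoder and a hand-specified synonym table (both placeholders; the distillation component is not shown):
```python
# Sketch: each training category is represented by the averaged
# embedding of several synonyms rather than a single class name.
import torch
import torch.nn.functional as F

def diversified_class_embeddings(encode_text, synonyms):
    # synonyms: dict like {"sofa": ["sofa", "couch", "settee"], ...}
    embs = []
    for names in synonyms.values():
        e = torch.stack([encode_text(n) for n in names]).mean(dim=0)
        embs.append(F.normalize(e, dim=-1))
    return torch.stack(embs)  # (num_classes, D)
```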
arXiv Detail & Related papers (2023-03-16T09:51:41Z)
- A Unified Object Counting Network with Object Occupation Prior [32.32999623924954]
Existing object counting tasks are designed for a single object class.
In the real world, it is inevitable to encounter newly arriving data with previously unseen classes.
We build the first evolving object counting dataset and propose a unified object counting network.
arXiv Detail & Related papers (2022-12-29T06:42:51Z)
- DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection [118.36746273425354]
This paper presents a paralleled visual-concept pre-training method for open-world detection by resorting to knowledge enrichment from a designed concept dictionary.
By enriching the concepts with their descriptions, we explicitly build relationships among various concepts to facilitate open-domain learning.
The proposed framework demonstrates strong zero-shot detection performances, e.g., on the LVIS dataset, our DetCLIP-T outperforms GLIP-T by 9.9% mAP and obtains a 13.5% improvement on rare categories.
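A toy sketch of description-enriched prompts in this spirit, with an invented dictionary entry and prompt template (the paper's concept dictionary and formatting are not reproduced here):
```python
# Sketch: pair each category name with a short definition before text
# encoding, so related concepts share descriptive vocabulary.
def enrich_concept(name, dictionary):
    # dictionary: {"cat": "a small domesticated carnivorous mammal", ...}
    desc = dictionary.get(name)
    return f"{name}, which is {desc}" if desc else name

prompts = [enrich_concept(n, {"sofa": "a long upholstered seat"})
           for n in ["sofa", "lamp"]]
# -> ["sofa, which is a long upholstered seat", "lamp"]
```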
arXiv Detail & Related papers (2022-09-20T02:01:01Z)
- Exploiting Unlabeled Data with Vision and Language Models for Object Detection [64.94365501586118]
Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets.
We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images.
We demonstrate the value of the generated pseudo labels in two specific tasks, open-vocabulary detection and semi-supervised object detection.
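The pseudo-labeling step can be pictured roughly as below, assuming L2-normalized region and text embeddings from a CLIP-style model and a simple confidence threshold (both assumptions; the paper's pipeline is more involved):
```python
# Sketch: score class-agnostic proposals against class-name text
# embeddings and keep confident detections as pseudo ground truth.
import torch

def pseudo_label(proposal_feats, boxes, text_embs, thresh=0.8):
    # proposal_feats: (N, D) normalized region embeddings
    # boxes: (N, 4) proposal boxes; text_embs: (C, D) normalized
    sims = proposal_feats @ text_embs.T        # (N, C) similarities
    probs = sims.softmax(dim=-1)
    conf, cls = probs.max(dim=-1)
    keep = conf > thresh                       # confident regions only
    return boxes[keep], cls[keep], conf[keep]  # pseudo annotations
```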
arXiv Detail & Related papers (2022-07-18T21:47:15Z)
- Localized Vision-Language Matching for Open-vocabulary Object Detection [41.98293277826196]
We propose an open-world object detection method that learns to detect novel object classes along with a given set of known classes.
It is a two-stage training approach that first uses a location-guided image-caption matching technique to learn class labels.
We show that a simple language model is better suited than a large contextualized language model for detecting novel objects.
arXiv Detail & Related papers (2022-05-12T15:34:37Z)
- Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity [22.706462533761986]
We consider object detection with mixed supervision, which learns novel object categories using weak annotations.
We further transfer mask prior and semantic similarity to bridge the gap between novel categories and base categories.
Experimental results on three benchmark datasets demonstrate the effectiveness of our method over existing methods.
arXiv Detail & Related papers (2021-10-27T05:43:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all summaries) and is not responsible for any consequences of its use.