Efficient Feature Distillation for Zero-shot Annotation Object Detection
- URL: http://arxiv.org/abs/2303.12145v4
- Date: Thu, 2 Nov 2023 03:08:51 GMT
- Title: Efficient Feature Distillation for Zero-shot Annotation Object Detection
- Authors: Zhuoming Liu, Xuefeng Hu, Ram Nevatia
- Abstract summary: We propose a new setting for detecting unseen objects called Zero-shot Annotation object Detection (ZAD)
It expands the zero-shot object detection setting by allowing the novel objects to exist in the training images.
It also restricts the additional information the detector uses to novel category names.
- Score: 12.116491963892821
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new setting for detecting unseen objects called Zero-shot
Annotation object Detection (ZAD). It expands the zero-shot object detection
setting by allowing the novel objects to exist in the training images and
restricts the additional information the detector uses to novel category names.
Recently, to detect unseen objects, large-scale vision-language models (e.g.,
CLIP) are leveraged by different methods. The distillation-based methods have
good overall performance but suffer from a long training schedule caused by two
factors. First, existing work creates distillation regions biased to the base
categories, which limits the distillation of novel category information.
Second, directly using the raw feature from CLIP for distillation neglects the
domain gap between the training data of CLIP and the detection datasets, which
makes it difficult to learn the mapping from the image region to the
vision-language feature space. To solve these problems, we propose Efficient
feature distillation for Zero-shot Annotation object Detection (EZAD). Firstly,
EZAD adapts the CLIP's feature space to the target detection domain by
re-normalizing CLIP; Secondly, EZAD uses CLIP to generate distillation
proposals with potential novel category names to avoid the distillation being
overly biased toward the base categories. Finally, EZAD takes advantage of
semantic meaning for regression to further improve the model performance. As a
result, EZAD outperforms the previous distillation-based methods in COCO by 4%
with a much shorter training schedule and achieves a 3% improvement on the LVIS
dataset. Our code is available at https://github.com/dragonlzm/EZAD
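The two ingredients the abstract describes, adapting the teacher's (CLIP's) feature statistics to the detection domain and then distilling the adapted features into the student detector, can be illustrated with a toy sketch. This is not the authors' implementation: the re-normalization here is a simple mean/std shift over random stand-in features, and the distillation loss is a plain L1 distance between matched region features.

```python
import numpy as np

def renormalize(teacher_feats, target_mean, target_std, eps=1e-6):
    """Shift teacher (CLIP-like) features toward the detection domain's
    statistics -- a toy stand-in for EZAD's re-normalization step."""
    mu = teacher_feats.mean(axis=0, keepdims=True)
    sigma = teacher_feats.std(axis=0, keepdims=True)
    return (teacher_feats - mu) / (sigma + eps) * target_std + target_mean

def distill_loss(student_feats, teacher_feats):
    """Mean L1 distance between student region features and the
    (re-normalized) teacher features for the same proposals."""
    return np.abs(student_feats - teacher_feats).mean()

rng = np.random.default_rng(0)
teacher = rng.normal(5.0, 2.0, size=(8, 16))   # pretend CLIP region features
student = rng.normal(0.0, 1.0, size=(8, 16))   # detector region features

# Adapting the teacher to the student's feature statistics shrinks the
# domain gap, so the distillation target is easier to regress.
adapted = renormalize(teacher, target_mean=0.0, target_std=1.0)
print(distill_loss(student, adapted) < distill_loss(student, teacher))
```

The point of the sketch is only that matching first- and second-order statistics before distilling makes the mapping from detector features to the teacher's space easier to learn; the paper's actual re-normalization operates on CLIP itself.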
Related papers
- Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation [58.37525311718006]
We put forth a novel formulation of the aerial object detection problem, namely open-vocabulary aerial object detection (OVAD)
We propose CastDet, a CLIP-activated student-teacher detection framework that serves as the first OVAD detector specifically designed for the challenging aerial scenario.
Our framework integrates a robust localization teacher along with several box selection strategies to generate high-quality proposals for novel objects.
arXiv Detail & Related papers (2024-11-04T12:59:13Z) - Debiased Novel Category Discovering and Localization [40.02326438622898]
We focus on the challenging problem of Novel Class Discovery and Localization (NCDL)
We propose a Debiased Region Mining (DRM) approach that combines a class-agnostic Region Proposal Network (RPN) and a class-aware RPN.
We conduct extensive experiments on the NCDL benchmark, and the results demonstrate that the proposed DRM approach significantly outperforms previous methods.
arXiv Detail & Related papers (2024-02-29T03:09:16Z) - Object-centric Cross-modal Feature Distillation for Event-based Object Detection [87.50272918262361]
RGB detectors still outperform event-based detectors due to the sparsity of event data and its missing visual details.
We develop a novel knowledge distillation approach to shrink the performance gap between these two modalities.
We show that object-centric distillation significantly improves the performance of the event-based student object detector.
arXiv Detail & Related papers (2023-11-09T16:33:08Z) - Efficient Object Detection in Optical Remote Sensing Imagery via Attention-based Feature Distillation [29.821082433621868]
We propose Attention-based Feature Distillation (AFD) for object detection.
We introduce a multi-instance attention mechanism that effectively distinguishes between background and foreground elements.
AFD attains the performance of other state-of-the-art models while being efficient.
arXiv Detail & Related papers (2023-10-28T11:15:37Z) - Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
A new paradigm has emerged that generates a foreground prediction map to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z) - Incremental-DETR: Incremental Few-Shot Object Detection via Self-Supervised Learning [60.64535309016623]
We propose the Incremental-DETR that does incremental few-shot object detection via fine-tuning and self-supervised learning on the DETR object detector.
To alleviate severe over-fitting with few novel class data, we first fine-tune the class-specific components of DETR with self-supervision.
We further introduce an incremental few-shot fine-tuning strategy with knowledge distillation on the class-specific components of DETR to encourage the network to detect novel classes without catastrophic forgetting.
arXiv Detail & Related papers (2022-05-09T05:08:08Z) - Localization Distillation for Object Detection [134.12664548771534]
Previous knowledge distillation (KD) methods for object detection mostly focus on feature imitation instead of mimicking the classification logits.
We present a novel localization distillation (LD) method which can efficiently transfer the localization knowledge from the teacher to the student.
We show that logit mimicking can outperform feature imitation and that the absence of localization distillation is a critical reason why logit mimicking has underperformed for years.
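The idea summarized above, transferring localization knowledge by having the student match the teacher's softened distribution over discrete box-edge positions rather than imitating features, can be sketched as follows. This is a hedged illustration, not the paper's code: the number of bins, the temperature, and the per-edge logit layout are all assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def localization_distill_loss(teacher_logits, student_logits, temperature=2.0):
    """Each box edge is modeled as logits over discrete positions; the
    student matches the teacher's softened distribution edge by edge."""
    loss = 0.0
    for t_edge, s_edge in zip(teacher_logits, student_logits):
        p = softmax(t_edge, temperature)  # soft teacher target
        q = softmax(s_edge, temperature)  # student prediction
        loss += kl_div(p, q)
    return loss / len(teacher_logits)

# Four edges (left, top, right, bottom), five position bins each.
teacher = [[2.0, 0.5, -1.0, -1.0, -2.0]] * 4
student = [[0.1, 0.1, 0.1, 0.1, 0.1]] * 4
print(localization_distill_loss(teacher, student) > 0.0)
```

Softening with a temperature keeps the teacher's uncertainty over neighboring positions, which is exactly the localization knowledge that a hard box target would discard.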
arXiv Detail & Related papers (2022-04-12T17:14:34Z) - Dense Label Encoding for Boundary Discontinuity Free Rotation Detection [69.75559390700887]
This paper explores a relatively less-studied methodology based on classification.
We propose new techniques to push its frontier in two aspects.
Experiments and visual analysis on large-scale public datasets for aerial images show the effectiveness of our approach.
arXiv Detail & Related papers (2020-11-19T05:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.