ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation
- URL: http://arxiv.org/abs/2109.12066v1
- Date: Fri, 24 Sep 2021 16:46:36 GMT
- Title: ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation
- Authors: Johnathan Xie and Shuai Zheng
- Abstract summary: A dataset such as COCO is extensively annotated across many images but with a sparse number of categories, and annotating all object classes across a diverse domain is expensive and challenging.
We develop a Vision-Language distillation method that aligns both image and text embeddings from a zero-shot pre-trained model such as CLIP to a modified semantic prediction head from a one-stage detector like YOLOv5.
During inference, our model can be adapted to detect any number of object classes without additional training.
- Score: 5.424015823818208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world object sampling produces long-tailed distributions requiring
exponentially more images for rare types. Zero-shot detection, which aims to
detect unseen objects, is one direction to address this problem. A dataset such
as COCO is extensively annotated across many images but with a sparse number of
categories, and annotating all object classes across a diverse domain is
expensive and challenging. To advance zero-shot detection, we develop a
Vision-Language distillation method that aligns both image and text embeddings
from a zero-shot pre-trained model such as CLIP to a modified semantic
prediction head from a one-stage detector like YOLOv5. With this method, we are
able to train an object detector that achieves state-of-the-art accuracy on the
COCO zero-shot detection splits with fewer model parameters. During inference,
our model can be adapted to detect any number of object classes without
additional training. We also find that the improvements provided by the scaling
of our method are consistent across various YOLOv5 scales. Furthermore, we
develop a self-training method that provides a significant score improvement
without needing extra images or labels.
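The core idea in the abstract — aligning a detector's semantic prediction head with CLIP embeddings during training, then classifying regions against arbitrary class-name text embeddings at inference — can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the embedding dimension, the L1 form of the alignment loss, and the cosine-similarity classification step are assumptions based only on the abstract.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit hypersphere, as CLIP does."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def alignment_loss(pred_embeds, clip_image_embeds):
    """Distillation objective (assumed L1): pull the detector's
    semantic-head outputs toward the CLIP image embeddings of the
    matched regions."""
    diff = l2_normalize(pred_embeds) - l2_normalize(clip_image_embeds)
    return np.abs(diff).mean()

def classify(pred_embeds, text_embeds):
    """Zero-shot inference: score each predicted region embedding
    against CLIP text embeddings of the class names by cosine
    similarity and take the best match."""
    sims = l2_normalize(pred_embeds) @ l2_normalize(text_embeds).T
    return sims.argmax(axis=-1)
```

Because classification is just a similarity lookup against `text_embeds`, swapping in a different matrix of class-name embeddings at inference time is what lets the detector handle any number of object classes without additional training.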
Related papers
- Few-shot target-driven instance detection based on open-vocabulary object detection models [1.0749601922718608]
Open-vocabulary object detection models bring closer visual and textual concepts in the same latent space.
We propose a lightweight method to turn the latter into one-shot or few-shot object recognition models without requiring textual descriptions.
arXiv Detail & Related papers (2024-10-21T14:03:15Z)
- A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation [10.461109095311546]
Low-shot object counters estimate the number of objects in an image using few or no annotated exemplars.
The existing approaches often lead to overgeneralization and false positive detections.
We introduce GeCo, a novel low-shot counter that achieves accurate object detection, segmentation, and count estimation.
arXiv Detail & Related papers (2024-09-27T12:20:29Z)
- Few-Shot Object Detection with Sparse Context Transformers [37.106378859592965]
Few-shot detection is a major task in pattern recognition which seeks to localize objects using models trained with few labeled data.
We propose a novel sparse context transformer (SCT) that effectively leverages object knowledge in the source domain, and automatically learns a sparse context from only few training images in the target domain.
We evaluate the proposed method on two challenging few-shot object detection benchmarks, and empirical results show that the proposed method obtains competitive performance compared to the related state-of-the-art.
arXiv Detail & Related papers (2024-02-14T17:10:01Z)
- Disambiguation of One-Shot Visual Classification Tasks: A Simplex-Based Approach [8.436437583394998]
We present a strategy which aims at detecting the presence of multiple objects in a given shot.
This strategy is based on identifying the corners of a simplex in a high dimensional space.
We show the ability of the proposed method to slightly, yet statistically significantly, improve accuracy in extreme settings.
arXiv Detail & Related papers (2023-01-16T11:37:05Z)
- Incremental-DETR: Incremental Few-Shot Object Detection via Self-Supervised Learning [60.64535309016623]
We propose the Incremental-DETR that does incremental few-shot object detection via fine-tuning and self-supervised learning on the DETR object detector.
To alleviate severe over-fitting with few novel class data, we first fine-tune the class-specific components of DETR with self-supervision.
We further introduce an incremental few-shot fine-tuning strategy with knowledge distillation on the class-specific components of DETR to encourage the network to detect novel classes without catastrophic forgetting.
arXiv Detail & Related papers (2022-05-09T05:08:08Z)
- Robust Region Feature Synthesizer for Zero-Shot Object Detection [87.79902339984142]
We build a novel zero-shot object detection framework that contains an Intra-class Semantic Diverging component and an Inter-class Structure Preserving component.
It is the first study to carry out zero-shot object detection in remote sensing imagery.
arXiv Detail & Related papers (2022-01-01T03:09:15Z)
- Few-shot Weakly-Supervised Object Detection via Directional Statistics [55.97230224399744]
We propose a probabilistic multiple instance learning approach for few-shot Common Object Localization (COL) and few-shot Weakly Supervised Object Detection (WSOD).
Our model simultaneously learns the distribution of the novel objects and localizes them via expectation-maximization steps.
Our experiments show that the proposed method, despite being simple, outperforms strong baselines in few-shot COL and WSOD, as well as large-scale WSOD tasks.
arXiv Detail & Related papers (2021-03-25T22:34:16Z)
- Data Augmentation for Object Detection via Differentiable Neural Rendering [71.00447761415388]
It is challenging to train a robust object detector when annotated data is scarce.
Existing approaches to tackle this problem include semi-supervised learning that interpolates labeled data from unlabeled data.
We introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
arXiv Detail & Related papers (2021-03-04T06:31:06Z)
- One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
- Any-Shot Object Detection [81.88153407655334]
'Any-shot detection' is the setting in which totally unseen and few-shot categories can co-occur during inference.
We propose a unified any-shot detection model, that can concurrently learn to detect both zero-shot and few-shot object classes.
Our framework can also be used solely for Zero-shot detection and Few-shot detection tasks.
arXiv Detail & Related papers (2020-03-16T03:43:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.