Exploration of visual prompt in Grounded pre-trained open-set detection
- URL: http://arxiv.org/abs/2312.08839v1
- Date: Thu, 14 Dec 2023 11:52:35 GMT
- Title: Exploration of visual prompt in Grounded pre-trained open-set detection
- Authors: Qibo Chen, Weizhong Jin, Shuchang Li, Mengdi Liu, Li Yu, Jian Jiang,
Xiaozheng Wang
- Abstract summary: We propose a novel visual prompt method that learns new category knowledge from a few labeled images.
We evaluate the method on the ODinW dataset and show that it outperforms existing prompt learning methods.
- Score: 6.560519631555968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text prompts are crucial for generalizing pre-trained open-set object
detection models to new categories. However, current methods for text prompts
are limited as they require manual feedback when generalizing to new
categories, which restricts their ability to model complex scenes, often
leading to incorrect detection results. To address this limitation, we propose
a novel visual prompt method that learns new category knowledge from a few
labeled images, which generalizes the pre-trained detection model to the new
category. To allow visual prompts to represent new categories adequately, we
propose a statistical-based prompt construction module that is not limited by
predefined vocabulary lengths, thus allowing more vectors to be used when
representing categories. We further utilize the category dictionaries in the
pre-training dataset to design task-specific similarity dictionaries, which
make visual prompts more discriminative. We evaluate the method on the ODinW
dataset and show that it outperforms existing prompt learning methods and
performs more consistently in combinatorial inference.
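The abstract describes the two modules only at a high level. As a rough, non-authoritative sketch of the idea (the names, shapes, and specific statistics below, namely mean plus top principal directions and cosine top-m retrieval, are assumptions for illustration, not the authors' implementation), the following NumPy snippet builds a variable-length set of prompt vectors from the feature statistics of a few labeled boxes, then selects the most similar pre-training category embeddings as a task-specific similarity dictionary:

```python
# Illustrative sketch only: the paper's code is not public, so this shows
# the general shape of (1) statistics-based prompt construction and
# (2) a task-specific similarity dictionary, under assumed details.
import numpy as np


def build_visual_prompt(region_feats: np.ndarray, max_vectors: int = 8) -> np.ndarray:
    """Build prompt vectors for one new category from a few labeled boxes.

    region_feats: (N, D) features of annotated boxes for the new category.
    Returns (K, D) prompt vectors, where K adapts to the data rather than a
    predefined vocabulary length (assumed choice: mean plus the principal
    directions needed to cover 90% of the feature variance).
    """
    mean = region_feats.mean(axis=0, keepdims=True)        # (1, D)
    centered = region_feats - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2 / np.maximum((s ** 2).sum(), 1e-12)
    k = max(1, min(max_vectors - 1, len(s),
                   int(np.searchsorted(np.cumsum(var), 0.9)) + 1))
    prompts = np.concatenate([mean, vt[:k]], axis=0)       # (1 + k, D)
    return prompts / np.linalg.norm(prompts, axis=1, keepdims=True)


def similarity_dictionary(prompts: np.ndarray,
                          pretrain_dict: np.ndarray,
                          top_m: int = 16) -> np.ndarray:
    """Select the top_m pre-training category embeddings most similar to the
    new category's prompts; contrasting against such hard negatives is one
    plausible way to make the visual prompt more discriminative."""
    sims = (prompts @ pretrain_dict.T).max(axis=0)         # (V,)
    return pretrain_dict[np.argsort(-sims)[:top_m]]        # (top_m, D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(12, 256))     # features of 12 labeled boxes
    vocab = rng.normal(size=(1000, 256))   # pre-training category dictionary
    vocab /= np.linalg.norm(vocab, axis=1, keepdims=True)
    prompts = build_visual_prompt(feats)
    hard_negatives = similarity_dictionary(prompts, vocab)
    print(prompts.shape, hard_negatives.shape)
```

The variable k is the point of the "not limited by predefined vocabulary lengths" claim: a category with more visual variance simply gets more prompt vectors.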
Related papers
- Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation [58.37525311718006]
We put forth a novel formulation of the aerial object detection problem, namely open-vocabulary aerial object detection (OVAD).
We propose CastDet, a CLIP-activated student-teacher detection framework that serves as the first OVAD detector specifically designed for the challenging aerial scenario.
Our framework integrates a robust localization teacher along with several box selection strategies to generate high-quality proposals for novel objects.
arXiv Detail & Related papers (2024-11-04T12:59:13Z)
- Open-Vocabulary Temporal Action Localization using Multimodal Guidance [67.09635853019005]
OVTAL enables a model to recognize any desired action category in videos without the need to explicitly curate training data for all categories.
This flexibility poses significant challenges, as the model must recognize not only the action categories seen during training but also novel categories specified at inference.
We introduce OVFormer, a novel open-vocabulary framework extending ActionFormer with three key contributions.
arXiv Detail & Related papers (2024-06-21T18:00:05Z)
- XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak Supervision [6.406111099707549]
XAI-CLASS is a novel explanation-enhanced weakly-supervised text classification method.
It incorporates word saliency prediction as an auxiliary task.
XAI-CLASS outperforms other weakly-supervised text classification methods significantly.
arXiv Detail & Related papers (2023-10-31T23:24:22Z)
- Text2Model: Text-based Model Induction for Zero-shot Image Classification [38.704831945753284]
We address the challenge of building task-agnostic classifiers using only text descriptions.
We generate zero-shot classifiers using a hypernetwork that receives class descriptions and outputs a multi-class model.
We evaluate this approach in a series of zero-shot classification tasks for image, point-cloud, and action recognition, using a range of text descriptions (a toy sketch of the hypernetwork idea appears after this list).
arXiv Detail & Related papers (2022-10-27T05:19:55Z)
- Novel Class Discovery without Forgetting [72.52222295216062]
We identify and formulate a new, pragmatic problem setting of NCDwF: Novel Class Discovery without Forgetting.
We propose a machine learning model to incrementally discover novel categories of instances from unlabeled data.
We introduce experimental protocols based on CIFAR-10, CIFAR-100 and ImageNet-1000 to measure the trade-off between knowledge retention and novel class discovery.
arXiv Detail & Related papers (2022-07-21T17:54:36Z)
- PromptDet: Expand Your Detector Vocabulary with Uncurated Images [47.600059694034]
The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.
We propose a two-stage open-vocabulary object detector that categorises each box proposal by a classifier generated from the text encoder of a pre-trained visual-language model.
To scale up the learning procedure towards detecting a wider spectrum of objects, we exploit the available online resource, iteratively updating the prompts, and later self-training the proposed detector with pseudo labels generated on a large corpus of noisy, uncurated web images.
arXiv Detail & Related papers (2022-03-30T17:50:21Z)
- Ultra-fine Entity Typing with Indirect Supervision from Natural Language Inference [28.78215056129358]
This work presents LITE, a new approach that formulates entity typing as a natural language inference (NLI) problem.
Experiments show that, with limited training data, LITE obtains state-of-the-art performance on the UFET task.
arXiv Detail & Related papers (2022-02-12T23:56:26Z)
- Semi-supervised New Event Type Induction and Description via Contrastive Loss-Enforced Batch Attention [56.46649994444616]
We present a novel approach to semi-supervised new event type induction using a masked contrastive loss.
We extend our approach to two new tasks: predicting the type name of the discovered clusters and linking them to FrameNet frames.
arXiv Detail & Related papers (2022-02-12T00:32:22Z)
- Closing the Generalization Gap in One-Shot Object Detection [92.82028853413516]
We show that the key to strong few-shot detection models may not lie in sophisticated metric learning approaches, but instead in scaling the number of categories.
Future data annotation efforts should therefore focus on wider datasets and annotate a larger number of categories.
arXiv Detail & Related papers (2020-11-09T09:31:17Z)
- Few-Shot Object Detection via Knowledge Transfer [21.3564383157159]
Conventional methods for object detection usually require substantial amounts of training data and annotated bounding boxes.
In this paper, we introduce a few-shot object detection approach via knowledge transfer, which aims to detect objects from only a few training examples.
arXiv Detail & Related papers (2020-08-28T06:35:27Z)
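As noted in the Text2Model entry above, the hypernetwork mechanism is easy to make concrete. The toy below uses a random, untrained linear map as the hypernetwork and random embeddings as a stand-in text encoder; every name and shape here is an assumption for illustration, not the paper's architecture:

```python
# Toy sketch of the Text2Model idea: a hypernetwork maps class descriptions
# to the weights of a multi-class image classifier. All shapes, the
# single-linear-layer hypernetwork, and the random "text encoder" are
# stand-in assumptions.
import numpy as np

rng = np.random.default_rng(1)
D_TXT, D_IMG, N_CLASSES = 64, 128, 5

# Pretend text-encoder output: one embedding per class description.
class_desc_emb = rng.normal(size=(N_CLASSES, D_TXT))

# Hypernetwork: a linear map from a text embedding to a (D_IMG + 1)-dim
# vector, i.e. one classifier row (weights + bias) per class. In practice
# this map would be trained; here it is random for illustration.
H = rng.normal(size=(D_TXT, D_IMG + 1)) * 0.1

generated = class_desc_emb @ H                  # (N_CLASSES, D_IMG + 1)
W, b = generated[:, :D_IMG], generated[:, D_IMG]

# Zero-shot inference: score an image feature with the generated model.
img_feat = rng.normal(size=(D_IMG,))
logits = W @ img_feat + b                       # (N_CLASSES,)
print("predicted class:", int(np.argmax(logits)))
```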