Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object
Detection
- URL: http://arxiv.org/abs/2312.02103v1
- Date: Mon, 4 Dec 2023 18:29:03 GMT
- Title: Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object
Detection
- Authors: Sunghun Kang, Junbum Cha, Jonghwan Mun, Byungseok Roh, Chang D. Yoo
- Abstract summary: We propose a simple yet effective method to learn region-text alignment for arbitrary concepts.
Specifically, the proposed method aims to learn an arbitrary image-to-text mapping for pseudo-labeling of arbitrary concepts, named Pseudo-Labeling for Arbitrary Concepts (PLAC).
The proposed method shows competitive performance on the standard OVOD benchmark for noun concepts and a large improvement on the referring expression comprehension benchmark for arbitrary concepts.
- Score: 25.719940401040205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-vocabulary object detection (OVOD) has recently gained significant
attention as a crucial step toward achieving human-like visual intelligence.
Existing OVOD methods extend the target vocabulary from pre-defined categories
to the open world by transferring knowledge of arbitrary concepts from
vision-language pre-training models to the detectors. While previous methods
have shown remarkable success, they suffer from indirect supervision or a
limited set of transferable concepts. In this paper, we propose a simple yet
effective method to directly learn region-text alignment for arbitrary
concepts. Specifically, the proposed method aims to learn an arbitrary
image-to-text mapping for pseudo-labeling of arbitrary concepts, named
Pseudo-Labeling for Arbitrary Concepts (PLAC). The proposed method shows
competitive performance on the standard OVOD benchmark for noun concepts and a
large improvement on the referring expression comprehension benchmark for
arbitrary concepts.
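The abstract describes pseudo-labeling via region-text alignment without spelling out the mechanics. Below is a minimal sketch of the general idea, matching region proposal embeddings against free-form concept embeddings by cosine similarity; the function name, the matching rule, and the threshold are assumptions for illustration, not the PLAC implementation.

```python
import torch
import torch.nn.functional as F

def pseudo_label_regions(region_feats, text_feats, threshold=0.3):
    """Match free-form concept embeddings to region proposal embeddings.

    region_feats: (R, D) embeddings of R region proposals.
    text_feats:   (T, D) embeddings of T arbitrary concept phrases.
    Returns (region_index, concept_index, score) pseudo labels.
    """
    region_feats = F.normalize(region_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sim = region_feats @ text_feats.T          # (R, T) cosine similarities
    score, region = sim.max(dim=0)             # best region for each concept
    return [
        (int(region[t]), t, float(score[t]))
        for t in range(text_feats.size(0))
        if score[t] >= threshold
    ]

# Toy usage: random tensors stand in for real region/text encoders,
# with the threshold disabled so the toy call always returns matches.
print(pseudo_label_regions(torch.randn(5, 512), torch.randn(3, 512), threshold=0.0))
```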
Related papers
- Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks.
We propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm.
Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
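The "discover-then-name" inversion can be illustrated with a small sketch: concepts are first discovered as directions in a model's feature space and only afterwards assigned names by nearest-neighbor lookup against text embeddings. The function name, the random stand-ins, and the toy vocabulary below are illustrative assumptions, not the DN-CBM code.

```python
import torch
import torch.nn.functional as F

def name_discovered_concepts(concept_vecs, word_vecs, vocabulary):
    """Name each discovered concept vector by its nearest word embedding.

    concept_vecs: (C, D) directions discovered in a model's feature space.
    word_vecs:    (V, D) text embeddings for a candidate vocabulary.
    vocabulary:   list of V candidate names.
    """
    sim = F.normalize(concept_vecs, dim=-1) @ F.normalize(word_vecs, dim=-1).T
    return [vocabulary[i] for i in sim.argmax(dim=-1).tolist()]

concepts = torch.randn(4, 512)   # stand-in for discovered concept directions
words = torch.randn(6, 512)      # stand-in for text embeddings of candidate names
names = ["stripe", "fur", "wheel", "sky", "grass", "metal"]
print(name_discovered_concepts(concepts, words, names))
```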
arXiv Detail & Related papers (2024-07-19T17:50:11Z)
- Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation [100.81837601210597]
We propose Concept Curation (CoCu) to bridge the gap between visual and textual semantics in pre-training data.
CoCu achieves strong zero-shot transfer performance and boosts the language-supervised segmentation baseline by a large margin.
arXiv Detail & Related papers (2023-09-24T00:05:39Z)
- Global Knowledge Calibration for Fast Open-Vocabulary Segmentation [124.74256749281625]
We introduce a text diversification strategy that generates a set of synonyms for each training category.
We also employ a text-guided knowledge distillation method to preserve the generalizable knowledge of CLIP.
Our proposed model achieves robust generalization performance across various datasets.
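A hedged sketch of these two ingredients, synonym-based text diversification and distillation against a frozen teacher, is given below. The helper names and the KL-based loss are assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def synonym_expanded_prompts(categories, synonyms):
    """Diversify each training category with its synonyms (toy version)."""
    return {c: [c, *synonyms.get(c, [])] for c in categories}

def text_guided_distill_loss(student_logits, teacher_logits, tau=1.0):
    """KL distillation between student scores and a frozen teacher's scores."""
    s = F.log_softmax(student_logits / tau, dim=-1)
    t = F.softmax(teacher_logits / tau, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * tau**2

prompts = synonym_expanded_prompts(["sofa"], {"sofa": ["couch", "settee"]})
student = torch.randn(8, 20)   # region-vs-text scores from the trained model
teacher = torch.randn(8, 20)   # scores from a frozen CLIP-like teacher
print(prompts, text_guided_distill_loss(student, teacher).item())
```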
arXiv Detail & Related papers (2023-03-16T09:51:41Z)
- DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection [118.36746273425354]
This paper presents a paralleled visual-concept pre-training method for open-world detection that draws on knowledge enrichment from a designed concept dictionary.
By enriching concepts with their descriptions, we explicitly build relationships among various concepts to facilitate open-domain learning.
The proposed framework demonstrates strong zero-shot detection performance: on the LVIS dataset, for example, our DetCLIP-T outperforms GLIP-T by 9.9% mAP and obtains a 13.5% improvement on rare categories.
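One plausible reading of "enriching concepts with their descriptions" is to append a definition to each concept name before text encoding, so that related concepts share descriptive words and land closer in embedding space. The template and helper below are assumptions for illustration, not DetCLIP's actual prompt format.

```python
def enrich_concepts(concepts, dictionary):
    """Attach a dictionary definition to each concept name (toy version).

    The enriched strings can be fed to a text encoder so that related
    concepts share descriptive words in their prompts.
    """
    return [
        f"{c}, which is {dictionary[c]}" if c in dictionary else c
        for c in concepts
    ]

toy_dict = {"mallard": "a kind of wild duck", "kayak": "a small narrow canoe"}
print(enrich_concepts(["mallard", "kayak", "person"], toy_dict))
```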
arXiv Detail & Related papers (2022-09-20T02:01:01Z)
- Toward a Visual Concept Vocabulary for GAN Latent Space [74.12447538049537]
This paper introduces a new method for building open-ended vocabularies of primitive visual concepts represented in a GAN's latent space.
Our approach is built from three components: automatic identification of perceptually salient directions based on their layer selectivity; human annotation of these directions with free-form, compositional natural language descriptions; and decomposition of these descriptions into a vocabulary of visual concepts.
Experiments show that concepts learned with our approach are reliable and composable -- generalizing across classes, contexts, and observers.
arXiv Detail & Related papers (2021-10-08T17:58:19Z)
- Corpus-level and Concept-based Explanations for Interpretable Document Classification [23.194220621342254]
We propose a corpus-level explanation approach to capture causal relationships between keywords and model predictions.
We also propose a concept-based explanation method that can automatically learn higher-level concepts and their importance to model prediction tasks.
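As a rough illustration of corpus-level keyword attribution, the sketch below scores keywords by the average prediction drop when they are occluded across a corpus. This occlusion scheme is a generic stand-in, not the paper's method; all names here are hypothetical.

```python
from collections import defaultdict

def corpus_keyword_importance(model, documents, keywords):
    """Score keywords by the average prediction drop when they are removed.

    model(text) is assumed to return a scalar class score; occlusion is a
    generic stand-in for the paper's corpus-level causal analysis.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for doc in documents:
        base = model(doc)
        for kw in keywords:
            if kw in doc:
                totals[kw] += base - model(doc.replace(kw, ""))
                counts[kw] += 1
    return {kw: totals[kw] / counts[kw] for kw in totals}

toy_model = lambda text: 0.9 if "refund" in text else 0.2
docs = ["please refund my order", "great product", "refund requested twice"]
print(corpus_keyword_importance(toy_model, docs, ["refund", "product"]))
```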
arXiv Detail & Related papers (2020-04-24T20:54:17Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)
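The bag-of-visual-words target can be sketched concretely: dense convnet features are quantized against a codebook and pooled into a histogram that a second network learns to predict from a perturbed view of the image. The shapes and names below are illustrative assumptions, not the paper's implementation.

```python
import torch

def bag_of_visual_words(dense_feats, codebook):
    """Quantize dense convnet features into a normalized BoW target.

    dense_feats: (N, D) feature vectors from a spatially dense feature map.
    codebook:    (K, D) visual-word centroids (e.g., learned with k-means).
    The returned (K,) histogram is the target another network is trained
    to predict from a perturbed view of the same image.
    """
    words = torch.cdist(dense_feats, codebook).argmin(dim=-1)  # nearest word per location
    hist = torch.bincount(words, minlength=codebook.size(0)).float()
    return hist / hist.sum()

feats = torch.randn(49, 256)      # e.g., a flattened 7x7 feature map
codebook = torch.randn(512, 256)  # stand-in for a learned visual vocabulary
print(bag_of_visual_words(feats, codebook).shape)
```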
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.