Query-Based Knowledge Sharing for Open-Vocabulary Multi-Label
Classification
- URL: http://arxiv.org/abs/2401.01181v1
- Date: Tue, 2 Jan 2024 12:18:40 GMT
- Title: Query-Based Knowledge Sharing for Open-Vocabulary Multi-Label
Classification
- Authors: Xuelin Zhu, Jian Liu, Dongqi Tang, Jiawei Ge, Weijia Liu, Bo Liu,
Jiuxin Cao
- Abstract summary: Multi-label zero-shot learning is a non-trivial task in computer vision.
We propose a novel query-based knowledge sharing paradigm for this task.
Our framework significantly outperforms state-of-the-art methods on the zero-shot task by 5.9% and 4.5% mAP on NUS-WIDE and Open Images, respectively.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifying labels that did not appear during training, known as
multi-label zero-shot learning, is a non-trivial task in computer vision. To
this end, recent studies have attempted to explore the multi-modal knowledge
of vision-language pre-training (VLP) models via knowledge distillation,
making it possible to recognize unseen labels in an open-vocabulary manner.
However, experimental evidence shows that knowledge distillation is
suboptimal and provides limited performance gains in unseen-label prediction.
In this paper, a novel query-based knowledge-sharing paradigm is proposed to
explore the multi-modal knowledge of the pretrained VLP model for
open-vocabulary multi-label classification. Specifically, a set of learnable
label-agnostic query tokens is trained to extract critical vision knowledge
from the input image; these tokens are then shared across all labels,
allowing each label to select tokens of interest as visual clues for
recognition. In addition, we propose an effective prompt pool for robust
label embedding, and reformulate standard ranking learning as a
classification problem so that the magnitude of feature vectors can
contribute to matching; both changes significantly benefit label recognition.
Experimental results show that our framework significantly outperforms
state-of-the-art methods on the zero-shot task by 5.9% and 4.5% mAP on
NUS-WIDE and Open Images, respectively.
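As a rough, non-authoritative illustration of the paradigm the abstract
describes, the PyTorch sketch below pairs a set of learnable label-agnostic
query tokens with two cross-attention steps: the queries first extract vision
knowledge from the image tokens of a frozen VLP encoder, and the label
embeddings then attend over the shared queries to select visual clues. The
class name, dimensions, and single-layer design are assumptions for
illustration, not the paper's exact architecture.

import torch
import torch.nn as nn

class QueryKnowledgeSharing(nn.Module):
    """Minimal sketch: shared query tokens bridge image and label features."""

    def __init__(self, dim=512, num_queries=32, num_heads=8):
        super().__init__()
        # Learnable label-agnostic query tokens, shared across all labels.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        # Queries extract critical vision knowledge from image tokens.
        self.extract = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Each label selects query tokens of interest as visual clues.
        self.select = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, image_tokens, label_embeds):
        # image_tokens: (B, N, dim) patch features from a frozen VLP image encoder
        # label_embeds: (L, dim) label embeddings from the VLP text encoder
        B = image_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)       # (B, Q, dim)
        vis, _ = self.extract(q, image_tokens, image_tokens)  # (B, Q, dim)
        lbl = label_embeds.unsqueeze(0).expand(B, -1, -1)     # (B, L, dim)
        clues, _ = self.select(lbl, vis, vis)                 # (B, L, dim)
        # An unnormalized dot product keeps feature magnitude in the score,
        # echoing the classification-style reformulation mentioned above.
        return (clues * lbl).sum(-1)                          # (B, L) logits

Because the query tokens are label-agnostic, unseen labels can reuse the same
extracted visual clues at test time simply by supplying new text embeddings.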
Related papers
- Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning
This paper proposes a novel framework for multi-label image recognition without any training data.
It uses the knowledge of a pre-trained Large Language Model to learn prompts that adapt a pretrained Vision-Language Model such as CLIP to multi-label classification.
Our framework presents a new way to explore the synergies between multiple pre-trained models for novel category recognition.
arXiv Detail & Related papers (2024-03-02T13:43:32Z)
- Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction with Extremely Limited Labels
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
These findings highlight the utility of VC learning in dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z)
- Multi-Label Knowledge Distillation
We propose a novel multi-label knowledge distillation method.
On one hand, it exploits the informative semantic knowledge in the logits by dividing the multi-label learning problem into a set of binary classification problems (a minimal sketch follows this list).
On the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings.
arXiv Detail & Related papers (2023-08-12T03:19:08Z)
- Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer
Multi-label zero-shot learning (ML-ZSL) focuses on transferring knowledge via a pre-trained textual label embedding.
We propose a novel open-vocabulary framework, named multi-modal knowledge transfer (MKT), for multi-label classification.
arXiv Detail & Related papers (2022-07-05T08:32:18Z)
- Open-Set Representation Learning through Combinatorial Embedding
We are interested in identifying novel concepts in a dataset through representation learning based on the examples in both labeled and unlabeled classes.
We propose a learning approach, which naturally clusters examples in unseen classes using the compositional knowledge given by multiple supervised meta-classifiers on heterogeneous label spaces.
The proposed algorithm discovers novel concepts via a joint optimization that enhances the discriminativeness of unseen classes while learning representations of known classes that generalize to novel ones.
arXiv Detail & Related papers (2021-06-29T11:51:57Z)
- Few-shot Learning for Multi-label Intent Detection
State-of-the-art work estimates label-instance relevance scores and uses a threshold to select multiple associated intent labels.
Experiments on two datasets show that the proposed model significantly outperforms strong baselines in both one-shot and five-shot settings.
arXiv Detail & Related papers (2020-10-11T14:42:18Z)
- Learning Image Labels On-the-fly for Training Robust Classification Models
We show how noisy annotations (e.g., from different algorithm-based labelers) can be utilized together and mutually benefit the learning of classification tasks.
A meta-training-based label-sampling module is designed to attend to the labels that benefit model learning the most through additional back-propagation passes.
arXiv Detail & Related papers (2020-09-22T05:38:44Z)
- Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition
The KGGR framework exploits prior knowledge of statistical label correlations together with deep neural networks.
It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence.
Then, it introduces the label semantics to guide learning semantic-specific features.
It exploits a graph propagation network to explore graph node interactions (see the sketch after this list).
arXiv Detail & Related papers (2020-09-20T15:05:29Z)
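The one-vs-all decomposition mentioned in the Multi-Label Knowledge
Distillation entry above can be sketched as a per-label binary distillation
loss. The temperature and BCE form below are illustrative assumptions, not
that paper's exact objective.

import torch
import torch.nn.functional as F

def binary_distill_loss(student_logits, teacher_logits, T=2.0):
    # Treat every label as an independent binary problem: squash both
    # models' logits to per-label probabilities with a temperature T.
    p_teacher = torch.sigmoid(teacher_logits / T)
    p_student = torch.sigmoid(student_logits / T)
    # Binary cross-entropy of the student against the teacher's soft
    # targets, averaged over labels and batch.
    return F.binary_cross_entropy(p_student, p_teacher)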
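Similarly, the graph propagation step in the KGGR entry can be sketched as
one round of message passing over a label graph built from co-occurrence
statistics. The row normalization and single linear propagation layer are
illustrative assumptions, not KGGR's exact formulation.

import torch
import torch.nn as nn

class LabelGraphPropagation(nn.Module):
    """Minimal sketch: one propagation step over a label co-occurrence graph."""

    def __init__(self, co_occurrence, dim=512):
        super().__init__()
        # Row-normalize label co-occurrence counts into adjacency weights.
        A = co_occurrence / co_occurrence.sum(dim=1, keepdim=True).clamp(min=1e-6)
        self.register_buffer("A", A)
        self.proj = nn.Linear(dim, dim)

    def forward(self, label_feats):
        # label_feats: (L, dim) semantic-specific label features.
        # Each label aggregates features from statistically correlated labels,
        # with a residual connection preserving the original semantics.
        return torch.relu(self.proj(self.A @ label_feats)) + label_feats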