Inferring Prototypes for Multi-Label Few-Shot Image Classification with
Word Vector Guided Attention
- URL: http://arxiv.org/abs/2112.01037v1
- Date: Thu, 2 Dec 2021 07:59:11 GMT
- Title: Inferring Prototypes for Multi-Label Few-Shot Image Classification with
Word Vector Guided Attention
- Authors: Kun Yan, Chenbin Zhang, Jun Hou, Ping Wang, Zied Bouraoui, Shoaib
Jameel, Steven Schockaert
- Abstract summary: Multi-label few-shot image classification (ML-FSIC) is the task of assigning descriptive labels to previously unseen images, based on a small number of training examples.
In this paper we propose to use word embeddings as a form of prior knowledge about the meaning of the labels.
Our model can infer prototypes for unseen labels without the need for fine-tuning any model parameters.
- Score: 45.6809084493491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-label few-shot image classification (ML-FSIC) is the task of assigning
descriptive labels to previously unseen images, based on a small number of
training examples. A key feature of the multi-label setting is that images
often have multiple labels, which typically refer to different regions of the
image. When estimating prototypes, in a metric-based setting, it is thus
important to determine which regions are relevant for which labels, but the
limited amount of training data makes this highly challenging. As a solution,
in this paper we propose to use word embeddings as a form of prior knowledge
about the meaning of the labels. In particular, visual prototypes are obtained
by aggregating the local feature maps of the support images, using an attention
mechanism that relies on the label embeddings. As an important advantage, our
model can infer prototypes for unseen labels without the need for fine-tuning
any model parameters, which demonstrates its strong generalization abilities.
Experiments on COCO and PASCAL VOC furthermore show that our model
substantially improves the current state-of-the-art.
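The prototype step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the projection matrix W (which would be learned on base classes in a real system), the scaled dot-product scoring, and the max-over-regions cosine score are all assumptions made for illustration. It only shows the general idea that a label's word vector acts as an attention query over the local features of the support images, and that a prototype for an unseen label needs only a forward pass.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def infer_prototype(local_feats, label_vec, W):
    """Hypothetical word-vector guided prototype inference.

    local_feats: (n_images, n_regions, d_vis) local features of the
                 support images that carry the label.
    label_vec:   (d_word,) word embedding of the label.
    W:           (d_word, d_vis) assumed projection from word space
                 to visual feature space.
    Returns the prototype (d_vis,) and the attention weights
    (n_images, n_regions).
    """
    query = label_vec @ W                               # (d_vis,)
    scale = np.sqrt(local_feats.shape[-1])
    scores = local_feats @ query / scale                # (n_images, n_regions)
    attn = softmax(scores, axis=1)                      # attention over regions
    per_image = (attn[..., None] * local_feats).sum(axis=1)
    return per_image.mean(axis=0), attn                 # average over images

def score_query(query_feats, prototype):
    """Assumed scoring rule: best cosine match over the query's regions."""
    q = query_feats / np.linalg.norm(query_feats, axis=-1, keepdims=True)
    p = prototype / np.linalg.norm(prototype)
    return float((q @ p).max())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    support = rng.normal(size=(5, 49, 64))   # 5 support images, 7x7 regions
    word_vec = rng.normal(size=300)          # stand-in for a real word vector
    W = rng.normal(size=(300, 64)) * 0.01    # random here; learned in practice
    prototype, attn = infer_prototype(support, word_vec, W)
    print(prototype.shape, attn.shape)
    print(score_query(rng.normal(size=(49, 64)), prototype))
```

Because the label enters only through its word vector, the same function handles a previously unseen label without updating W, which mirrors the "no fine-tuning" property the abstract claims.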
Related papers
- PatchCT: Aligning Patch Set and Label Set with Conditional Transport for
Multi-Label Image Classification [48.929583521641526]
Multi-label image classification is a prediction task that aims to identify more than one label from a given image.
This paper introduces the conditional transport theory to bridge the acknowledged gap.
We find that by formulating the multi-label classification as a CT problem, we can exploit the interactions between the image and label efficiently.
arXiv Detail & Related papers (2023-07-18T08:37:37Z)
- Distilling Self-Supervised Vision Transformers for Weakly-Supervised
Few-Shot Classification & Segmentation [58.03255076119459]
We address the task of weakly-supervised few-shot image classification and segmentation by leveraging a Vision Transformer (ViT).
Our proposed method takes token representations from the self-supervised ViT and leverages their correlations, via self-attention, to produce classification and segmentation predictions.
Experiments on Pascal-5i and COCO-20i demonstrate significant performance gains in a variety of supervision settings.
arXiv Detail & Related papers (2023-07-07T06:16:43Z)
- A Deep Model for Partial Multi-Label Image Classification with Curriculum Based Disambiguation [42.0958430465578]
We study the partial multi-label (PML) image classification problem.
Existing PML methods typically design a disambiguation strategy to filter out noisy labels.
We propose a deep model for PML to enhance the representation and discrimination ability.
arXiv Detail & Related papers (2022-07-06T02:49:02Z)
- Dual-Perspective Semantic-Aware Representation Blending for Multi-Label
Image Recognition with Partial Labels [70.36722026729859]
We propose a dual-perspective semantic-aware representation blending (DSRB) method that blends multi-granularity category-specific semantic representations across different images.
The proposed DSRB consistently outperforms current state-of-the-art algorithms across all label-proportion settings.
arXiv Detail & Related papers (2022-05-26T00:33:44Z)
- Semantic-Aware Representation Blending for Multi-Label Image Recognition
with Partial Labels [86.17081952197788]
We propose to blend category-specific representation across different images to transfer information of known labels to complement unknown labels.
Experiments on the MS-COCO, Visual Genome, Pascal VOC 2007 datasets show that the proposed SARB framework obtains superior performance over current leading competitors.
arXiv Detail & Related papers (2022-03-04T07:56:16Z)
- Structured Semantic Transfer for Multi-Label Recognition with Partial
Labels [85.6967666661044]
We propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels.
The framework consists of two complementary transfer modules that explore within-image and cross-image semantic correlations.
Experiments on the Microsoft COCO, Visual Genome and Pascal VOC datasets show that the proposed SST framework obtains superior performance over current state-of-the-art algorithms.
arXiv Detail & Related papers (2021-12-21T02:15:01Z)
- SCIDA: Self-Correction Integrated Domain Adaptation from Single- to
Multi-label Aerial Images [30.12949142271464]
Most publicly available image classification datasets provide a single label per image, even though real-world images are inherently multi-labeled.
We propose a novel self-correction integrated domain adaptation (SCIDA) method for automatic multi-label learning.
SCIDA is weakly supervised, i.e., it automatically learns a multi-label image classification model from massive, publicly available single-label images.
arXiv Detail & Related papers (2021-08-15T20:38:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.