Modelling Multi-modal Cross-interaction for ML-FSIC Based on Local Feature Selection
- URL: http://arxiv.org/abs/2412.13732v1
- Date: Wed, 18 Dec 2024 11:10:18 GMT
- Title: Modelling Multi-modal Cross-interaction for ML-FSIC Based on Local Feature Selection
- Authors: Kun Yan, Zied Bouraoui, Fangyun Wei, Chang Xu, Ping Wang, Shoaib Jameel, Steven Schockaert
- Abstract summary: A key feature of the multi-label setting is that images often have several labels.
We propose a strategy in which label prototypes are gradually refined.
Experiments on COCO, PASCAL VOC, NUS-WIDE, and iMaterialist show that our model substantially improves the current state-of-the-art.
- Score: 55.144394711196924
- Abstract: The aim of multi-label few-shot image classification (ML-FSIC) is to assign semantic labels to images, in settings where only a small number of training examples are available for each label. A key feature of the multi-label setting is that images often have several labels, which typically refer to objects appearing in different regions of the image. When estimating label prototypes, in a metric-based setting, it is thus important to determine which regions are relevant for which labels, but the limited amount of training data and the noisy nature of local features make this highly challenging. As a solution, we propose a strategy in which label prototypes are gradually refined. First, we initialize the prototypes using word embeddings, which allows us to leverage prior knowledge about the meaning of the labels. Second, taking advantage of these initial prototypes, we then use a Loss Change Measurement (LCM) strategy to select the local features from the training images (i.e., the support set) that are most likely to be representative of a given label. Third, we construct the final prototype of the label by aggregating these representative local features using a multi-modal cross-interaction mechanism, which again relies on the initial word embedding-based prototypes. Experiments on COCO, PASCAL VOC, NUS-WIDE, and iMaterialist show that our model substantially improves the current state-of-the-art.
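The abstract describes the three-step refinement only at a high level. The minimal Python sketch below illustrates one way the steps could fit together, under assumptions that are ours rather than the paper's: the word-embedding prototype is taken to be already projected into the same space as the local features; the loss scored by LCM is a hypothetical sigmoid loss on the cosine similarity between the prototype and the mean of a set of local features, and a feature counts as representative when leaving it out raises that loss; and the cross-interaction step is approximated by a single dot-product attention with the prototype as query, followed by a simple averaging fusion. The names `label_loss`, `lcm_select`, and `cross_interaction`, and the `scale` and `k` parameters, are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def mean(vectors: Sequence[Sequence[float]]) -> Vec:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def label_loss(prototype: Sequence[float],
               feats: Sequence[Sequence[float]],
               scale: float = 10.0) -> float:
    """Hypothetical per-label loss: -log(sigmoid(scale * cos(prototype, mean(feats))))."""
    s = scale * cosine(prototype, mean(feats))
    return math.log1p(math.exp(-s))  # equals -log(1 / (1 + e^-s))


def lcm_select(prototype: Sequence[float],
               feats: Sequence[Sequence[float]],
               k: int) -> List[Sequence[float]]:
    """Keep the k local features whose removal raises the (hypothetical) loss the most."""
    feats = list(feats)
    if len(feats) <= k or len(feats) < 2:
        return feats
    base = label_loss(prototype, feats)
    scored = []
    for i in range(len(feats)):
        rest = feats[:i] + feats[i + 1:]          # leave feature i out
        scored.append((label_loss(prototype, rest) - base, i))
    scored.sort(reverse=True)                     # largest loss increase first
    keep = sorted(i for _, i in scored[:k])       # restore original order
    return [feats[i] for i in keep]


def cross_interaction(prototype: Sequence[float],
                      selected: Sequence[Sequence[float]]) -> Vec:
    """Attend over a non-empty list of selected features, prototype as query."""
    d = len(prototype)
    logits = [sum(p * f for p, f in zip(prototype, feat)) / math.sqrt(d)
              for feat in selected]
    m = max(logits)                               # subtract max for a stable softmax
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    visual = [sum(w * feat[j] for w, feat in zip(weights, selected)) / total
              for j in range(d)]
    # Hypothetical fusion: average the attended visual vector with the prototype.
    return [(p + v) / 2 for p, v in zip(prototype, visual)]


# Toy usage: a 2-D "word embedding" prototype and four region features.
proto = [1.0, 0.0]
regions = [[0.9, 0.1], [0.1, 0.9], [0.8, -0.1], [-0.5, 0.5]]
picked = lcm_select(proto, regions, k=2)
final = cross_interaction(proto, picked)
print(picked)  # → [[0.9, 0.1], [0.8, -0.1]]
```

In the toy example, leaving out either of the two regions aligned with the prototype lowers the prototype-to-mean similarity the most, so those are the features this LCM-style selection keeps; the off-label regions are discarded before aggregation, and the final prototype stays close to the initial word-embedding direction.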
Related papers
- PatchCT: Aligning Patch Set and Label Set with Conditional Transport for Multi-Label Image Classification [48.929583521641526]
Multi-label image classification is a prediction task that aims to identify more than one label from a given image.
This paper introduces the conditional transport theory to bridge the acknowledged gap.
We find that by formulating multi-label classification as a CT problem, we can efficiently exploit the interactions between the image and its labels.
Date: 2023-07-18T08:37:37Z
- Learning Disentangled Label Representations for Multi-label Classification [39.97251974500034]
One-shared-Feature-for-Multiple-Labels (OFML) is not conducive to learning discriminative label features.
We introduce the One-specific-Feature-for-One-Label (OFOL) mechanism and propose a novel disentangled label feature learning framework.
We achieve state-of-the-art performance on eight datasets.
Date: 2022-12-02T21:49:34Z
- A Deep Model for Partial Multi-Label Image Classification with Curriculum Based Disambiguation [42.0958430465578]
We study the partial multi-label (PML) image classification problem.
Existing PML methods typically design a disambiguation strategy to filter out noisy labels.
We propose a deep model for PML that enhances representation and discrimination ability.
Date: 2022-07-06T02:49:02Z
- Dual-Perspective Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels [70.36722026729859]
We propose a dual-perspective semantic-aware representation blending (DSRB) that blends multi-granularity category-specific semantic representation across different images.
The proposed DSRB consistently outperforms current state-of-the-art algorithms under all label-proportion settings.
Date: 2022-05-26T00:33:44Z
- Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels [86.17081952197788]
We propose to blend category-specific representation across different images to transfer information of known labels to complement unknown labels.
Experiments on the MS-COCO, Visual Genome, Pascal VOC 2007 datasets show that the proposed SARB framework obtains superior performance over current leading competitors.
Date: 2022-03-04T07:56:16Z
- Structured Semantic Transfer for Multi-Label Recognition with Partial Labels [85.6967666661044]
We propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels.
The framework consists of two complementary transfer modules that explore within-image and cross-image semantic correlations.
Experiments on the Microsoft COCO, Visual Genome and Pascal VOC datasets show that the proposed SST framework obtains superior performance over current state-of-the-art algorithms.
Date: 2021-12-21T02:15:01Z
- Inferring Prototypes for Multi-Label Few-Shot Image Classification with Word Vector Guided Attention [45.6809084493491]
Multi-label few-shot image classification (ML-FSIC) is the task of assigning descriptive labels to previously unseen images.
In this paper we propose to use word embeddings as a form of prior knowledge about the meaning of the labels.
Our model can infer prototypes for unseen labels without the need for fine-tuning any model parameters.
Date: 2021-12-02T07:59:11Z