PatchCT: Aligning Patch Set and Label Set with Conditional Transport for
Multi-Label Image Classification
- URL: http://arxiv.org/abs/2307.09066v2
- Date: Fri, 18 Aug 2023 11:53:27 GMT
- Title: PatchCT: Aligning Patch Set and Label Set with Conditional Transport for
Multi-Label Image Classification
- Authors: Miaoge Li, Dongsheng Wang, Xinyang Liu, Zequn Zeng, Ruiying Lu, Bo
Chen, Mingyuan Zhou
- Abstract summary: Multi-label image classification is a prediction task that aims to identify more than one label from a given image.
This paper introduces conditional transport (CT) theory to bridge the semantic gap between the visual patch and linguistic label domains.
We find that by formulating the multi-label classification as a CT problem, we can exploit the interactions between the image and label efficiently.
- Score: 48.929583521641526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-label image classification is a prediction task that aims to identify
more than one label from a given image. This paper considers the semantic
consistency of the latent space between the visual patch and linguistic label
domains and introduces the conditional transport (CT) theory to bridge the
acknowledged gap. While recent cross-modal attention-based studies have
attempted to align these two representations and achieved impressive
performance, they required carefully-designed alignment modules and extra
complex operations in the attention computation. We find that by formulating
the multi-label classification as a CT problem, we can exploit the interactions
between the image and label efficiently by minimizing the bidirectional CT
cost. Specifically, after feeding the images and textual labels into the
modality-specific encoders, we view each image as a mixture of patch embeddings
and a mixture of label embeddings, which capture the local region features and
the class prototypes, respectively. CT is then employed to learn and align
those two semantic sets by defining the forward and backward navigators.
Importantly, the defined navigators in CT distance model the similarities
between patches and labels, which provides an interpretable tool to visualize
the learned prototypes. Extensive experiments on three public image benchmarks
show that the proposed model consistently outperforms the previous methods.
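
The bidirectional CT cost described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-based cost, the softmax form of the navigators, the uniform weights over patches and labels, and the names `bidirectional_ct_cost` and `temperature` are all assumptions made for illustration. The forward navigator gives, for each patch, a distribution over labels; the backward navigator gives, for each label, a distribution over patches.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bidirectional_ct_cost(patches, labels, temperature=1.0):
    """Hypothetical bidirectional CT cost between two embedding sets.

    Assumptions (not taken from the paper): cost = 1 - cosine similarity,
    navigators are softmaxes of the similarity, and every patch and every
    label carries uniform weight.
    """
    n, m = len(patches), len(labels)
    sim = [[cosine(p, l) for l in labels] for p in patches]
    cost = [[1.0 - s for s in row] for row in sim]

    # Forward navigator: one distribution over labels per patch.
    forward = [softmax([s / temperature for s in row]) for row in sim]
    # Backward navigator: one distribution over patches per label.
    backward = [
        softmax([sim[i][j] / temperature for i in range(n)]) for j in range(m)
    ]

    # Expected transport cost in each direction, averaged uniformly.
    forward_cost = sum(
        forward[i][j] * cost[i][j] for i in range(n) for j in range(m)
    ) / n
    backward_cost = sum(
        backward[j][i] * cost[i][j] for j in range(m) for i in range(n)
    ) / m
    return forward_cost + backward_cost, forward, backward


if __name__ == "__main__":
    # Toy 2-D embeddings: two patches and two label prototypes.
    patches = [[1.0, 0.0], [0.0, 1.0]]
    labels = [[1.0, 0.0], [0.0, 1.0]]
    total, forward, backward = bidirectional_ct_cost(patches, labels)
    print("bidirectional CT cost:", total)
    print("forward navigator (patch -> label):", forward)
```

Under these assumptions, the navigator matrices double as patch-label similarity maps, which is the property the abstract points to for visualizing the learned prototypes.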
Related papers
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised
Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
- Semantic-Aware Graph Matching Mechanism for Multi-Label Image
Recognition [21.36538164675385]
Multi-label image recognition aims to predict the set of labels that are present in an image.
In this paper, we treat each image as a bag of instances, and formulate the task of multi-label image recognition as an instance-label matching selection problem.
We propose an innovative Semantic-aware Graph Matching framework for Multi-Label image recognition (ML-SGM).
arXiv Detail & Related papers (2023-04-21T23:48:01Z)
- Dual-Perspective Semantic-Aware Representation Blending for Multi-Label
Image Recognition with Partial Labels [70.36722026729859]
We propose dual-perspective semantic-aware representation blending (DSRB), which blends multi-granularity category-specific semantic representations across different images.
The proposed DSRB consistently outperforms current state-of-the-art algorithms on all label proportion settings.
arXiv Detail & Related papers (2022-05-26T00:33:44Z)
- Heterogeneous Semantic Transfer for Multi-label Recognition with Partial Labels [70.45813147115126]
Multi-label image recognition with partial labels (MLR-PL) may greatly reduce the cost of annotation and thus facilitate large-scale MLR.
We find that strong semantic correlations exist within each image and across different images.
These correlations can help transfer the knowledge possessed by the known labels to retrieve the unknown labels.
arXiv Detail & Related papers (2022-05-23T08:37:38Z)
- Semantic-Aware Representation Blending for Multi-Label Image Recognition
with Partial Labels [86.17081952197788]
We propose to blend category-specific representation across different images to transfer information of known labels to complement unknown labels.
Experiments on the MS-COCO, Visual Genome, Pascal VOC 2007 datasets show that the proposed SARB framework obtains superior performance over current leading competitors.
arXiv Detail & Related papers (2022-03-04T07:56:16Z)
- Structured Semantic Transfer for Multi-Label Recognition with Partial
Labels [85.6967666661044]
We propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels.
The framework consists of two complementary transfer modules that explore within-image and cross-image semantic correlations.
Experiments on the Microsoft COCO, Visual Genome and Pascal VOC datasets show that the proposed SST framework obtains superior performance over current state-of-the-art algorithms.
arXiv Detail & Related papers (2021-12-21T02:15:01Z)
- Inferring Prototypes for Multi-Label Few-Shot Image Classification with
Word Vector Guided Attention [45.6809084493491]
Multi-label few-shot image classification (ML-FSIC) is the task of assigning descriptive labels to previously unseen images.
In this paper we propose to use word embeddings as a form of prior knowledge about the meaning of the labels.
Our model can infer prototypes for unseen labels without the need for fine-tuning any model parameters.
arXiv Detail & Related papers (2021-12-02T07:59:11Z)
- Reconstruction Regularized Deep Metric Learning for Multi-label Image
Classification [39.055689258395624]
We present a novel deep metric learning method to tackle the multi-label image classification problem.
Our model can be trained in an end-to-end manner.
arXiv Detail & Related papers (2020-07-27T13:28:50Z)