Related papers: Semantic-Aware Representation Learning via Conditional Transport for Multi-Label Image Classification

URL: http://arxiv.org/abs/2507.14918v2
Date: Sun, 02 Nov 2025 13:11:41 GMT
Title: Semantic-Aware Representation Learning via Conditional Transport for Multi-Label Image Classification
Authors: Ren-Dong Xie, Zhi-Fen He, Bo Li, Bin Liu, Jin-Yan Hu
Abstract summary: This paper proposes a novel approach named Semantic-aware representation learning via Conditional Transport for Multi-Label Image Classification (SCT). The proposed method introduces a semantic-related feature learning module that extracts discriminative label-specific features by emphasizing semantic relevance and interaction. Experiments on two widely-used benchmark datasets, VOC2007 and MS-COCO, validate the effectiveness of SCT and demonstrate its superior performance compared to existing state-of-the-art methods.
Score: 8.864897133482907
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-label image classification is a critical task in machine learning that aims to accurately assign multiple labels to a single image. While existing methods often utilize attention mechanisms or graph convolutional networks to model visual representations, their performance is still constrained by two critical limitations: the inability to learn discriminative semantic-aware features, and the lack of fine-grained alignment between visual representations and label embeddings. To tackle these issues in a unified framework, this paper proposes a novel approach named Semantic-aware representation learning via Conditional Transport for Multi-Label Image Classification (SCT). The proposed method introduces a semantic-related feature learning module that extracts discriminative label-specific features by emphasizing semantic relevance and interaction, along with a conditional transport-based alignment mechanism that enables precise visual-semantic alignment. Extensive experiments on two widely-used benchmark datasets, VOC2007 and MS-COCO, validate the effectiveness of SCT and demonstrate its superior performance compared to existing state-of-the-art methods.
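As a rough illustration of what a conditional transport-based alignment between visual features and label embeddings can look like, the sketch below computes a generic bidirectional conditional transport cost in NumPy. It is not the SCT implementation: the function name `conditional_transport_cost`, the cosine-based cost, and the softmax transport plans are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
import numpy as np


def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def conditional_transport_cost(visual, labels):
    """Bidirectional conditional transport cost (illustrative assumption).

    visual: (n, d) array of visual features, rows must be non-zero.
    labels: (m, d) array of label embeddings, rows must be non-zero.
    Returns a scalar: forward cost plus backward cost.
    """
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    t = labels / np.linalg.norm(labels, axis=1, keepdims=True)
    sim = v @ t.T                  # (n, m) cosine similarities
    cost = 1.0 - sim               # point-to-point transport cost
    # Forward plan: each visual feature spreads its mass over the labels.
    forward_plan = softmax(sim, axis=1)
    # Backward plan: each label spreads its mass over the visual features.
    backward_plan = softmax(sim, axis=0)
    forward = (forward_plan * cost).sum(axis=1).mean()
    backward = (backward_plan * cost).sum(axis=0).mean()
    return float(forward + backward)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(5, 8))   # 5 hypothetical visual features
    embs = rng.normal(size=(3, 8))    # 3 hypothetical label embeddings
    print(conditional_transport_cost(feats, embs))
```

Under these assumptions, a lower cost means that each visual feature places most of its transport mass on nearby label embeddings and vice versa, so minimizing it pulls the two sets toward each other.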

Related papers

Hierarchical Semantic Alignment for Image Clustering [59.277605709780524]
We propose a hierarChical semAntic alignmEnt method for image clustering, dubbed CAE, which improves clustering performance in a training-free manner. We first select relevant nouns from WordNet and descriptions from caption datasets to construct a semantic space aligned with image features. Then, we align image features with selected nouns and captions via optimal transport to obtain a more discriminative semantic space.
arXiv Detail & Related papers (2025-11-30T14:14:51Z)
Collaborative Learning of Semantic-Aware Feature Learning and Label Recovery for Multi-Label Image Recognition with Incomplete Labels [8.864897133482907]
We propose a novel Collaborative Learning of Semantic-aware feature learning and Label recovery (CLSL) method. We show that CLSL outperforms the state-of-the-art multi-label image recognition methods with incomplete labels.
arXiv Detail & Related papers (2025-10-11T06:43:43Z)
Semantic-guided Representation Learning for Multi-Label Recognition [13.046479112800608]
Multi-label Recognition (MLR) involves assigning multiple labels to each data instance in an image. Recent Vision and Language Pre-training methods have made significant progress in tackling zero-shot MLR tasks. We introduce a Semantic-guided Representation Learning approach (SigRL) that enables the model to learn effective visual and textual representations.
arXiv Detail & Related papers (2025-04-04T08:15:08Z)
Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps. We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
Multi-Label Self-Supervised Learning with Scene Images [21.549234013998255]
This paper shows that quality image representations can be learned by treating scene/multi-label image SSL simply as a multi-label classification problem. The proposed method is named Multi-Label Self-supervised learning (MLS).
arXiv Detail & Related papers (2023-08-07T04:04:22Z)
DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations [79.433122872973]
Multi-label image recognition in the low-label regime is a task of great challenge and practical significance. We leverage the powerful alignment between textual and visual features pretrained with millions of auxiliary image-text pairs. We introduce an efficient and effective framework called Evidence-guided Dual Context Optimization (DualCoOp++).
arXiv Detail & Related papers (2023-08-03T17:33:20Z)
PatchCT: Aligning Patch Set and Label Set with Conditional Transport for Multi-Label Image Classification [48.929583521641526]
Multi-label image classification is a prediction task that aims to identify more than one label from a given image. This paper introduces the conditional transport theory to bridge the acknowledged gap. We find that by formulating the multi-label classification as a CT problem, we can exploit the interactions between the image and label efficiently.
arXiv Detail & Related papers (2023-07-18T08:37:37Z)
Incremental Image Labeling via Iterative Refinement [4.7590051176368915]
In particular, the existence of the semantic gap problem leads to a many-to-many mapping between the information extracted from an image and its linguistic description. This unavoidable bias further leads to poor performance on current computer vision tasks. We introduce a Knowledge Representation (KR)-based methodology to provide guidelines driving the labeling process.
arXiv Detail & Related papers (2023-04-18T13:37:22Z)
Dual-Perspective Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels [70.36722026729859]
We propose a dual-perspective semantic-aware representation blending (DSRB) that blends multi-granularity category-specific semantic representation across different images. The proposed DSRB consistently outperforms current state-of-the-art algorithms on all proportion label settings.
arXiv Detail & Related papers (2022-05-26T00:33:44Z)
Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category. Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model. We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
Multi-Label Image Classification with Contrastive Learning [57.47567461616912]
We show that a direct application of contrastive learning can hardly improve in multi-label cases. We propose a novel framework for multi-label classification with contrastive learning in a fully supervised setting.
arXiv Detail & Related papers (2021-07-24T15:00:47Z)
Multi-layered Semantic Representation Network for Multi-label Image Classification [8.17894017454724]
Multi-label image classification (MLIC) is a fundamental and practical task, which aims to assign multiple possible labels to an image. In recent years, many deep convolutional neural network (CNN) based approaches have been proposed which model label correlations. This paper advances this research direction by improving the modeling of label correlations and the learning of semantic representations.
arXiv Detail & Related papers (2021-06-22T08:04:22Z)
Semantic Diversity Learning for Zero-Shot Multi-label Classification [14.480713752871523]
This study introduces an end-to-end model training for multi-label zero-shot learning. We propose to use an embedding matrix having principal embedding vectors trained using a tailored loss function. In addition, during training, we suggest up-weighting in the loss function image samples presenting higher semantic diversity to encourage the diversity of the embedding matrix.
arXiv Detail & Related papers (2021-05-12T19:39:07Z)
Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition [75.44233392355711]
KGGR framework exploits prior knowledge of statistical label correlations with deep neural networks. It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence. Then, it introduces the label semantics to guide learning semantic-specific features. It exploits a graph propagation network to explore graph node interactions.
arXiv Detail & Related papers (2020-09-20T15:05:29Z)
Zero-Shot Recognition through Image-Guided Semantic Classification [9.291055558504588]
We present a new embedding-based framework for zero-shot learning (ZSL). Motivated by the binary relevance method for multi-label classification, we propose to inversely learn the mapping between an image and a semantic classifier. IGSC is conceptually simple and can be realized by a slight enhancement of an existing deep architecture for classification.
arXiv Detail & Related papers (2020-07-23T06:22:40Z)
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences. In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference. Our algorithm sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
Hierarchical Image Classification using Entailment Cone Embeddings [68.82490011036263]
We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier. We empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
arXiv Detail & Related papers (2020-04-02T10:22:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.