Class Similarity-Based Multimodal Classification under Heterogeneous Category Sets
- URL: http://arxiv.org/abs/2506.09745v1
- Date: Wed, 11 Jun 2025 13:49:22 GMT
- Title: Class Similarity-Based Multimodal Classification under Heterogeneous Category Sets
- Authors: Yangrui Zhu, Junhua Bao, Yipan Wei, Yapeng Li, Bo Du
- Abstract summary: We propose the practical setting termed Multi-Modal Heterogeneous Category-set Learning (MMHCL). Our method significantly outperforms existing state-of-the-art approaches on multiple benchmark datasets.
- Score: 22.03742325512164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing multimodal methods typically assume that different modalities share the same category set. However, in real-world applications, the category distributions in multimodal data are often inconsistent, which can hinder the model's ability to use cross-modal information effectively when recognizing all categories. In this work, we propose the practical setting termed Multi-Modal Heterogeneous Category-set Learning (MMHCL), where models are trained on multimodal data with heterogeneous category sets and aim to recognize the complete class set of all modalities at test time. To address this task, we propose a Class Similarity-based Cross-modal Fusion model (CSCF). Specifically, CSCF aligns modality-specific features to a shared semantic space to enable knowledge transfer between seen and unseen classes. It then selects the most discriminative modality for decision fusion through uncertainty estimation. Finally, it integrates cross-modal information based on class similarity, where the auxiliary modality refines the prediction of the dominant one. Experimental results show that our method significantly outperforms existing state-of-the-art (SOTA) approaches on multiple benchmark datasets, effectively addressing the MMHCL task.
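The abstract describes three steps: a shared semantic space, uncertainty-based selection of a dominant modality, and class-similarity-based refinement by the auxiliary modality. The sketch below is only a toy illustration of that general idea, not the CSCF model itself. Everything in it is an assumption made for illustration: the function names, the use of Shannon entropy as the uncertainty measure, cosine similarity between hypothetical class embeddings, and the `aux_weight` mixing parameter.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector (assumed uncertainty proxy)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_similarity(class_embeddings):
    """Pairwise class similarity in a shared space, clipped at zero."""
    return [[max(cosine(u, v), 0.0) for v in class_embeddings]
            for u in class_embeddings]


def fuse(probs_a, probs_b, similarity, aux_weight=0.3):
    """Refine the less uncertain (dominant) prediction with the other one.

    Returns the fused probabilities and the index (0 or 1) of the modality
    that was treated as dominant.
    """
    if entropy(probs_a) <= entropy(probs_b):
        dominant_index, dominant, auxiliary = 0, probs_a, probs_b
    else:
        dominant_index, dominant, auxiliary = 1, probs_b, probs_a

    n = len(dominant)
    # Spread the auxiliary probability mass onto similar classes.
    spread = [sum(similarity[i][j] * auxiliary[j] for j in range(n))
              for i in range(n)]
    total = sum(spread)
    spread = [s / total for s in spread]

    # Convex combination of the dominant prediction and the spread mass.
    fused = [(1.0 - aux_weight) * d + aux_weight * s
             for d, s in zip(dominant, spread)]
    norm = sum(fused)
    return [f / norm for f in fused], dominant_index


if __name__ == "__main__":
    # Hypothetical 2-D class embeddings for three classes.
    sim = class_similarity([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    fused, dominant = fuse([0.9, 0.05, 0.05], [0.4, 0.3, 0.3], sim)
    print("dominant modality:", dominant)
    print("fused prediction:", fused)
```

In this toy setup the modality with lower entropy keeps most of its prediction, while the auxiliary modality only shifts probability toward classes that are close in the shared space. The paper's actual alignment, uncertainty estimation, and fusion rules may differ substantially.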
Related papers
- MCFNet: A Multimodal Collaborative Fusion Network for Fine-Grained Semantic Classification [2.7936465461948945]
The Multimodal Collaborative Fusion Network (MCFNet) is designed for fine-grained classification.
The MCFNet architecture incorporates a regularized integrated fusion module that improves intra-modal feature representation.
A multimodal decision classification module exploits inter-modal correlations and unimodal discriminative features.
arXiv Detail & Related papers (2025-05-29T11:42:57Z) - Generative Modeling of Class Probability for Multi-Modal Representation Learning [7.5696616045063845]
Multi-modal understanding plays a crucial role in artificial intelligence by enabling models to jointly interpret inputs from different modalities.
We propose a novel class anchor alignment approach that leverages class probability distributions for multi-modal representation learning.
Our method, Class-anchor-ALigned generative Modeling (CALM), encodes class anchors as prompts to generate and align class probability distributions for each modality.
arXiv Detail & Related papers (2025-03-21T01:17:44Z) - Partially Supervised Unpaired Multi-Modal Learning for Label-Efficient Medical Image Segmentation [53.723234136550055]
We term the new learning paradigm Partially Supervised Unpaired Multi-Modal Learning (PSUMML).
We propose a novel Decomposed partial class adaptation with snapshot Ensembled Self-Training (DEST) framework for it.
Our framework consists of a compact segmentation network with modality-specific normalization layers for learning with partially labeled unpaired multi-modal data.
arXiv Detail & Related papers (2025-03-07T07:22:42Z) - Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition [59.203152078315235]
We propose a novel category-adaptive cross-modal semantic refinement and transfer (C$^2$SRT) framework to explore the semantic correlation.
The proposed framework consists of two complementary modules, i.e., an intra-category semantic refinement (ISR) module and an inter-category semantic transfer (IST) module.
Experiments on OV-MLR benchmarks clearly demonstrate that the proposed C$^2$SRT framework outperforms current state-of-the-art algorithms.
arXiv Detail & Related papers (2024-12-09T04:00:18Z) - Convolutional autoencoder-based multimodal one-class classification [80.52334952912808]
One-class classification refers to approaches of learning using data from a single class only.
We propose a deep learning one-class classification method suitable for multimodal data.
arXiv Detail & Related papers (2023-09-25T12:31:18Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach for novelty detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
We use uncertainty for the anomaly detection approach.
arXiv Detail & Related papers (2022-12-23T00:50:41Z) - An Evolutionary Approach for Creating of Diverse Classifier Ensembles [11.540822622379176]
We propose a framework for classifier selection and fusion based on a four-step protocol called CIF-E.
We implement and evaluate 24 varied ensemble approaches following the proposed CIF-E protocol.
Experiments show that the proposed evolutionary approach can outperform state-of-the-art approaches from the literature on many well-known UCI datasets.
arXiv Detail & Related papers (2022-08-23T14:23:27Z) - A Similarity-based Framework for Classification Task [21.182406977328267]
Similarity-based methods give rise to a new class of methods for multi-label learning and achieve promising performance.
We unite similarity-based learning and generalized linear models to achieve the best of both worlds.
arXiv Detail & Related papers (2022-03-05T06:39:50Z) - MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
In this paper, we propose a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
arXiv Detail & Related papers (2021-07-23T06:57:08Z)