MICINet: Multi-Level Inter-Class Confusing Information Removal for Reliable Multimodal Classification
- URL: http://arxiv.org/abs/2502.19674v1
- Date: Thu, 27 Feb 2025 01:33:28 GMT
- Title: MICINet: Multi-Level Inter-Class Confusing Information Removal for Reliable Multimodal Classification
- Authors: Tong Zhang, Shu Shen, C. L. Philip Chen
- Abstract summary: A reliable multimodal classification method dubbed Multi-Level Inter-Class Confusing Information Removal Network (MICINet) is proposed. MICINet achieves the reliable removal of both types of noise by unifying them into the concept of Inter-class Confusing Information (ICI) and eliminating it at both global and individual levels. Experiments on four datasets demonstrate that MICINet outperforms other state-of-the-art reliable multimodal classification methods under various noise conditions.
- Score: 57.08108545219043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reliable multimodal learning in the presence of noisy data is an issue of wide concern, especially in safety-critical applications. Many reliable multimodal methods address modality-specific or cross-modality noise. However, they fail to handle the coexistence of both types of noise efficiently. Moreover, the lack of comprehensive consideration of noise at both the global and individual levels limits their reliability. To address these issues, a reliable multimodal classification method dubbed Multi-Level Inter-Class Confusing Information Removal Network (MICINet) is proposed. MICINet achieves the reliable removal of both types of noise by unifying them into the concept of Inter-class Confusing Information (ICI) and eliminating it at both the global and individual levels. Specifically, MICINet first reliably learns the global ICI distribution through the proposed Global ICI Learning Module. It then introduces the Global-guided Sample ICI Learning module to efficiently remove global-level ICI from sample features using the learned global ICI distribution. Finally, the Sample-adaptive Cross-modality Information Compensation module removes individual-level ICI from each sample reliably. This is achieved through interpretable cross-modality information compensation, which exploits the complementary relationship between discriminative features and ICI and perceives the relative quality of modalities via their relative discriminative power. Experiments on four datasets demonstrate that MICINet outperforms other state-of-the-art reliable multimodal classification methods under various noise conditions.
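The paper's implementation is not reproduced on this page, but the described pipeline (global ICI learning, global-guided per-sample ICI removal, then quality-aware cross-modality compensation) can be organized as follows. This is a minimal PyTorch sketch under stated assumptions; every class name, shape, and fusion choice below is illustrative, not the authors' code.

```python
# Minimal sketch of the pipeline described in the abstract; all names,
# shapes, and fusion choices are illustrative assumptions.
import torch
import torch.nn as nn

class MICINetSketch(nn.Module):
    def __init__(self, dims, num_classes, hidden=128):
        super().__init__()
        # One encoder per modality.
        self.encoders = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        # Global ICI Learning Module (assumed form): a learned, per-modality
        # estimate of the globally confusing feature component.
        self.global_ici = nn.ParameterList(
            [nn.Parameter(torch.zeros(hidden)) for _ in dims])
        # Global-guided Sample ICI Learning (assumed form): gates how much of
        # the global ICI to subtract from each individual sample's features.
        self.ici_gates = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, hidden), nn.Sigmoid())
             for _ in dims])
        # Per-modality quality scores drive the sample-adaptive
        # cross-modality compensation (better modalities weigh more).
        self.quality = nn.ModuleList([nn.Linear(hidden, 1) for _ in dims])
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, xs):  # xs: list of [batch, dim_m] tensors
        feats = [enc(x) for enc, x in zip(self.encoders, xs)]
        # Global-level ICI removal per sample.
        clean = [f - gate(f) * ici for f, gate, ici
                 in zip(feats, self.ici_gates, self.global_ici)]
        # Relative-quality weighting across modalities, then fusion.
        w = torch.softmax(
            torch.cat([q(f) for q, f in zip(self.quality, clean)], dim=1),
            dim=1)                                  # [batch, num_modalities]
        fused = sum(w[:, m:m + 1] * clean[m] for m in range(len(clean)))
        return self.classifier(fused)

logits = MICINetSketch(dims=[32, 64], num_classes=4)(
    [torch.randn(8, 32), torch.randn(8, 64)])
```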
Related papers
- Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises [3.7816957214446103]
Test-Time Adaptation (TTA) aims to tackle distribution shifts using unlabeled test data without access to the source data.
Existing TTA methods fail in such multimodal scenarios because abrupt distribution shifts destroy the prior knowledge from the source model.
We propose two novel strategies: sample identification with interquartile range smoothing and unimodal assistance, and mutual information sharing.
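For intuition, here is a minimal sketch of interquartile-range-based sample identification, assuming the IQR rule is applied to per-sample prediction entropies; the paper's actual smoothing procedure may differ.

```python
# Illustrative sketch (not the paper's code): flag test samples whose
# prediction entropy falls outside the interquartile-range fence, a common
# way to identify unreliable samples under distribution shift.
import numpy as np

def iqr_flags(scores: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Boolean mask of samples outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(scores, [25, 75])
    iqr = q3 - q1
    return (scores < q1 - k * iqr) | (scores > q3 + k * iqr)

entropies = np.random.rand(256)      # per-sample prediction entropies
noisy = iqr_flags(entropies)         # candidates for unimodal assistance
```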
arXiv Detail & Related papers (2025-03-04T13:36:16Z)
- QADM-Net: Multi-Level Quality-Adaptive Dynamic Network for Reliable Multimodal Classification [57.08108545219043]
Current multimodal classification methods lack dynamic networks with sample-specific depth and parameters, which limits reliable inference.
We propose the Multi-Level Quality-Adaptive Dynamic Multimodal Network (QADM-Net).
Experiments conducted on four datasets demonstrate that QADM-Net significantly outperforms state-of-the-art methods in classification performance and reliability.
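The summary gives no architectural detail, but the sample-specific-depth idea can be sketched generically: a per-sample score decides how many refinement blocks each sample passes through. Everything below is an assumption, not QADM-Net's design.

```python
# Generic sketch of per-sample dynamic depth (an assumption, not QADM-Net):
# a learned score maps each sample to the number of blocks it traverses.
import torch
import torch.nn as nn

class DynamicDepthSketch(nn.Module):
    def __init__(self, dim=32, max_depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
             for _ in range(max_depth)])
        self.score = nn.Linear(dim, 1)

    def forward(self, x):
        # Map each sample's score to a required depth in [1, max_depth].
        q = torch.sigmoid(self.score(x)).squeeze(1)         # [batch]
        depth = (q * len(self.blocks)).long().clamp(min=1)  # per-sample depth
        for i, block in enumerate(self.blocks):
            active = (depth > i).float().unsqueeze(1)  # samples still refining
            x = active * block(x) + (1 - active) * x
        return x

out = DynamicDepthSketch()(torch.randn(8, 32))
```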
arXiv Detail & Related papers (2024-12-19T03:26:51Z)
- CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network [53.72046586512026]
We propose a novel incomplete multi-view clustering network, called Cognitive Deep Incomplete Multi-view Clustering Network (CDIMC-net)
It captures the high-level features and local structure of each view by incorporating view-specific deep encoders and a graph embedding strategy into one framework.
Inspired by human cognition, i.e., learning from easy to hard, it introduces a self-paced strategy that selects the most confident samples for model training.
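A minimal sketch of such a self-paced, easy-to-hard selection step follows; the loss-based ranking and the growing keep-fraction schedule are assumptions, not CDIMC-net's exact criterion.

```python
# Sketch of self-paced sample selection: train only on the currently most
# confident (lowest-loss) samples, widening the pool over time.
import torch

def select_confident(losses: torch.Tensor, keep_frac: float) -> torch.Tensor:
    """Indices of the keep_frac lowest-loss (most confident) samples."""
    k = max(1, int(keep_frac * losses.numel()))
    return torch.topk(losses, k, largest=False).indices

per_sample_loss = torch.rand(100)        # e.g., reconstruction losses
for frac in (0.3, 0.5, 0.7, 1.0):        # grow from easy to hard
    train_idx = select_confident(per_sample_loss, frac)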
arXiv Detail & Related papers (2024-03-28T15:45:03Z)
- A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels [22.2715520667186]
Cross-modal retrieval (CMR) aims to establish interaction between different modalities.
This work proposes UOT-RCL, a Unified framework based on Optimal Transport (OT) for Robust Cross-modal Retrieval.
Experiments on three widely-used cross-modal retrieval datasets demonstrate that our UOT-RCL surpasses the state-of-the-art approaches.
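As a reference point for the OT machinery involved, below is a minimal entropic optimal-transport (Sinkhorn) sketch of the kind of cross-modal transport plan such methods compute; this is generic Sinkhorn, not UOT-RCL's specific unified formulation.

```python
# Generic Sinkhorn iteration: soft correspondence between two modalities'
# features via an entropic optimal-transport plan (not UOT-RCL's code).
import torch

def sinkhorn(cost: torch.Tensor, eps: float = 0.05, iters: int = 100):
    """Entropic OT plan between uniform marginals for a given cost matrix."""
    n, m = cost.shape
    r = torch.full((n,), 1.0 / n)                 # row marginal
    c = torch.full((m,), 1.0 / m)                 # column marginal
    K = torch.exp(-cost / (eps * cost.max()))     # normalized Gibbs kernel
    u, v = torch.ones(n), torch.ones(m)
    for _ in range(iters):                        # alternating projections
        u = r / (K @ v)
        v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]            # rows sum to r, cols to c

img = torch.randn(16, 64)
txt = torch.randn(16, 64)
plan = sinkhorn(torch.cdist(img, txt))            # soft image-text matching
```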
arXiv Detail & Related papers (2024-03-20T10:34:40Z)
- Factorized Contrastive Learning: Going Beyond Multi-view Redundancy [116.25342513407173]
This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy.
On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-06-08T15:17:04Z)
- Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments [18.14974353615421]
We propose a novel method for multimodal representation learning in a noisy environment via the generalized product of experts technique.
In the proposed method, we train a separate network for each modality to assess the credibility of information coming from that modality.
We attain state-of-the-art performance on two challenging benchmarks: multimodal 3D hand-pose estimation and multimodal surgical video segmentation.
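A minimal sketch of product-of-experts fusion with Gaussian experts follows: each modality network outputs a mean and variance, and high-variance (low-credibility) modalities automatically contribute less. This illustrates plain PoE, not the paper's exact generalization.

```python
# Precision-weighted product-of-experts fusion of Gaussian experts.
import torch

def poe_fuse(means, variances):
    """Fuse per-modality Gaussian estimates; noisier experts weigh less."""
    precisions = [1.0 / v for v in variances]
    total_prec = sum(precisions)
    fused_mean = sum(p * m for p, m in zip(precisions, means)) / total_prec
    return fused_mean, 1.0 / total_prec           # fused mean and variance

m = [torch.randn(8, 16), torch.randn(8, 16)]      # per-modality means
v = [torch.rand(8, 16) + 0.1, torch.rand(8, 16) + 0.1]  # per-modality variances
mu, var = poe_fuse(m, v)
```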
arXiv Detail & Related papers (2022-11-07T14:27:38Z)
- Multimodal Information Bottleneck: Learning Minimal Sufficient Unimodal and Multimodal Representations [27.855467591358018]
We introduce the multimodal information bottleneck (MIB), aiming to learn a powerful and sufficient multimodal representation.
We develop three MIB variants, namely, early-fusion MIB, late-fusion MIB, and complete MIB, to focus on different perspectives of information constraints.
Experimental results suggest that the proposed method reaches state-of-the-art performance on the tasks of multimodal sentiment analysis and multimodal emotion recognition.
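For reference, a minimal sketch of the variational information-bottleneck objective such variants optimize, assuming a Gaussian posterior and a standard-normal prior; where the bottleneck representation is computed (before or after fusion) is what distinguishes the early-, late-, and complete-fusion variants.

```python
# Generic variational IB loss: fit the task while compressing the
# representation toward a standard-normal prior (a common formulation,
# assumed here; not necessarily the paper's exact objective).
import torch
import torch.nn.functional as F

def ib_loss(logits, labels, mu, logvar, beta=1e-3):
    task = F.cross_entropy(logits, labels)
    # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch.
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()
    return task + beta * kl

logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
mu, logvar = torch.randn(8, 32), torch.randn(8, 32)
loss = ib_loss(logits, labels, mu, logvar)
```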
arXiv Detail & Related papers (2022-10-31T16:14:18Z)
- Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis [47.29528724322795]
Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently.
Despite significant progress, there are still two major challenges on the way towards robust MSA.
We propose a generic and unified framework to address them, named Efficient Multimodal Transformer with Dual-Level Feature Restoration (EMT-DLFR).
arXiv Detail & Related papers (2022-08-16T08:02:30Z)
- A Multi-level Supervised Contrastive Learning Framework for Low-Resource Natural Language Inference [54.678516076366506]
Natural Language Inference (NLI) is an increasingly essential task in natural language understanding.
Here we propose a multi-level supervised contrastive learning framework named MultiSCL for low-resource natural language inference.
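For context, here is a minimal sketch of the single-level supervised contrastive loss this line of work builds on (Khosla et al., 2020); MultiSCL's multi-level weighting is omitted, and the implementation details below are assumptions.

```python
# Sketch of a supervised contrastive loss: pull same-label embeddings
# together, push different-label embeddings apart.
import torch
import torch.nn.functional as F

def supcon_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """z: [batch, dim] embeddings; labels: [batch] integer class labels."""
    z = F.normalize(z, dim=1)
    eye = torch.eye(len(z), dtype=torch.bool)
    sim = (z @ z.T / tau).masked_fill(eye, -1e9)  # exclude self-similarity
    pos = labels[:, None].eq(labels[None, :]) & ~eye  # positive-pair mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    has_pos = pos.any(dim=1)  # anchors with at least one positive
    loss = -(log_prob * pos)[has_pos].sum(dim=1) / pos[has_pos].sum(dim=1)
    return loss.mean()

z = torch.randn(32, 128)
labels = torch.randint(0, 4, (32,))
loss = supcon_loss(z, labels)
```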
arXiv Detail & Related papers (2022-05-31T05:54:18Z)
- Which is Making the Contribution: Modulating Unimodal and Cross-modal Dynamics for Multimodal Sentiment Analysis [18.833050804875032]
Multimodal sentiment analysis (MSA) draws increasing attention with the availability of multimodal data.
Recent MSA works mostly focus on learning cross-modal dynamics, but neglect to explore an optimal solution for unimodal networks.
We propose a novel MSA framework, Modulation Model for Multimodal Sentiment Analysis.
arXiv Detail & Related papers (2021-11-10T03:29:17Z)
- Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association [94.7030305679589]
We propose a novel framework to jointly address the above-mentioned issues.
We introduce a global loss into the modality alignment process.
The proposed method outperforms the previous methods in multiple settings.
arXiv Detail & Related papers (2021-03-12T14:10:48Z)