Rethinking Multimodal Learning from the Perspective of Mitigating Classification Ability Disproportion
- URL: http://arxiv.org/abs/2502.20120v1
- Date: Thu, 27 Feb 2025 14:12:20 GMT
- Title: Rethinking Multimodal Learning from the Perspective of Mitigating Classification Ability Disproportion
- Authors: QingYuan Jiang, Longfei Huang, Yang Yang
- Abstract summary: The existence of modality imbalance hinders multimodal learning from achieving its expected superiority over unimodal models in practice. By designing a sustained boosting algorithm, we propose a novel multimodal learning approach to balance the classification ability of weak and strong modalities.
- Score: 6.621745547882088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although multimodal learning (MML) has garnered remarkable progress, the existence of modality imbalance hinders multimodal learning from achieving its expected superiority over unimodal models in practice. To overcome this issue, mainstream multimodal learning methods have placed greater emphasis on balancing the learning process. However, these approaches do not explicitly enhance the classification ability of weaker modalities, leading to limited performance promotion. By designing a sustained boosting algorithm, we propose a novel multimodal learning approach to dynamically balance the classification ability of weak and strong modalities. Concretely, we first propose a sustained boosting algorithm in multimodal learning by simultaneously optimizing the classification and residual errors using a designed configurable classifier module. Then, we propose an adaptive classifier assignment strategy to dynamically facilitate the classification performance of weak modality. To this end, the classification ability of strong and weak modalities is expected to be balanced, thereby mitigating the imbalance issue. Empirical experiments on widely used datasets reveal the superiority of our method through comparison with various state-of-the-art (SoTA) multimodal learning baselines.
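The abstract describes boosting a weak modality by fitting classifiers to residual errors. The sketch below is not the paper's sustained boosting algorithm, whose details are not given here. It is ordinary gradient boosting with linear heads, offered only to illustrate the general idea of stacking classifier heads on residuals. The function names, the ridge term, the step size `lr`, and the synthetic "weak modality" features are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, y_onehot):
    p = softmax(logits)
    return float(-np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1)))

def fit_boosted_heads(x, y_onehot, n_heads=5, lr=0.5, ridge=1e-3):
    """Fit linear heads one after another; each head regresses on the
    residual (labels minus current predicted probabilities)."""
    d = x.shape[1]
    logits = np.zeros_like(y_onehot, dtype=float)
    gram = x.T @ x + ridge * np.eye(d)  # shared ridge-regularised Gram matrix
    heads, losses = [], [cross_entropy(logits, y_onehot)]
    for _ in range(n_heads):
        residual = y_onehot - softmax(logits)      # negative CE gradient w.r.t. logits
        w = np.linalg.solve(gram, x.T @ residual)  # least-squares fit to the residual
        logits = logits + lr * (x @ w)             # add the new head to the ensemble
        heads.append(w)
        losses.append(cross_entropy(logits, y_onehot))
    return heads, losses

def predict(x, heads, lr=0.5):
    logits = sum(lr * (x @ w) for w in heads)
    return logits.argmax(axis=1)

# Hypothetical features for one modality: 3 classes, 4 dimensions.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=300)
x = 3.0 * np.eye(3, 4)[y] + 0.5 * rng.normal(size=(300, 4))
heads, losses = fit_boosted_heads(x, np.eye(3)[y])
print("loss per boosting round:", [round(v, 3) for v in losses])
print("train accuracy:", (predict(x, heads) == y).mean())
```

In this toy setting every added head lowers the training loss of the modality it is attached to, which is the intuition behind giving a weaker modality more classifier capacity. How the paper decides which modality receives additional classifiers is not specified in the abstract and is not modeled here.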
Related papers
- Generative Modeling of Class Probability for Multi-Modal Representation Learning [7.5696616045063845]
Multi-modal understanding plays a crucial role in artificial intelligence by enabling models to jointly interpret inputs from different modalities.
We propose a novel class anchor alignment approach that leverages class probability distributions for multi-modal representation learning.
Our method, Class-anchor-ALigned generative Modeling (CALM), encodes class anchors as prompts to generate and align class probability distributions for each modality.
arXiv Detail & Related papers (2025-03-21T01:17:44Z)
- Asymmetric Reinforcing against Multi-modal Representation Bias [59.685072206359855]
We propose an Asymmetric Reinforcing method against Multimodal representation bias (ARM). Our ARM dynamically reinforces the weak modalities while maintaining the ability to represent dominant modalities through conditional mutual information. We have significantly improved the performance of multimodal learning, making notable progress in mitigating imbalanced multimodal learning.
arXiv Detail & Related papers (2025-01-02T13:00:06Z)
- On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
- Diagnosing and Re-learning for Balanced Multimodal Learning [8.779005254634857]
We propose the Diagnosing & Re-learning method to overcome the imbalanced multimodal learning problem.
The learning state of each modality is estimated based on the separability of its uni-modal representation space.
In this way, the over-emphasizing of scarcely informative modalities is avoided.
arXiv Detail & Related papers (2024-07-12T22:12:03Z)
- Multimodal Classification via Modal-Aware Interactive Enhancement [6.621745547882088]
We propose a novel multimodal learning method, called modal-aware interactive enhancement (MIE).
Specifically, we first utilize an optimization strategy based on sharpness aware minimization (SAM) to smooth the learning objective during the forward phase.
Then, with the help of the geometry property of SAM, we propose a gradient modification strategy to impose the influence between different modalities during the backward phase.
arXiv Detail & Related papers (2024-07-05T15:32:07Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
- Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
- Towards Balanced Active Learning for Multimodal Classification [15.338417969382212]
Training multimodal networks requires a vast amount of data due to their larger parameter space compared to unimodal networks.
Active learning is a widely used technique for reducing data annotation costs by selecting only those samples that could contribute to improving model performance.
Current active learning strategies are mostly designed for unimodal tasks, and when applied to multimodal data, they often result in biased sample selection from the dominant modality.
arXiv Detail & Related papers (2023-06-14T07:23:36Z)
- Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we start an early trial to consider the problem of learning multiclass scoring functions via optimizing multiclass AUC metrics.
arXiv Detail & Related papers (2021-07-28T05:18:10Z)
- MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
We propose in this paper a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
arXiv Detail & Related papers (2021-07-23T06:57:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.