Towards Balanced Active Learning for Multimodal Classification
- URL: http://arxiv.org/abs/2306.08306v2
- Date: Mon, 21 Aug 2023 05:26:45 GMT
- Title: Towards Balanced Active Learning for Multimodal Classification
- Authors: Meng Shen, Yizheng Huang, Jianxiong Yin, Heqing Zou, Deepu Rajan,
Simon See
- Abstract summary: Training multimodal networks requires a vast amount of data due to their larger parameter space compared to unimodal networks.
Active learning is a widely used technique for reducing data annotation costs by selecting only those samples that could contribute to improving model performance.
Current active learning strategies are mostly designed for unimodal tasks, and when applied to multimodal data, they often result in biased sample selection from the dominant modality.
- Score: 15.338417969382212
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training multimodal networks requires a vast amount of data due to their
larger parameter space compared to unimodal networks. Active learning is a
widely used technique for reducing data annotation costs by selecting only
those samples that could contribute to improving model performance. However,
current active learning strategies are mostly designed for unimodal tasks, and
when applied to multimodal data, they often result in biased sample selection
from the dominant modality. This unfairness hinders balanced multimodal
learning, which is crucial for achieving optimal performance. To address this
issue, we propose three guidelines for designing a more balanced multimodal
active learning strategy. Following these guidelines, a novel approach is
proposed to achieve more fair data selection by modulating the gradient
embedding with the dominance degree among modalities. Our studies demonstrate
that the proposed method achieves more balanced multimodal learning by avoiding
greedy sample selection from the dominant modality. Our approach outperforms
existing active learning strategies on a variety of multimodal classification
tasks. Overall, our work highlights the importance of balancing sample
selection in multimodal active learning and provides a practical solution for
achieving more balanced active learning for multimodal classification.
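The core idea stated in the abstract, re-weighting each modality's gradient embedding by its estimated dominance degree before selecting a diverse batch, can be sketched as follows. This is a minimal illustration only: it assumes a BADGE-style k-means++ batch selection and an inverse-loss dominance estimate, and the function names (dominance_degree, modulated_embeddings, kmeanspp_select) and the exact modulation rule are hypothetical, not the authors' released implementation.

```python
# Hypothetical sketch of dominance-modulated gradient-embedding selection.
# The modulation rule (down-weighting the dominant modality) is an assumption
# made for illustration; the paper's actual formulation may differ.
import numpy as np

def dominance_degree(unimodal_losses):
    """Rough per-modality dominance: lower unimodal loss => more dominant.
    Returns weights that sum to 1 across modalities."""
    inv = 1.0 / (np.asarray(unimodal_losses, dtype=float) + 1e-8)
    return inv / inv.sum()

def modulated_embeddings(grad_embs, dominance):
    """Down-weight the gradient embedding of dominant modalities so selection
    is not driven by them alone. grad_embs: list of (N, D_m) arrays."""
    scaled = [g * (1.0 - d) for g, d in zip(grad_embs, dominance)]
    return np.concatenate(scaled, axis=1)  # (N, sum of D_m)

def kmeanspp_select(embs, k, seed=None):
    """k-means++ style seeding (as in BADGE-like strategies) to pick a
    diverse, high-gradient-magnitude batch from the pooled embeddings."""
    rng = np.random.default_rng(seed)
    n = embs.shape[0]
    chosen = [int(rng.integers(n))]
    d2 = np.sum((embs - embs[chosen[0]]) ** 2, axis=1)
    while len(chosen) < k:
        idx = int(rng.choice(n, p=d2 / d2.sum()))
        chosen.append(idx)
        d2 = np.minimum(d2, np.sum((embs - embs[idx]) ** 2, axis=1))
    return chosen

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy pool: 100 unlabelled samples with audio (64-d) and visual (128-d)
    # gradient embeddings; audio is the dominant modality in this toy example.
    audio_g, visual_g = rng.normal(size=(100, 64)), rng.normal(size=(100, 128))
    dom = dominance_degree(unimodal_losses=[0.4, 1.2])
    pooled = modulated_embeddings([audio_g, visual_g], dom)
    print("dominance:", dom, "selected:", kmeanspp_select(pooled, k=8, seed=1))
```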
Related papers
- DeepSuM: Deep Sufficient Modality Learning Framework [6.455939667961427]
We propose a novel framework for modality selection that independently learns the representation of each modality.
Our framework aims to enhance the efficiency and effectiveness of multimodal learning by optimizing modality integration and selection.
arXiv Detail & Related papers (2025-03-03T16:48:59Z) - Rethinking Multimodal Learning from the Perspective of Mitigating Classification Ability Disproportion [6.621745547882088]
The existence of modality imbalance hinders multimodal learning from achieving its expected superiority over unimodal models in practice.
By designing a sustained boosting algorithm, we propose a novel multimodal learning approach to balance the classification ability of weak and strong modalities.
arXiv Detail & Related papers (2025-02-27T14:12:20Z) - On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
arXiv Detail & Related papers (2024-10-15T13:15:50Z) - Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework [58.362064122489166]
This paper introduces the Cross-modal Few-Shot Learning task, which aims to recognize instances from multiple modalities when only a few labeled examples are available.
We propose a Generative Transfer Learning (GTL) framework consisting of two stages: the first involves training on abundant unimodal data, and the second focuses on transfer learning to adapt to novel data.
Our findings demonstrate that GTL achieves superior performance compared to state-of-the-art methods across four distinct multi-modal datasets.
arXiv Detail & Related papers (2024-10-14T16:09:38Z) - Diagnosing and Re-learning for Balanced Multimodal Learning [8.779005254634857]
We propose the Diagnosing & Re-learning method to overcome the imbalanced multimodal learning problem.
The learning state of each modality is estimated based on the separability of its uni-modal representation space.
In this way, over-emphasis on scarcely informative modalities is avoided.
arXiv Detail & Related papers (2024-07-12T22:12:03Z) - Multimodal Classification via Modal-Aware Interactive Enhancement [6.621745547882088]
We propose a novel multimodal learning method called modal-aware interactive enhancement (MIE).
Specifically, we first utilize an optimization strategy based on sharpness-aware minimization (SAM) to smooth the learning objective during the forward phase.
Then, with the help of the geometry property of SAM, we propose a gradient modification strategy to impose the influence between different modalities during the backward phase.
arXiv Detail & Related papers (2024-07-05T15:32:07Z) - Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods involve independently scoring and selecting data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z) - MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance [10.580712937465032]
We identify the previously ignored gradient conflict between multimodal and unimodal learning objectives.
We propose MMPareto algorithm, which could ensure a final gradient with direction common to all learning objectives.
Our method is also expected to facilitate multi-task cases with a clear discrepancy in task difficulty.
arXiv Detail & Related papers (2024-05-28T01:19:13Z) - Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z) - Unified Multi-modal Unsupervised Representation Learning for
Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Improving Discriminative Multi-Modal Learning with Large-Scale
Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
arXiv Detail & Related papers (2023-10-08T15:01:54Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - Learning Modality-Specific Representations with Self-Supervised
Multi-Task Learning for Multimodal Sentiment Analysis [11.368438990334397]
We develop a self-supervised learning strategy to acquire independent unimodal supervisions.
We conduct extensive experiments on three public multimodal baseline datasets.
Our method achieves performance comparable to that of human-annotated unimodal labels.
arXiv Detail & Related papers (2021-02-09T14:05:02Z)