PMR: Prototypical Modal Rebalance for Multimodal Learning
- URL: http://arxiv.org/abs/2211.07089v1
- Date: Mon, 14 Nov 2022 03:36:05 GMT
- Title: PMR: Prototypical Modal Rebalance for Multimodal Learning
- Authors: Yunfeng Fan, Wenchao Xu, Haozhao Wang, Junxiao Wang, and Song Guo
- Abstract summary: We propose Prototypical Modality Rebalance (PMR) to perform stimulation on the particular slow-learning modality without interference from other modalities.
Our method relies only on the representations of each modality, without restrictions on model structures or fusion methods.
- Score: 11.5547414386921
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal learning (MML) aims to jointly exploit the common priors of
different modalities to compensate for their inherent limitations. However,
existing MML methods often optimize a uniform objective for different
modalities, leading to the notorious "modality imbalance" problem and
counterproductive MML performance. To address the problem, some existing
methods modulate the learning pace based on the fused modality, which is
dominated by the better modality and eventually results in a limited
improvement on the worse modality. To better exploit multimodal features,
we propose Prototypical Modality Rebalance (PMR) to perform stimulation on the
particular slow-learning modality without interference from other modalities.
Specifically, we introduce the prototypes that represent general features for
each class, to build the non-parametric classifiers for uni-modal performance
evaluation. Then, we try to accelerate the slow-learning modality by enhancing
its clustering toward prototypes. Furthermore, to alleviate the suppression
from the dominant modality, we introduce a prototype-based entropy
regularization term during the early training stage to prevent premature
convergence. Besides, our method relies only on the representations of each
modality, without restrictions on model structures or fusion methods, which
gives it great application potential for various scenarios.
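To make the prototype mechanism concrete, the following is a minimal sketch (not the authors' released code) of how per-class prototypes could be computed from uni-modal features and used as a non-parametric classifier, with a prototypical cross-entropy term to stimulate the slower modality and an entropy regularizer on the dominant one. All function and variable names (e.g. compute_prototypes, pmr_style_losses, lam) are illustrative assumptions.
```python
import torch
import torch.nn.functional as F


def compute_prototypes(features, labels, num_classes):
    """Per-class prototype = mean feature vector of that class (illustrative)."""
    protos = torch.zeros(num_classes, features.size(1), device=features.device)
    counts = torch.zeros(num_classes, device=features.device)
    protos.index_add_(0, labels, features)
    counts.index_add_(0, labels, torch.ones(labels.size(0), device=features.device))
    return protos / counts.clamp(min=1.0).unsqueeze(1)


def proto_logits(features, prototypes):
    """Non-parametric classifier: score = negative squared distance to each prototype."""
    return -torch.cdist(features, prototypes) ** 2


def pmr_style_losses(feat_a, feat_b, labels, protos_a, protos_b, lam=1.0):
    """Sketch of a prototype-based rebalance between two modalities a and b:
    the modality whose prototype classifier is currently weaker receives an
    extra prototypical cross-entropy (pulling its features toward the class
    prototypes), while the dominant modality receives an entropy regularizer
    that discourages premature, over-confident convergence."""
    logits_a = proto_logits(feat_a, protos_a)
    logits_b = proto_logits(feat_b, protos_b)

    # Uni-modal performance evaluation via the prototype classifiers.
    acc_a = (logits_a.argmax(dim=1) == labels).float().mean()
    acc_b = (logits_b.argmax(dim=1) == labels).float().mean()
    slow, fast = (logits_a, logits_b) if acc_a < acc_b else (logits_b, logits_a)

    # Stimulate the slow-learning modality: prototypical cross-entropy
    # enhances its clustering toward the class prototypes.
    loss_slow = F.cross_entropy(slow, labels)

    # Regularize the dominant modality: penalize low prediction entropy
    # (minimize negative entropy) during the early training stage.
    p_fast = F.softmax(fast, dim=1)
    neg_entropy = (p_fast * p_fast.clamp_min(1e-8).log()).sum(dim=1).mean()

    return loss_slow + lam * neg_entropy
```
In practice the prototypes would be maintained across training iterations (e.g. as running averages of the uni-modal features) rather than recomputed from a single batch, and the weight lam would be scheduled so the entropy term acts mainly early in training; the sketch leaves those details out.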
Related papers
- On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
arXiv Detail & Related papers (2024-10-15T13:15:50Z) - Multimodal Classification via Modal-Aware Interactive Enhancement [6.621745547882088]
We propose a novel multimodal learning method, called modal-aware interactive enhancement (MIE)
Specifically, we first utilize an optimization strategy based on sharpness aware minimization (SAM) to smooth the learning objective during the forward phase.
Then, with the help of the geometry property of SAM, we propose a gradient modification strategy to impose the influence between different modalities during the backward phase.
arXiv Detail & Related papers (2024-07-05T15:32:07Z) - Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization [14.606035444283984]
Current approaches focus on developing models that handle modality-incomplete inputs during inference.
We propose a robust universal model with modality reconstruction and model personalization.
Our method has been extensively validated on two brain tumor segmentation benchmarks.
arXiv Detail & Related papers (2024-06-04T06:07:24Z) - Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z) - Unified Multi-modal Unsupervised Representation Learning for
Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Improving Discriminative Multi-Modal Learning with Large-Scale
Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
arXiv Detail & Related papers (2023-10-08T15:01:54Z) - Generalizing Multimodal Variational Methods to Sets [35.69942798534849]
This paper presents a novel variational method on sets called the Set Multimodal VAE (SMVAE) for learning a multimodal latent space.
By modeling the joint-modality posterior distribution directly, the proposed SMVAE learns to exchange information between multiple modalities and compensate for the drawbacks caused by factorization.
arXiv Detail & Related papers (2022-12-19T23:50:19Z) - Exploiting modality-invariant feature for robust multimodal emotion
recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN).
We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z) - Permutation Invariant Policy Optimization for Mean-Field Multi-Agent
Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.