Balanced Multimodal Learning via On-the-fly Gradient Modulation
- URL: http://arxiv.org/abs/2203.15332v1
- Date: Tue, 29 Mar 2022 08:26:38 GMT
- Title: Balanced Multimodal Learning via On-the-fly Gradient Modulation
- Authors: Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang and Di Hu
- Abstract summary: Multimodal learning helps to comprehensively understand the world by integrating different senses.
We propose on-the-fly gradient modulation to adaptively control the optimization of each modality by monitoring the discrepancy of their contributions to the learning objective.
- Score: 10.5602074277814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal learning helps to comprehensively understand the world by integrating different senses. Accordingly, multiple input modalities are expected to boost model performance, but we find that they are not fully exploited even when the multimodal model outperforms its uni-modal counterpart. Specifically, this paper points out that existing multimodal discriminative models, in which a uniform objective is designed for all modalities, can leave uni-modal representations under-optimized because another modality dominates in some scenarios, e.g., sound in a blowing-wind event, vision in a drawing event, etc. To alleviate this optimization imbalance, we propose on-the-fly gradient modulation to adaptively control the optimization of each modality by monitoring the discrepancy of their contributions to the learning objective. Further, dynamically changing Gaussian noise is introduced to avoid a possible generalization drop caused by gradient modulation. As a result, we achieve considerable improvement over common fusion methods on different multimodal tasks, and this simple strategy can also boost existing multimodal methods, which illustrates its efficacy and versatility. The source code is available at https://github.com/GeWu-Lab/OGM-GE_CVPR2022.
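The mechanism the abstract describes fits in a few lines of training code. Below is a minimal PyTorch sketch written from that description, not from the official repository: the encoder module names, the contribution measure (summed softmax score of the true class), and the hyperparameter alpha are illustrative assumptions.

```python
import math
import torch

def ogm_ge_step(model, logits_a, logits_v, labels, alpha=0.1):
    """Call between loss.backward() and optimizer.step().
    Scales the dominant modality's encoder gradients (OGM) and adds
    Gaussian noise to compensate (GE). A sketch, not the authors' code."""
    with torch.no_grad():
        # Per-modality contribution: summed softmax score of the true class.
        score_a = torch.softmax(logits_a, 1).gather(1, labels[:, None]).sum()
        score_v = torch.softmax(logits_v, 1).gather(1, labels[:, None]).sum()
        ratio = (score_a / (score_v + 1e-8)).item()  # > 1: audio dominates

        # Shrink only the dominant side; a coefficient of 1.0 leaves it alone.
        coeff = {
            "audio_encoder":  1.0 - math.tanh(alpha * max(ratio - 1.0, 0.0)),
            "visual_encoder": 1.0 - math.tanh(alpha * max(1.0 / ratio - 1.0, 0.0)),
        }
        for name, p in model.named_parameters():
            for prefix, k in coeff.items():
                if name.startswith(prefix) and p.grad is not None and k < 1.0:
                    p.grad *= k
                    # GE: noise scaled to the gradient's own std offsets the
                    # generalization drop from the weakened update signal.
                    if p.grad.numel() > 1:
                        p.grad += torch.randn_like(p.grad) * p.grad.std()
```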
Related papers
- On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
- Modality-Balanced Learning for Multimedia Recommendation [21.772064939915214]
We propose a Counterfactual Knowledge Distillation method to solve the imbalance problem and make the best use of all modalities.
We also design a novel generic-and-specific distillation loss to guide the multimodal student to learn wider-and-deeper knowledge from teachers.
Our method could serve as a plug-and-play module for both late-fusion and early-fusion backbones.
arXiv Detail & Related papers (2024-07-26T07:53:01Z)
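The summary above gives only the shape of the loss, so the following is a hypothetical reading rather than the paper's definition: the multimodal student distills from a fused multimodal teacher (generic) and from each uni-modal teacher (specific). The temperature, weighting, and function names are assumptions.

```python
import torch.nn.functional as F

def generic_specific_kd(student_logits, teacher_multi, teacher_unis, T=2.0, w=0.5):
    """Hypothetical 'generic-and-specific' distillation loss: one KD term
    against a multimodal teacher, one averaged over uni-modal teachers."""
    def kd(s, t):
        return F.kl_div(F.log_softmax(s / T, dim=1),
                        F.softmax(t.detach() / T, dim=1),
                        reduction="batchmean") * T * T
    generic = kd(student_logits, teacher_multi)
    specific = sum(kd(student_logits, t) for t in teacher_unis) / len(teacher_unis)
    return w * generic + (1.0 - w) * specific
```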
- Multimodal Classification via Modal-Aware Interactive Enhancement [6.621745547882088]
We propose a novel multimodal learning method, called modal-aware interactive enhancement (MIE).
Specifically, we first utilize an optimization strategy based on sharpness-aware minimization (SAM) to smooth the learning objective during the forward phase.
Then, with the help of the geometry property of SAM, we propose a gradient modification strategy that imposes cross-modal influence during the backward phase.
arXiv Detail & Related papers (2024-07-05T15:32:07Z)
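MIE's cross-modal gradient modification is not spelled out in the summary, but the SAM component it builds on is standard. Below is a generic SAM step for reference, assuming the caller zeroes gradients beforehand and that `loss_fn` recomputes the forward pass; it is not MIE itself.

```python
import torch

def sam_step(model, loss_fn, optimizer, rho=0.05):
    """One sharpness-aware minimization step: perturb weights toward the
    locally worst-case direction, then update with gradients taken there."""
    loss_fn().backward()
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)  # ascend to the worst case
            p.add_(e)
            eps.append(e)
    optimizer.zero_grad()
    loss_fn().backward()                 # gradients at the perturbed weights
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)                # restore the original weights
    optimizer.step()                     # descend using the SAM gradients
```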
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M, an unbiased multiscale modal fusion model for multimodal semantic segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
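As an illustration of fusing at multiple scales without privileging either modality, here is a minimal sketch; the symmetric averaging, channel widths, and two-modality setup are assumptions, not U3M's actual architecture.

```python
import torch.nn as nn

class MultiscaleFusion(nn.Module):
    """Project each modality's per-scale feature maps to a shared width and
    merge them symmetrically, so no modality gets a privileged pathway."""
    def __init__(self, channels=(64, 128, 256), dim=128):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, dim, 1) for c in channels])

    def forward(self, feats_rgb, feats_aux):
        # feats_*: lists of per-scale maps, e.g. [B,64,H,W], [B,128,H/2,W/2], ...
        return [(proj(fr) + proj(fa)) / 2
                for proj, fr, fa in zip(self.proj, feats_rgb, feats_aux)]
```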
- Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z)
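The alternating scheme is easy to picture in code: each optimization step routes one modality through its own encoder into a head shared by all modalities. The sketch below is an interpretation of the summary; encoder names and dimensions are placeholders.

```python
import torch.nn as nn

class AlternatingUnimodal(nn.Module):
    """One modality per step trains through a shared head, so cross-modal
    interaction accumulates in the head while encoders never compete in
    the same backward pass."""
    def __init__(self, encoders: dict, dim=512, n_classes=10):
        super().__init__()
        self.encoders = nn.ModuleDict(encoders)  # e.g. {"audio": ..., "video": ...}
        self.head = nn.Linear(dim, n_classes)    # shared across modalities

    def forward(self, x, modality: str):
        return self.head(self.encoders[modality](x))

# A training loop would alternate: step 1 forwards ("audio", batch_a),
# step 2 forwards ("video", batch_v), and so on, so only the active
# encoder plus the shared head receive gradients each step.
```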
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
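A single-stream early-fusion encoder can be sketched as follows: per-modality inputs are embedded and summed into one token sequence so a single backbone processes all modalities. The skeleton modality names, dimensions, and transformer settings are assumptions, not UmURL's configuration.

```python
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    """Embed each modality, sum into one sequence, encode in a single stream."""
    def __init__(self, in_dims=(("joint", 150), ("bone", 150), ("motion", 150)),
                 dim=256):
        super().__init__()
        self.embed = nn.ModuleDict({m: nn.Linear(d, dim) for m, d in in_dims})
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, inputs: dict):
        # inputs[m]: [B, T, in_dim]; summing keeps one sequence of length T.
        x = sum(self.embed[m](v) for m, v in inputs.items())
        return self.encoder(x)  # [B, T, dim] unified representation
```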
- Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
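The low-rank adaptation building block that MMLoRA applies to frozen pre-trained backbones is standard LoRA; a minimal version is below. The rank and scaling defaults are illustrative, and how MMLoRA wires this into each uni-modal model is not specified by the summary.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Freeze a pre-trained linear layer and train a rank-r update instead:
    y = W x + (alpha / r) * B A x, with A ~ small noise and B = 0 at init."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                     # frozen backbone weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```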
- Improving Multi-Modal Learning with Uni-Modal Teachers [14.917618203952479]
We propose a new multi-modal learning method, Uni-Modal Teacher, which combines the fusion objective and uni-modal distillation to tackle the modality failure problem.
We show that our method not only drastically improves the representation of each modality, but also improves the overall multi-modal task performance.
arXiv Detail & Related papers (2021-06-21T12:46:47Z)
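Combining a fusion objective with uni-modal distillation can be written as a single loss; the sketch below is one plausible form, with the L2 feature matching and the weight `lam` as assumptions rather than the paper's exact terms.

```python
import torch.nn.functional as F

def umt_loss(fused_logits, feat_a, feat_v, teach_a, teach_v, labels, lam=1.0):
    """Fusion classification loss plus distillation of each modality's
    features toward its pre-trained uni-modal teacher."""
    task = F.cross_entropy(fused_logits, labels)
    distill = (F.mse_loss(feat_a, teach_a.detach())
               + F.mse_loss(feat_v, teach_v.detach()))
    return task + lam * distill
```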
- Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles.
We construct three propagative modules to effectively solve the optimization models with flexible combinations.
Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.