MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
- URL: http://arxiv.org/abs/2405.17730v1
- Date: Tue, 28 May 2024 01:19:13 GMT
- Title: MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
- Authors: Yake Wei, Di Hu,
- Abstract summary: We identify the previously ignored gradient conflict between multimodal and unimodal learning objectives.
We propose MMPareto algorithm, which could ensure a final gradient with direction common to all learning objectives.
Our method is also expected to facilitate multi-task cases with a clear discrepancy in task difficulty.
- Score: 10.580712937465032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal learning methods with targeted unimodal learning objectives have exhibited their superior efficacy in alleviating the imbalanced multimodal learning problem. However, in this paper, we identify the previously ignored gradient conflict between multimodal and unimodal learning objectives, potentially misleading the unimodal encoder optimization. To well diminish these conflicts, we observe the discrepancy between multimodal loss and unimodal loss, where both gradient magnitude and covariance of the easier-to-learn multimodal loss are smaller than the unimodal one. With this property, we analyze Pareto integration under our multimodal scenario and propose MMPareto algorithm, which could ensure a final gradient with direction that is common to all learning objectives and enhanced magnitude to improve generalization, providing innocent unimodal assistance. Finally, experiments across multiple types of modalities and frameworks with dense cross-modal interaction indicate our superior and extendable method performance. Our method is also expected to facilitate multi-task cases with a clear discrepancy in task difficulty, demonstrating its ideal scalability. The source code and dataset are available at https://github.com/GeWu-Lab/MMPareto_ICML2024.
Related papers
- On the Comparison between Multi-modal and Single-modal Contrastive Learning [50.74988548106031]
We introduce a theoretical foundation for understanding the differences between multi-modal and single-modal contrastive learning.
We identify the critical factor, which is the signal-to-noise ratio (SNR), that impacts the generalizability in downstream tasks of both multi-modal and single-modal contrastive learning.
Our analysis provides a unified framework that can characterize the optimization and generalization of both single-modal and multi-modal contrastive learning.
arXiv Detail & Related papers (2024-11-05T06:21:17Z) - On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
arXiv Detail & Related papers (2024-10-15T13:15:50Z) - Modality-Balanced Learning for Multimedia Recommendation [21.772064939915214]
We propose a Counterfactual Knowledge Distillation method to solve the imbalance problem and make the best use of all modalities.
We also design a novel generic-and-specific distillation loss to guide the multimodal student to learn wider-and-deeper knowledge from teachers.
Our method could serve as a plug-and-play module for both late-fusion and early-fusion backbones.
arXiv Detail & Related papers (2024-07-26T07:53:01Z) - ReconBoost: Boosting Can Achieve Modality Reconcilement [89.4377895465204]
We study the modality-alternating learning paradigm to achieve reconcilement.
We propose a new method called ReconBoost to update a fixed modality each time.
We show that the proposed method resembles Friedman's Gradient-Boosting (GB) algorithm, where the updated learner can correct errors made by others.
arXiv Detail & Related papers (2024-05-15T13:22:39Z) - Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z) - Unified Multi-modal Unsupervised Representation Learning for
Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Deep Metric Loss for Multimodal Learning [3.8979646385036175]
We introduce a novel textMultiModal loss paradigm for multimodal learning.
textMultiModal loss can prevent inefficient learning caused by overfitting and efficiently optimize multimodal models.
Our loss is empirically shown to improve the performance of recent models.
arXiv Detail & Related papers (2023-08-21T06:04:30Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - UniS-MMC: Multimodal Classification via Unimodality-supervised
Multimodal Contrastive Learning [29.237813880311943]
We propose a novel multimodal contrastive method to explore more reliable multimodal representations under the weak supervision of unimodal predicting.
Experimental results with fused features on two image-text classification benchmarks show that our proposed Unimodality-Supervised MultiModal Contrastive UniS-MMC learning method outperforms current state-of-the-art multimodal methods.
arXiv Detail & Related papers (2023-05-16T09:18:38Z) - Improving Multi-Modal Learning with Uni-Modal Teachers [14.917618203952479]
We propose a new multi-modal learning method, Uni-Modal Teacher, which combines the fusion objective and uni-modal distillation to tackle the modality failure problem.
We show that our method not only drastically improves the representation of each modality, but also improves the overall multi-modal task performance.
arXiv Detail & Related papers (2021-06-21T12:46:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.