Mitigating Modality Imbalance in Multi-modal Learning via Multi-objective Optimization
- URL: http://arxiv.org/abs/2511.06686v1
- Date: Mon, 10 Nov 2025 04:16:01 GMT
- Title: Mitigating Modality Imbalance in Multi-modal Learning via Multi-objective Optimization
- Authors: Heshan Fernando, Parikshit Ram, Yi Zhou, Soham Dan, Horst Samulowitz, Nathalie Baracaldo, Tianyi Chen
- Abstract summary: Multi-modal learning (MML) aims to integrate information from multiple modalities, which is expected to lead to superior performance over single-modality learning. Recent studies have shown that MML can underperform, even compared to single-modality approaches, due to imbalanced learning across modalities. We propose a gradient-based algorithm to solve the modified MML problem.
- Score: 57.00656508727821
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-modal learning (MML) aims to integrate information from multiple modalities, which is expected to lead to superior performance over single-modality learning. However, recent studies have shown that MML can underperform, even compared to single-modality approaches, due to imbalanced learning across modalities. Methods have been proposed to alleviate this imbalance issue using different heuristics, which often lead to computationally intensive subroutines. In this paper, we reformulate the MML problem as a multi-objective optimization (MOO) problem that overcomes the imbalanced learning issue among modalities and propose a gradient-based algorithm to solve the modified MML problem. We provide convergence guarantees for the proposed method, and our empirical evaluations on popular MML benchmarks showcase the improved performance of the proposed method over existing balanced MML and MOO baselines, with up to a ~20x reduction in subroutine computation time. Our code is available at https://github.com/heshandevaka/MIMO.
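As a rough illustration of the gradient-based MOO idea, the sketch below computes the classic MGDA min-norm combination of two per-modality gradients. This is a generic two-objective MOO update, not the paper's actual MIMO algorithm, and all names and values are hypothetical.

```python
import numpy as np

def min_norm_direction(g1, g2):
    """MGDA (two objectives): min-norm convex combination of the two gradients."""
    diff = g1 - g2
    denom = float(np.dot(diff, diff))
    if denom < 1e-12:                    # gradients already (nearly) agree
        return g1
    # Closed-form minimizer of ||gamma*g1 + (1-gamma)*g2||^2 over gamma in [0, 1]
    gamma = float(np.clip(np.dot(g2 - g1, g2) / denom, 0.0, 1.0))
    return gamma * g1 + (1.0 - gamma) * g2

# Toy update of shared parameters using two per-modality loss gradients
theta = np.zeros(2)
g_audio = np.array([1.0, 0.2])           # hypothetical gradient of the audio loss
g_video = np.array([0.1, 1.5])           # hypothetical gradient of the video loss
theta -= 0.1 * min_norm_direction(g_audio, g_video)
```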
Related papers
- Rebalanced Multimodal Learning with Data-aware Unimodal Sampling [39.77348232514481]
We propose a novel MML approach called Data-aware Unimodal Sampling. Based on the learning status, we propose a reinforcement learning (RL)-based data-aware unimodal sampling approach. Our method can be seamlessly incorporated into almost all existing multimodal learning approaches as a plugin.
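A heavily simplified reading of the idea: everything below (the two-modality setup, the reward signal, the softmax update) is an illustrative assumption, not the paper's actual RL formulation.

```python
import numpy as np

def update_sampling_probs(probs, chosen, reward, lr=0.1):
    """Bandit-style update: raise the sampling weight of a modality whose
    unimodal batch yielded learning progress (the reward)."""
    logits = np.log(probs)
    logits[chosen] += lr * reward
    z = np.exp(logits - logits.max())
    return z / z.sum()

rng = np.random.default_rng(0)
probs = np.array([0.5, 0.5])             # hypothetical audio/video sampling weights
m = rng.choice(2, p=probs)               # pick which unimodal batch to train on
probs = update_sampling_probs(probs, m, reward=0.3)  # reward = e.g. loss decrease
```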
arXiv Detail & Related papers (2025-03-05T08:19:31Z) - Balance-aware Sequence Sampling Makes Multi-modal Learning Better [0.5439020425819]
We propose Balance-aware Sequence Sampling (BSS) to enhance the robustness of MML. We first define a multi-perspective measurer to evaluate the balance degree of each sample. We then employ a scheduler based on curriculum learning (CL) that incrementally provides training subsets, progressing from balanced to imbalanced samples, to rebalance MML.
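A sketch of the curriculum-style scheduling described above; the measurer, the score direction, and the stage count here are all assumed rather than taken from the paper.

```python
import numpy as np

def curriculum_subsets(balance_scores, n_stages=4):
    """Release training samples in growing subsets, from most to least balanced."""
    order = np.argsort(-np.asarray(balance_scores))   # high score = more balanced
    n = len(order)
    return [order[: int(np.ceil(n * s / n_stages))] for s in range(1, n_stages + 1)]

stages = curriculum_subsets([0.9, 0.1, 0.5, 0.7])     # sample indices per stage
```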
arXiv Detail & Related papers (2025-01-01T06:19:55Z) - CoMMIT: Coordinated Multimodal Instruction Tuning [90.1532838391285]
Multimodal large language models (MLLMs) generally involve cooperative learning between a backbone LLM and a feature encoder of non-text input modalities. In this paper, we analyze MLLM instruction tuning from both theoretical and empirical perspectives. We propose a Multimodal Balance Coefficient that enables quantitative measurement of the balance of learning.
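The paper defines its own Multimodal Balance Coefficient; purely as an illustrative proxy (not the paper's definition), one could measure balance as the ratio of the slower to the faster learning rate of the two parts.

```python
def balance_proxy(progress_llm, progress_encoder, eps=1e-8):
    """Illustrative balance score in (0, 1]: 1.0 means both parts learn equally fast.
    'Progress' could be, e.g., the recent decrease in each part's training loss."""
    fast = max(progress_llm, progress_encoder)
    slow = min(progress_llm, progress_encoder)
    return (slow + eps) / (fast + eps)
```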
arXiv Detail & Related papers (2024-07-29T23:18:55Z) - Multimodal Classification via Modal-Aware Interactive Enhancement [6.621745547882088]
We propose a novel multimodal learning method, called modal-aware interactive enhancement (MIE).
Specifically, we first utilize an optimization strategy based on sharpness-aware minimization (SAM) to smooth the learning objective during the forward phase.
Then, with the help of the geometry property of SAM, we propose a gradient modification strategy to impose the influence between different modalities during the backward phase.
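For reference, a minimal sketch of the standard SAM step this entry builds on; the MIE-specific cross-modal gradient modification is not shown, and the toy loss is hypothetical.

```python
import numpy as np

def sam_gradient(theta, loss_grad, rho=0.05):
    """SAM: take the gradient at an adversarially perturbed point theta + eps."""
    g = loss_grad(theta)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent direction, radius rho
    return loss_grad(theta + eps)                 # gradient used for the descent step

# Toy quadratic loss L(theta) = ||theta - 1||^2 with gradient 2*(theta - 1)
loss_grad = lambda t: 2.0 * (t - 1.0)
theta = np.zeros(3)
theta -= 0.1 * sam_gradient(theta, loss_grad)
```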
arXiv Detail & Related papers (2024-07-05T15:32:07Z) - Model Composition for Multimodal Large Language Models [71.5729418523411]
We propose a new paradigm, model composition of existing MLLMs, to create a new model that retains the modal understanding capabilities of each original model.
Our basic implementation, NaiveMC, demonstrates the effectiveness of this paradigm by reusing modality encoders and merging LLM parameters.
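One plausible reading of "merging LLM parameters" is a simple convex combination of matching weights, sketched below; NaiveMC's actual merging rule and encoder handling may differ.

```python
def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Naive merge: elementwise convex combination of parameters shared by both models."""
    return {k: alpha * sd_a[k] + (1.0 - alpha) * sd_b[k] for k in sd_a if k in sd_b}

merged = merge_state_dicts({"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0})  # {'w': 2.0, 'b': 1.0}
```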
arXiv Detail & Related papers (2024-02-20T06:38:10Z) - A First-Order Multi-Gradient Algorithm for Multi-Objective Bi-Level Optimization [7.097069899573992]
We study the Multi-Objective Bi-Level Optimization (MOBLO) problem.
Existing gradient-based MOBLO algorithms need to compute the Hessian matrix.
We propose an efficient first-order multi-gradient method for MOBLO, called FORUM.
arXiv Detail & Related papers (2024-01-17T15:03:37Z) - Sample-Efficient Multi-Agent RL: An Optimization Perspective [103.35353196535544]
We study multi-agent reinforcement learning (MARL) for general-sum Markov Games (MGs) under general function approximation.
We introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs.
We show that our algorithm provides comparable sublinear regret to the existing works.
arXiv Detail & Related papers (2023-10-10T01:39:04Z) - PMR: Prototypical Modal Rebalance for Multimodal Learning [11.5547414386921]
We propose Prototypical Modality Rebalance (PMR) to stimulate the particular slow-learning modality without interference from other modalities.
Our method relies only on the representations of each modality, without restrictions from model structures or fusion methods.
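Below is the usual construction behind "prototypical" methods, which PMR's name suggests; the rebalancing stimulation itself is not shown, only the standard per-class prototype computation.

```python
import numpy as np

def class_prototypes(features, labels, n_classes):
    """Per-class prototypes: the mean feature vector of each class (one modality)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(n_classes)])

def prototype_logits(x, prototypes):
    """Score a feature by negative squared distance to each class prototype."""
    return -((x[None, :] - prototypes) ** 2).sum(axis=1)

feats = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
labels = np.array([0, 0, 1, 1])
protos = class_prototypes(feats, labels, n_classes=2)
scores = prototype_logits(np.array([0.1, 0.0]), protos)   # favors class 0
```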
arXiv Detail & Related papers (2022-11-14T03:36:05Z) - MAML is a Noisy Contrastive Learner [72.04430033118426]
Model-agnostic meta-learning (MAML) is one of the most popular and widely adopted meta-learning algorithms today.
We provide a new perspective on the working mechanism of MAML and discover that MAML is analogous to a meta-learner using a supervised contrastive objective function.
We propose a simple but effective technique, the zeroing trick, to alleviate the interference caused by this noise.
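One reading of the zeroing trick, labeled here as our assumption rather than a confirmed detail, is to reset the classifier head to zero at the start of each meta-training episode so the inner loop adapts from an uninformative head.

```python
import numpy as np

def episode_with_zeroing(head_w, inner_grad, inner_lr=0.01):
    """Hypothetical zeroing trick: zero the head before inner-loop adaptation."""
    head_w = np.zeros_like(head_w)                   # the 'zeroing trick' (our reading)
    return head_w - inner_lr * inner_grad(head_w)    # one inner adaptation step

inner_grad = lambda w: 2.0 * (w - 1.0)               # toy gradient of an inner-loop loss
adapted = episode_with_zeroing(np.ones(4), inner_grad)
```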
arXiv Detail & Related papers (2021-06-29T12:52:26Z) - Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning [63.64636047748605]
We develop a new theoretical framework to provide convergence guarantee for the general multi-step MAML algorithm.
In particular, our results suggest that the inner-stage stepsize needs to be chosen inversely proportional to the number $N$ of inner-stage steps in order for $N$-step MAML to have guaranteed convergence.
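In symbols, a hedged paraphrase of the stepsize condition (exact constants are in the paper):

```latex
\[
  \alpha = \Theta\!\left(\frac{1}{N}\right)
  \quad\text{(inner-loop stepsize for guaranteed convergence of $N$-step MAML)}
\]
```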
arXiv Detail & Related papers (2020-02-18T19:17:54Z)