See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias
- URL: http://arxiv.org/abs/2503.13834v1
- Date: Tue, 18 Mar 2025 02:17:41 GMT
- Title: See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias
- Authors: JuneHyoung Kwon, MiHyeon Kim, Eunju Lee, Juhwan Choi, YoungBin Kim
- Abstract summary: Vision-language (VL) models often rely on a specific modality for predictions, leading to "dominant modality bias." We propose a novel framework, BalGrad, to mitigate dominant modality bias. Experiments on UPMC Food-101, Hateful Memes, and MM-IMDb datasets confirm that BalGrad effectively alleviates over-reliance on specific modalities when making predictions.
- Score: 7.769664248755815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-language (VL) models have demonstrated strong performance across various tasks. However, these models often rely on a specific modality for predictions, leading to "dominant modality bias." This bias significantly hurts performance, especially when one modality is impaired. In this study, we analyze model behavior under dominant modality bias and theoretically show that unaligned gradients or differences in gradient magnitudes prevent balanced convergence of the loss. Based on these findings, we propose a novel framework, BalGrad, to mitigate dominant modality bias. Our approach includes inter-modality gradient reweighting, which adjusts the gradient of the KL divergence based on each modality's contribution, and inter-task gradient projection, which aligns task directions in a non-conflicting manner. Experiments on UPMC Food-101, Hateful Memes, and MM-IMDb datasets confirm that BalGrad effectively alleviates over-reliance on specific modalities when making predictions.
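The abstract names two gradient operations. A minimal PyTorch sketch of both is given below; the helper names and the reweighting rule are illustrative assumptions about the general recipe, not BalGrad's exact formulation.

```python
import torch

def project_non_conflicting(g_task: torch.Tensor, g_ref: torch.Tensor) -> torch.Tensor:
    # Inter-task gradient projection (illustrative): if the task gradient
    # conflicts with the reference direction (negative inner product),
    # remove the conflicting component so the update never opposes g_ref.
    dot = torch.dot(g_task, g_ref)
    if dot < 0:
        g_task = g_task - (dot / (g_ref.norm() ** 2 + 1e-12)) * g_ref
    return g_task

def reweight_kl_grad(g_kl_img: torch.Tensor, g_kl_txt: torch.Tensor,
                     contrib_img: float, contrib_txt: float):
    # Inter-modality gradient reweighting (illustrative): scale each
    # modality's KL-divergence gradient inversely to that modality's
    # contribution, damping the dominant modality and boosting the weaker one.
    total = contrib_img + contrib_txt + 1e-12
    return (contrib_txt / total) * g_kl_img, (contrib_img / total) * g_kl_txt
```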
Related papers
- Mitigating Modality Bias in Multi-modal Entity Alignment from a Causal Perspective [15.239882327601016]
We propose a counterfactual debiasing framework for MMEA, termed CDMEA, which investigates visual modality bias from a causal perspective.
Our approach aims to leverage both visual and graph modalities to enhance MMEA while suppressing the direct causal effect of the visual modality on model predictions.
arXiv Detail & Related papers (2025-04-28T03:48:23Z)
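Suppressing a modality's direct causal effect can be illustrated with a generic counterfactual-inference step: subtract a scaled copy of the visual-only prediction from the fused one. The function and the weight `alpha` below are hypothetical, not CDMEA's exact estimator.

```python
import torch

def suppress_visual_direct_effect(fused_logits: torch.Tensor,
                                  visual_only_logits: torch.Tensor,
                                  alpha: float = 1.0) -> torch.Tensor:
    # Generic counterfactual debiasing: total effect of both modalities
    # minus the (scaled) direct effect of the visual branch alone.
    return fused_logits - alpha * visual_only_logits
```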
- Gradient Extrapolation for Debiased Representation Learning [7.183424522250937]
Gradient Extrapolation for Debiased Representation Learning (GERNE) is designed to learn debiased representations in both known and unknown attribute training cases. GERNE can serve as a general framework for debiasing, with methods such as ERM, reweighting, and resampling shown as special cases. The proposed approach is validated on five vision benchmarks and one NLP benchmark, demonstrating competitive and often superior performance compared to state-of-the-art baseline methods.
arXiv Detail & Related papers (2025-03-17T14:48:57Z)
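The extrapolation idea can be sketched as follows, assuming gradients computed on a more-biased and a less-biased batch; the two-batch construction and the factor `beta` are assumptions about the general recipe, not GERNE's exact algorithm.

```python
from typing import List
import torch

def extrapolated_grads(grads_biased: List[torch.Tensor],
                       grads_less_biased: List[torch.Tensor],
                       beta: float = 1.5) -> List[torch.Tensor]:
    # Extrapolate from the gradient on a more-biased batch toward (and,
    # for beta > 1, past) the gradient on a less-biased batch, steering
    # the update away from the bias direction.
    return [gb + beta * (gl - gb)
            for gb, gl in zip(grads_biased, grads_less_biased)]
```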
- Injecting Imbalance Sensitivity for Multi-Task Learning [36.60453299563175]
Multi-task learning (MTL) has emerged as a promising approach for deploying deep learning models in real-life applications. Recent studies have proposed optimization-based learning paradigms to establish task-shared representations in MTL. Our paper empirically argues that these studies primarily emphasize the conflict issue while neglecting the potentially more significant impact of imbalance/dominance in MTL.
arXiv Detail & Related papers (2025-03-11T03:11:54Z)
- Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation [3.328448170090945]
Stochastic Gradient Descent (SGD) with adaptive steps is widely used to train deep neural networks and generative models. This paper provides a comprehensive analysis of the effect of bias in gradient estimates.
arXiv Detail & Related papers (2024-02-05T10:17:36Z)
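A toy model of the analyzed setting, assuming an Adam-style adaptive step whose gradient oracle carries a systematic bias term; this is only a sketch of the setting, not the paper's analysis.

```python
import torch

def biased_adam_step(p, m, v, grad, bias, t,
                     lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # The oracle returns grad + bias rather than the true gradient; the
    # question is how this bias propagates through the adaptive step.
    g = grad + bias
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)   # bias-corrected second moment
    return p - lr * m_hat / (v_hat.sqrt() + eps), m, v
```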
- Ensemble Modeling for Multimodal Visual Action Recognition [50.38638300332429]
We propose an ensemble modeling approach for multimodal action recognition.
We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset.
arXiv Detail & Related papers (2023-08-10T08:43:20Z)
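For reference, the standard focal loss is sketched below; the paper uses a variant tailored to MECCANO's long-tailed distribution, whose exact form is not given here.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0) -> torch.Tensor:
    # Standard focal loss: down-weights well-classified examples so that
    # rare classes in a long-tailed distribution dominate the gradient less.
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)  # model's probability for the true class
    return ((1.0 - pt) ** gamma * ce).mean()
```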
- Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z)
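One way to balance the two groups' contributions is to rescale the bias-conflicting losses so their average matches the bias-aligned ones; the balancing rule below is an illustrative assumption, not the paper's exact gradient-alignment (GA) formula.

```python
import torch

def balanced_group_loss(losses: torch.Tensor,
                        is_conflicting: torch.Tensor) -> torch.Tensor:
    # Rescale the bias-conflicting group so its per-sample contribution
    # matches that of the bias-aligned group (illustrative rule).
    aligned = losses[~is_conflicting]
    conflict = losses[is_conflicting]
    if aligned.numel() == 0 or conflict.numel() == 0:
        return losses.mean()
    w = (aligned.mean() / (conflict.mean() + 1e-12)).detach()
    return (aligned.sum() + w * conflict.sum()) / losses.numel()
```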
- Feature-Level Debiased Natural Language Understanding [86.8751772146264]
Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets.
We propose debiasing contrastive learning (DCT) to mitigate biased latent features and account for the dynamic nature of bias.
DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
arXiv Detail & Related papers (2022-12-11T06:16:14Z)
- Calibrating Segmentation Networks with Margin-based Label Smoothing [19.669173092632]
We provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses.
These losses could be viewed as approximations of a linear penalty imposing equality constraints on logit distances.
We propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances.
arXiv Detail & Related papers (2022-09-09T20:21:03Z)
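The inequality-constraint penalty the summary describes can be sketched in PyTorch as below; it is added to the usual cross-entropy during training, and `margin` and the weight `lam` are tunable hyper-parameters.

```python
import torch
import torch.nn.functional as F

def margin_logit_penalty(logits: torch.Tensor, margin: float = 10.0,
                         lam: float = 0.1) -> torch.Tensor:
    # Penalize logit distances d_j = max_k z_k - z_j only where they
    # exceed the margin, i.e. the inequality constraint d_j <= margin,
    # rather than the equality constraint d_j = 0 implied by standard
    # label smoothing.
    d = logits.max(dim=1, keepdim=True).values - logits
    return lam * F.relu(d - margin).mean()
```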
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model, analogous to gradient descent in functional space.
GGD can learn a more robust base model under both settings: task-specific biased models with prior knowledge and self-ensemble biased models without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
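One boosting-style reading of the greedy scheme: down-weight samples the biased model already explains, so the base model concentrates on what the bias cannot. This weighting rule is an assumption for illustration, not GGD's exact rule.

```python
import torch
import torch.nn.functional as F

def base_model_sample_weights(biased_logits: torch.Tensor,
                              targets: torch.Tensor) -> torch.Tensor:
    # Samples the biased model fits well (high true-class probability)
    # get small weights in the base model's loss (boosting-style reading).
    p_true = F.softmax(biased_logits, dim=1) \
              .gather(1, targets.unsqueeze(1)).squeeze(1)
    return (1.0 - p_true).detach()
```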
- Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning [94.35586521144117]
We investigate whether applying contrastive learning to fine-tuning would bring further benefits.
We propose Contrast-regularized tuning (Core-tuning), a novel approach for fine-tuning contrastive self-supervised visual models.
arXiv Detail & Related papers (2021-02-12T16:31:24Z)
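A minimal sketch of contrast-regularized fine-tuning: cross-entropy plus a supervised contrastive regularizer on the fine-tuned features. The weight `eta`, temperature `tau`, and this exact combination are assumptions about the general recipe; the paper's method additionally involves pair mining, omitted here.

```python
import torch
import torch.nn.functional as F

def contrast_regularized_loss(features, logits, targets,
                              tau: float = 0.1, eta: float = 0.1):
    # Classification term.
    ce = F.cross_entropy(logits, targets)
    # Supervised contrastive term over L2-normalized features.
    z = F.normalize(features, dim=1)
    sim = (z @ z.t()) / tau
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))  # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (targets.unsqueeze(0) == targets.unsqueeze(1)) & ~eye
    scl = -(log_prob.masked_fill(~pos, 0.0).sum(1)
            / pos.sum(1).clamp(min=1)).mean()
    return ce + eta * scl
```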
- Extreme Memorization via Scale of Initialization [72.78162454173803]
We construct an experimental setup in which changing the scale of initialization strongly impacts the implicit regularization induced by SGD.
We find that the extent and manner in which generalization ability is affected depends on the activation and loss function used.
In the case of the homogeneous ReLU activation, we show that this behavior can be attributed to the loss function.
arXiv Detail & Related papers (2020-08-31T04:53:11Z)
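The experimental manipulation itself is simple to reproduce: rescale all parameters by a factor `alpha` at initialization and observe how generalization changes. A minimal sketch, assuming a PyTorch model:

```python
import torch

def scale_initialization(model: torch.nn.Module, alpha: float) -> None:
    # Multiply every parameter by alpha at initialization; varying alpha
    # probes how the scale of initialization changes SGD's implicit
    # regularization and can push networks toward memorization.
    with torch.no_grad():
        for p in model.parameters():
            p.mul_(alpha)
```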
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.