Related papers: Multimodal Negative Learning

URL: http://arxiv.org/abs/2510.20877v1
Date: Thu, 23 Oct 2025 11:47:11 GMT
Title: Multimodal Negative Learning
Authors: Baoquan Gong, Xiyuan Gao, Pengfei Zhu, Qinghua Hu, Bing Cao
Abstract summary: We propose a new learning paradigm: "Learning Not to be" (Negative Learning). Instead of enhancing weak modalities' target-class predictions, the dominant modalities dynamically guide the weak modality to suppress non-target classes. This stabilizes the decision space and preserves modality-specific information.
Score: 55.67017420486548
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal learning systems often encounter challenges related to modality imbalance, where a dominant modality may overshadow others, thereby hindering the learning of weak modalities. Conventional approaches often force weak modalities to align with dominant ones in "Learning to be (the same)" (Positive Learning), which risks suppressing the unique information inherent in the weak modalities. To address this challenge, we offer a new learning paradigm: "Learning Not to be" (Negative Learning). Instead of enhancing weak modalities' target-class predictions, the dominant modalities dynamically guide the weak modality to suppress non-target classes. This stabilizes the decision space and preserves modality-specific information, allowing weak modalities to preserve unique information without being over-aligned. We proceed to reveal multimodal learning from a robustness perspective and theoretically derive the Multimodal Negative Learning (MNL) framework, which introduces a dynamic guidance mechanism tailored for negative learning. Our method provably tightens the robustness lower bound of multimodal learning by increasing the Unimodal Confidence Margin (UCoM) and reduces the empirical error of weak modalities, particularly under noisy and imbalanced scenarios. Extensive experiments across multiple benchmarks demonstrate the effectiveness and generalizability of our approach against competing methods. The code will be available at https://github.com/BaoquanGong/Multimodal-Negative-Learning.git.
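
Illustrative sketch (not the authors' implementation): the abstract describes the dominant modality guiding the weak modality to suppress non-target classes rather than boosting its target-class prediction. The Python snippet below is one possible reading of that idea for a single sample. The function names, the guidance weight `1 - p_dom[k]`, and the simple confidence margin are assumptions made for illustration; the paper's actual loss and its definition of UCoM may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def negative_learning_loss(weak_logits, dominant_logits, target, eps=1e-12):
    """Hypothetical negative-learning loss for one sample.

    Only non-target classes contribute: the weak modality is penalized
    for probability mass it places on a non-target class k, weighted by
    how strongly the dominant modality rejects k (assumed weight: 1 - p_dom[k]).
    The target-class probability is never pushed up directly.
    """
    p_weak = softmax(weak_logits)
    p_dom = softmax(dominant_logits)
    loss = 0.0
    for k, (pw, pd) in enumerate(zip(p_weak, p_dom)):
        if k == target:
            continue  # the target class is skipped on purpose
        weight = 1.0 - pd  # assumed guidance from the dominant modality
        loss += -weight * math.log(max(1.0 - pw, eps))
    return loss


def confidence_margin(logits, target):
    """Target-class probability minus the largest non-target probability.

    A generic margin used here for illustration only; it is not claimed
    to match the paper's Unimodal Confidence Margin (UCoM).
    """
    p = softmax(logits)
    best_other = max(pk for k, pk in enumerate(p) if k != target)
    return p[target] - best_other


if __name__ == "__main__":
    dominant = [4.0, 0.0, 0.0]      # dominant modality is confident in class 0
    weak_wrong = [0.0, 3.0, 0.0]    # weak modality favors a non-target class
    weak_right = [3.0, 0.0, 0.0]    # weak modality favors the target class
    print(negative_learning_loss(weak_wrong, dominant, target=0))
    print(negative_learning_loss(weak_right, dominant, target=0))
    print(confidence_margin(weak_wrong, target=0))
    print(confidence_margin(weak_right, target=0))
```

Under these assumptions, the loss is larger when the weak modality places mass on a class the dominant modality rejects, and driving it down widens the weak modality's margin without forcing its target-class output to match the dominant modality's.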

Related papers

From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation [59.27094165576015]
We propose a novel learning paradigm (UniMod) that transitions from sparse decision-making to dense reasoning traces. By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, we reformulate monolithic decision tasks into a multi-dimensional boundary learning process. We introduce specialized optimization strategies to decouple task-specific parameters and rebalance training dynamics, effectively resolving interference between diverse objectives in multi-task learning.
arXiv Detail & Related papers (2026-01-28T09:29:40Z)
Rethinking Multimodal Learning from the Perspective of Mitigating Classification Ability Disproportion [6.749782429802639]
Multimodal learning is significantly constrained by modality imbalance. We propose a novel approach to balance the classification ability of weak and strong modalities by incorporating the principle of boosting.
arXiv Detail & Related papers (2025-02-27T14:12:20Z)
PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning [42.00851701431368]
Multi-modal class-incremental learning (MMCIL) seeks to leverage multi-modal data, such as audio-visual and image-text pairs. A critical challenge remains: the issue of missing modalities during incremental learning phases. We propose PAL, a novel exemplar-free framework tailored to MMCIL under missing-modality scenarios.
arXiv Detail & Related papers (2025-01-16T08:04:04Z)
Asymmetric Reinforcing against Multi-modal Representation Bias [59.685072206359855]
We propose an Asymmetric Reinforcing method against Multimodal representation bias (ARM). Our ARM dynamically reinforces the weak modalities while maintaining the ability to represent dominant modalities through conditional mutual information. We have significantly improved the performance of multimodal learning, making notable progress in mitigating imbalanced multimodal learning.
arXiv Detail & Related papers (2025-01-02T13:00:06Z)
Diagnosing and Re-learning for Balanced Multimodal Learning [8.779005254634857]
We propose the Diagnosing & Re-learning method to overcome the imbalanced multimodal learning problem. The learning state of each modality is estimated based on the separability of its uni-modal representation space. In this way, the over-emphasizing of scarcely informative modalities is avoided.
arXiv Detail & Related papers (2024-07-12T22:12:03Z)
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance [10.580712937465032]
We identify the previously ignored gradient conflict between multimodal and unimodal learning objectives. We propose MMPareto algorithm, which could ensure a final gradient with direction common to all learning objectives. Our method is also expected to facilitate multi-task cases with a clear discrepancy in task difficulty.
arXiv Detail & Related papers (2024-05-28T01:19:13Z)
Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning. MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process. It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities. Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z)
Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. We pose the problem of unseen modality interaction and introduce a first solution. It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
Calibrating Multimodal Learning [94.65232214643436]
We propose a novel regularization technique, i.e., Calibrating Multimodal Learning (CML) regularization, to calibrate the predictive confidence of previous methods. This technique could be flexibly equipped by existing models and improve the performance in terms of confidence calibration, classification accuracy, and model robustness.
arXiv Detail & Related papers (2023-06-02T04:29:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.