Related papers: When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs

When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs

URL: http://arxiv.org/abs/2511.02243v1
Date: Tue, 04 Nov 2025 04:11:31 GMT
Title: When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs
Authors: Zhuoran Zhang, Tengyue Wang, Xilin Gong, Yang Shi, Haotian Wang, Di Wang, Lijie Hu,
Abstract summary: Multimodal large language models (MLLMs) must resolve conflicts when different modalities provide contradictory information.<n>We introduce a new framework that decomposes modality following into two fundamental factors: relative reasoning uncertainty and inherent modality preference.
Score: 15.617378124319472
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal large language models (MLLMs) must resolve conflicts when different modalities provide contradictory information, a process we term modality following. Prior work measured this behavior only with coarse dataset-level statistics, overlooking the influence of model's confidence in unimodal reasoning. In this paper, we introduce a new framework that decomposes modality following into two fundamental factors: relative reasoning uncertainty (the case-specific confidence gap between unimodal predictions) and inherent modality preference( a model's stable bias when uncertainties are balanced). To validate this framework, we construct a controllable dataset that systematically varies the reasoning difficulty of visual and textual inputs. Using entropy as a fine-grained uncertainty metric, we uncover a universal law: the probability of following a modality decreases monotonically as its relative uncertainty increases. At the relative difficulty level where the model tends to follow both modalities with comparable probability what we call the balance point, a practical indicator of the model's inherent preference. Unlike traditional macro-level ratios, this measure offers a more principled and less confounded way to characterize modality bias, disentangling it from unimodal capabilities and dataset artifacts. Further, by probing layer-wise predictions, we reveal the internal mechanism of oscillation: in ambiguous regions near the balance point, models vacillate between modalities across layers, explaining externally observed indecision. Together, these findings establish relative uncertainty and inherent preference as the two governing principles of modality following, offering both a quantitative framework and mechanistic insight into how MLLMs resolve conflicting information.

Related papers

ProbFM: Probabilistic Time Series Foundation Model with Uncertainty Decomposition [0.12489632787815884]
Time Series Foundation Models (TSFMs) have emerged as a promising approach for zero-shot financial forecasting.<n>Current approaches either rely on restrictive distributional assumptions, conflate different sources of uncertainty, or lack principled calibration mechanisms.<n>We present a novel transformer-based probabilistic framework, ProbFM, that leverages Deep Evidential Regression (DER) to provide principled uncertainty quantification.
arXiv Detail & Related papers (2026-01-15T17:02:06Z)
ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning [2.1461777157838724]
We introduce ReasonBENCH, the first benchmark designed to quantify the underlying instability in large language models (LLMs) reasoning.<n>Across tasks from different domains, we find that the vast majority of reasoning strategies and models exhibit high instability.<n>We further analyze the impact of prompts, model families, and scale on the trade-off between solve rate and stability.
arXiv Detail & Related papers (2025-12-08T18:26:58Z)
Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
contexts drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns.<n>Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics.<n>We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z)
Quantification of model error for inverse problems in the Weak Neural Variational Inference framework [0.0]
We present a novel extension to the Weak Neural Variational Inference (WNVI) framework for probabilistic property estimation.<n>Our framework explicitly quantifies model errors in PDE-based inverse problems.<n>Our findings suggest that the proposed framework enhances the accuracy and reliability of material property estimation.
arXiv Detail & Related papers (2025-02-11T09:52:06Z)
Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion [3.66486428341988]
Multimodal AI models are increasingly used in fields like healthcare, finance, and autonomous driving.<n>Uncertainty arising from noise, insufficient evidence, or conflicts between modalities is crucial for reliable decision-making.<n>We propose a novel multimodal learning method with order-invariant evidence fusion and introduce a conflict-based discounting mechanism.
arXiv Detail & Related papers (2024-12-23T22:37:18Z)
Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability. In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling. Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
Does Misclassifying Non-confounding Covariates as Confounders Affect the Causal Inference within the Potential Outcomes Framework? [4.074237603319893]
The Potential Outcome Framework (POF) plays a prominent role in the field of causal inference. We present a unified graphical framework for the CIMs-POF, which greatly enhances the comprehension of these models' underlying principles.
arXiv Detail & Related papers (2023-08-22T13:18:13Z)
Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation [50.920911532133154]
The intrinsic ill-posedness and ordinal-sensitive nature of monocular depth estimation (MDE) models pose major challenges to the estimation of uncertainty degree. We propose to model the uncertainty of MDE models from the perspective of the inherent probability distributions. By simply introducing additional training regularization terms, our model, with surprisingly simple formations and without requiring extra modules or multiple inferences, can provide uncertainty estimations with state-of-the-art reliability.
arXiv Detail & Related papers (2023-07-19T12:11:15Z)
Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in principle for adaptive integration of different modalities and produces a trustworthy regression result. Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z)
Disentangling Observed Causal Effects from Latent Confounders using Method of Moments [67.27068846108047]
We provide guarantees on identifiability and learnability under mild assumptions. We develop efficient algorithms based on coupled tensor decomposition with linear constraints to obtain scalable and guaranteed solutions.
arXiv Detail & Related papers (2021-01-17T07:48:45Z)
Modal Uncertainty Estimation via Discrete Latent Representation [4.246061945756033]
We introduce a deep learning framework that learns the one-to-many mappings between the inputs and outputs, together with faithful uncertainty measures. Our framework demonstrates significantly more accurate uncertainty estimation than the current state-of-the-art methods.
arXiv Detail & Related papers (2020-07-25T05:29:34Z)
Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets. Part of the challenge of learning robust models lies in the influence of unobserved confounders. We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.