Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models
- URL: http://arxiv.org/abs/2506.07575v1
- Date: Mon, 09 Jun 2025 09:20:20 GMT
- Title: Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models
- Authors: Ruiyang Zhang, Hu Zhang, Hao Fei, Zhedong Zheng
- Abstract summary: Uncertainty-o is a model-agnostic framework designed to reveal uncertainty in LMMs regardless of their modalities, architectures, or capabilities. Experiments across 18 benchmarks spanning various modalities and 10 LMMs demonstrate the effectiveness of Uncertainty-o in reliably estimating LMM uncertainty.
- Score: 30.709848959820015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Multimodal Models (LMMs), harnessing the complementarity among diverse modalities, are often considered more robust than pure Large Language Models (LLMs); yet do LMMs know what they do not know? Three key open questions remain: (1) how to evaluate the uncertainty of diverse LMMs in a unified manner, (2) how to prompt LMMs to reveal their uncertainty, and (3) how to quantify uncertainty for downstream tasks. In an attempt to address these challenges, we introduce Uncertainty-o: (1) a model-agnostic framework designed to reveal uncertainty in LMMs regardless of their modalities, architectures, or capabilities, (2) an empirical exploration of multimodal prompt perturbations to uncover LMM uncertainty, offering insights and findings, and (3) a formulation of multimodal semantic uncertainty, which enables quantifying uncertainty from multimodal responses. Experiments across 18 benchmarks spanning various modalities and 10 LMMs (both open- and closed-source) demonstrate the effectiveness of Uncertainty-o in reliably estimating LMM uncertainty, thereby enhancing downstream tasks such as hallucination detection, hallucination mitigation, and uncertainty-aware Chain-of-Thought reasoning.
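The paper's exact formulation of multimodal semantic uncertainty is not reproduced in this summary; the sketch below only illustrates the general recipe the abstract describes, in the spirit of semantic entropy: sample several responses under perturbed multimodal prompts, cluster them by semantic equivalence, and take the entropy of the cluster distribution. The `query_lmm`, `perturb_prompt`, and `equivalent` callables are hypothetical placeholders, not the paper's API.

```python
import math
from typing import Callable, Dict, List

def semantic_uncertainty(
    prompt: Dict[str, object],                          # multimodal prompt, e.g. {"image": ..., "text": ...}
    query_lmm: Callable[[Dict[str, object]], str],      # hypothetical: one sampled LMM response
    perturb_prompt: Callable[[Dict[str, object]], Dict[str, object]],  # hypothetical: multimodal perturbation
    equivalent: Callable[[str, str], bool],             # hypothetical: semantic-equivalence check (e.g. NLI-based)
    n_samples: int = 8,
) -> float:
    """Entropy over semantic clusters of responses sampled under perturbed prompts."""
    responses: List[str] = [query_lmm(perturb_prompt(prompt)) for _ in range(n_samples)]

    # Greedy clustering: put each response into the first cluster whose
    # representative it is judged semantically equivalent to.
    clusters: List[List[str]] = []
    for r in responses:
        for cluster in clusters:
            if equivalent(r, cluster[0]):
                cluster.append(r)
                break
        else:
            clusters.append([r])

    # Semantic entropy over cluster frequencies: H = -sum_k p_k * log(p_k).
    probs = [len(c) / n_samples for c in clusters]
    return -sum(p * math.log(p) for p in probs)
```

High entropy, i.e. semantically divergent answers under perturbation, can then serve as the uncertainty signal for downstream uses such as hallucination detection.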
Related papers
- Do not Abstain! Identify and Solve the Uncertainty [25.744791822890036]
We introduce ConfuseBench, a benchmark that mainly focuses on three types of uncertainty: document scarcity, limited capability, and query ambiguity.
Experiments reveal that current LLMs struggle to accurately identify the root cause of uncertainty and solve it.
We first generate context-aware inquiries that highlight the confusing aspect of the original query.
Then we judge the source of uncertainty based on the uniqueness of the inquiry's answer.
arXiv Detail & Related papers (2025-06-01T02:15:17Z)
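As a loose sketch of the pipeline this summary describes (the inquiry generation, the answering step, and the direction of the uniqueness-to-source mapping are all assumptions for illustration, not the paper's actual rules):

```python
from typing import Callable, List

def diagnose_uncertainty_source(
    query: str,
    generate_inquiry: Callable[[str], str],      # hypothetical: context-aware inquiry about the confusing aspect
    answer_inquiry: Callable[[str], List[str]],  # hypothetical: several sampled answers to that inquiry
) -> str:
    """Judge the source of uncertainty from how unique the inquiry's answers are."""
    inquiry = generate_inquiry(query)
    answers = answer_inquiry(inquiry)
    if len(set(answers)) == 1:
        # One consistent answer: the confusion likely lies in the query itself.
        return "query ambiguity"
    # Divergent answers: evidence or capability is likely the bottleneck.
    return "document scarcity or limited capability"
```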
- Token-Level Uncertainty Estimation for Large Language Model Reasoning [24.56760223952017]
Large Language Models (LLMs) have demonstrated impressive capabilities, but their output quality remains inconsistent across various application scenarios.
We propose a token-level uncertainty estimation framework to enable LLMs to self-assess and self-improve their generation quality in mathematical reasoning.
arXiv Detail & Related papers (2025-05-16T22:47:32Z)
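The paper's specific estimator is not given here; as a rough illustration of token-level uncertainty, the sketch below computes per-token predictive entropy from top-k log-probabilities (as exposed by many decoding APIs) and averages it into a sequence-level score.

```python
import math
from typing import Dict, List

def token_entropies(step_logprobs: List[Dict[str, float]]) -> List[float]:
    """Per-token predictive entropy from top-k log-probabilities at each decoding step.

    `step_logprobs[t]` maps candidate tokens to their log-probabilities at step t;
    the distribution is renormalised over the returned candidates.
    """
    entropies = []
    for dist in step_logprobs:
        probs = [math.exp(lp) for lp in dist.values()]
        z = sum(probs)
        probs = [p / z for p in probs]
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0))
    return entropies

def sequence_uncertainty(step_logprobs: List[Dict[str, float]]) -> float:
    """Mean token entropy as a simple sequence-level uncertainty score."""
    ents = token_entropies(step_logprobs)
    return sum(ents) / len(ents) if ents else 0.0
```

Tokens with high entropy can then be flagged as low-confidence spans for the model to revisit during self-assessment.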
- MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency [63.23935582919081]
Chain-of-Thought (CoT) has significantly enhanced the reasoning capabilities of Large Language Models (LLMs).
We introduce MME-CoT, a specialized benchmark evaluating the CoT reasoning performance of LMMs.
We conduct an in-depth analysis of state-of-the-art LMMs, uncovering several key insights.
arXiv Detail & Related papers (2025-02-13T18:59:46Z)
- Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion [3.66486428341988]
Multimodal AI models are increasingly used in fields like healthcare, finance, and autonomous driving.
Quantifying the uncertainty arising from noise, insufficient evidence, or conflicts between modalities is crucial for reliable decision-making.
We propose a novel multimodal learning method with order-invariant evidence fusion and introduce a conflict-based discounting mechanism.
arXiv Detail & Related papers (2024-12-23T22:37:18Z)
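The paper's exact fusion rule is not spelled out in this summary; the snippet below is only a generic Dempster-Shafer illustration of the idea: discount each modality's belief masses toward ignorance according to a reliability factor, then combine them, renormalising away the conflict.

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]   # belief masses over focal sets (singletons and Omega)

def discount(m: Mass, omega: FrozenSet[str], alpha: float) -> Mass:
    """Shafer discounting: scale masses by reliability alpha, move the rest to Omega (ignorance)."""
    out = {a: alpha * v for a, v in m.items() if a != omega}
    out[omega] = 1.0 - alpha + alpha * m.get(omega, 0.0)
    return out

def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination with conflict renormalisation (assumes conflict < 1)."""
    fused: Mass = {}
    conflict = 0.0
    for a, va in m1.items():
        for b, vb in m2.items():
            inter = a & b
            if inter:
                fused[inter] = fused.get(inter, 0.0) + va * vb
            else:
                conflict += va * vb
    return {k: v / (1.0 - conflict) for k, v in fused.items()}

# Example: two modalities over classes {cat, dog}; the conflicting audio modality
# is discounted more heavily, so it contributes less to the fused belief.
omega = frozenset({"cat", "dog"})
vision = {frozenset({"cat"}): 0.8, omega: 0.2}
audio = {frozenset({"dog"}): 0.6, omega: 0.4}
fused = combine(discount(vision, omega, 0.9), discount(audio, omega, 0.5))
```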
- SAUP: Situation Awareness Uncertainty Propagation on LLM Agent [52.444674213316574]
Large language models (LLMs) integrated into multistep agent systems enable complex decision-making processes across various applications.
Existing uncertainty estimation methods primarily focus on final-step outputs, which fail to account for cumulative uncertainty over the multistep decision-making process and the dynamic interactions between agents and their environments.
We propose SAUP, a novel framework that propagates uncertainty through each step of an LLM-based agent's reasoning process.
arXiv Detail & Related papers (2024-12-02T01:31:13Z)
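SAUP's situational weighting is not described in detail here; a minimal sketch of the general idea, with the weights standing in for whatever situation-awareness signal the method uses, might look like:

```python
from typing import List

def propagate_uncertainty(step_uncertainties: List[float],
                          situation_weights: List[float]) -> float:
    """Weighted aggregation of per-step uncertainties into one agent-level score.

    `situation_weights` is an illustrative placeholder for a signal that scores
    how much each step matters; it is not SAUP's actual weighting scheme.
    """
    assert len(step_uncertainties) == len(situation_weights)
    total_w = sum(situation_weights)
    return sum(u * w for u, w in zip(step_uncertainties, situation_weights)) / total_w

# Example: a 3-step agent whose final, environment-facing step matters most.
score = propagate_uncertainty([0.2, 0.5, 0.7], situation_weights=[0.2, 0.3, 0.5])
```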
- The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio [118.75449542080746]
This paper presents the first systematic investigation of hallucinations in large multimodal models (LMMs).
Our study reveals two key contributors to hallucinations: overreliance on unimodal priors and spurious inter-modality correlations.
Our findings highlight key vulnerabilities, including imbalances in modality integration and biases from training data, underscoring the need for balanced cross-modal learning.
arXiv Detail & Related papers (2024-10-16T17:59:02Z)
- CLUE: Concept-Level Uncertainty Estimation for Large Language Models [49.92690111618016]
We propose a novel framework for Concept-Level Uncertainty Estimation (CLUE) for Large Language Models (LLMs).
We leverage LLMs to convert output sequences into concept-level representations, breaking down sequences into individual concepts and measuring the uncertainty of each concept separately.
We conduct experiments to demonstrate that CLUE can provide more interpretable uncertainty estimation results compared with sentence-level uncertainty.
arXiv Detail & Related papers (2024-09-04T18:27:12Z)
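For intuition only (the decomposition and scoring below are illustrative assumptions, not CLUE's actual procedure): concept-level uncertainty can be pictured as decomposing each sampled response into atomic concepts and scoring each concept by how inconsistently it appears across samples.

```python
from collections import Counter
from typing import Callable, Dict, List

def concept_uncertainty(
    samples: List[str],                            # multiple sampled responses to the same prompt
    extract_concepts: Callable[[str], List[str]],  # hypothetical: LLM-based decomposition into concepts
) -> Dict[str, float]:
    """Uncertainty of each concept = 1 - (fraction of samples that contain it)."""
    counts: Counter = Counter()
    for s in samples:
        for concept in set(extract_concepts(s)):
            counts[concept] += 1
    n = len(samples)
    return {concept: 1.0 - c / n for concept, c in counts.items()}
```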
- MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty [10.154013836043816]
We investigate previous uncertainty quantification methods in the presence of data uncertainty.
Our findings show that previous methods struggle in this setting relative to single-answer settings.
We observe that entropy- and consistency-based methods effectively estimate model uncertainty, even in the presence of data uncertainty.
arXiv Detail & Related papers (2024-08-13T11:17:31Z)
- Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations [63.330182403615886]
A major barrier towards the practical deployment of large language models (LLMs) is their lack of reliability.
Three situations where this is particularly apparent are correctness, hallucinations when given unanswerable questions, and safety.
In all three cases, models should ideally abstain from responding, much like humans, whose ability to understand uncertainty makes us refrain from answering questions we don't know the answer to.
arXiv Detail & Related papers (2024-04-16T23:56:38Z)
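A minimal sketch of uncertainty-based abstention (a generic thresholding scheme, not the paper's method): answer only when the uncertainty score is below a tuned threshold, otherwise abstain.

```python
from typing import Callable, Optional

def answer_or_abstain(
    question: str,
    generate: Callable[[str], str],       # hypothetical: returns the model's answer
    uncertainty: Callable[[str], float],  # hypothetical: uncertainty score for the question
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the model's answer only when its uncertainty is below the threshold."""
    if uncertainty(question) >= threshold:
        return None  # abstain instead of risking a hallucinated or unsafe answer
    return generate(question)
```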
- Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models [84.78457918843165]
Unsolvable Problem Detection (UPD) is a novel task to evaluate the robust understanding capability of Large Multimodal Models (LMMs).
UPD assesses an LMM's ability to withhold answers when it encounters unsolvable multiple-choice questions.
This paper introduces the MM-UPD Bench, a benchmark for assessing performance across various ability dimensions.
arXiv Detail & Related papers (2024-03-29T17:59:53Z)