Fugu-MT Paper Translation (Summary): Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding

Paper summary: Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding

arxiv url: http://arxiv.org/abs/2603.13366v1
Date: Mon, 09 Mar 2026 12:47:54 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.127406
Title: Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding
Title (reference translation): Thinking About Uncertainty: Mitigating Hallucinations in MLRMs via Latent Entropy-Aware Decoding
Authors: Zhongxing Xu, Zhonghua Wang, Zhe Qian, Dachuan Shi, Feilong Tang, Ming Hu, Shiyan Su, Xiaocheng Zou, Wei Feng, Dwarikanath Mahapatra, Yifan Peng, Mingquan Lin, Zongyuan Ge
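Abstract summary: We argue that adequate contextual reasoning information can be extracted directly from the token probability distribution. We propose Latent Entropy-Aware Decoding, an efficient plug-and-play decoding strategy. The model employs probability-weighted continuous embeddings under high-entropy states and transitions to discrete token embeddings as entropy decreases.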
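The entropy-aware switching rule summarized above can be written out as a worked formula. This is a minimal formalization under stated assumptions, not the paper's own notation: $z_t$ are the next-token logits at step $t$, $p_t$ is the resulting distribution over the vocabulary $\mathcal{V}$, $E[v]$ is the input embedding of token $v$, $\tau$ is a hypothetical fixed entropy threshold, and the discrete branch is assumed to use greedy (argmax) selection. The abstract specifies neither how the threshold is set nor how the discrete token is chosen.

$$
p_t = \mathrm{softmax}(z_t), \qquad
H_t = -\sum_{v \in \mathcal{V}} p_t(v) \log p_t(v)
$$

$$
x_{t+1} =
\begin{cases}
\displaystyle \sum_{v \in \mathcal{V}} p_t(v)\, E[v] & \text{if } H_t > \tau \quad \text{(latent, probability-weighted embedding)} \\[2ex]
E[\hat{v}_t], \;\; \hat{v}_t = \arg\max_{v} p_t(v) & \text{if } H_t \le \tau \quad \text{(discrete token embedding)}
\end{cases}
$$

Under this reading, the latent input is the expected embedding under $p_t$, so a high-entropy step carries a mixture of several candidate tokens rather than committing to one of them.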
Reference score (site-computed attention score): 38.5840117402958
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in multimodal large reasoning models (MLRMs) have significantly improved performance in visual question answering. However, we observe that transition words (e.g., because, however, and wait) are closely associated with hallucinations and tend to exhibit high-entropy states. We argue that adequate contextual reasoning information can be directly extracted from the token probability distribution. Inspired by superposed representation theory, we propose leveraging latent superposed reasoning to integrate multiple candidate semantics and maintain latent reasoning trajectories. The hypothesis is that reliance on discrete textual inputs may drive the model toward sequential explicit reasoning, underutilizing dense contextual cues during high-entropy reasoning stages. Therefore, we propose constructing rich semantic representations from the token probability distributions to enhance in-context reasoning. With this goal, we present Latent Entropy-Aware Decoding (LEAD), an efficient plug-and-play decoding strategy that leverages semantic context to achieve reliable reasoning. The heart of our method lies in entropy-aware reasoning mode switching. The model employs probability-weighted continuous embeddings under high-entropy states and transitions back to discrete token embeddings as entropy decreases. Moreover, we propose a prior-guided visual anchor injection strategy that encourages the model to focus on visual information. Extensive experiments show that LEAD effectively mitigates hallucinations across various MLRMs on multiple benchmarks.
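Abstract (reference translation): Recent advances in multimodal large reasoning models (MLRMs) have markedly improved performance on visual question answering. However, transition words (e.g., "because", "however", and "wait") are closely associated with hallucinations and tend to occur in high-entropy states. We argue that adequate contextual reasoning information can be extracted directly from the token probability distribution. Inspired by superposed representation theory, we propose using latent superposed reasoning to integrate multiple candidate semantics and maintain latent reasoning trajectories. The hypothesis is that reliance on discrete textual inputs may drive the model toward sequential explicit reasoning and underuse dense contextual cues during high-entropy reasoning stages. We therefore construct rich semantic representations from the token probability distributions to strengthen in-context reasoning. To this end, we propose Latent Entropy-Aware Decoding (LEAD), an efficient plug-and-play decoding strategy that uses semantic context to achieve reliable reasoning. The core of the method is entropy-aware switching between reasoning modes: the model uses probability-weighted continuous embeddings in high-entropy states and transitions back to discrete token embeddings as entropy decreases. In addition, we propose a prior-guided visual anchor injection strategy that encourages the model to focus on visual information. Extensive experiments show that LEAD effectively mitigates hallucinations across various MLRMs on multiple benchmarks.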
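To make the entropy-aware mode switch concrete, here is a minimal pure-Python sketch of a single decoding step, assuming one fixed entropy threshold and greedy selection in the discrete mode. The function names (`softmax`, `entropy`, `weighted_embedding`, `next_input_embedding`), the `threshold` parameter, and the toy embedding table are hypothetical illustrations, not LEAD's actual implementation; the abstract does not say how the threshold is chosen, and the prior-guided visual anchor injection is not modeled here.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a vocabulary of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def weighted_embedding(probs: Sequence[float],
                       table: Sequence[Sequence[float]]) -> List[float]:
    """Probability-weighted sum of embedding rows: sum_v p(v) * E[v]."""
    dim = len(table[0])
    out = [0.0] * dim
    for p, row in zip(probs, table):
        for j in range(dim):
            out[j] += p * row[j]
    return out


def next_input_embedding(logits: Sequence[float],
                         table: Sequence[Sequence[float]],
                         threshold: float) -> Tuple[str, List[float]]:
    """One hypothetical entropy-aware decoding step.

    Assumptions (not from the paper): a single fixed `threshold` on entropy
    decides the mode, and the discrete mode picks the argmax token.
    """
    probs = softmax(logits)
    if entropy(probs) > threshold:
        # High entropy: feed back a continuous mixture of candidate embeddings.
        return "latent", weighted_embedding(probs, table)
    # Low entropy: commit to one token and feed back its embedding row.
    token_id = max(range(len(probs)), key=probs.__getitem__)
    return "discrete", list(table[token_id])


if __name__ == "__main__":
    # Hypothetical toy table: 3 tokens, 2-dimensional embeddings.
    toy_table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(next_input_embedding([2.0, 2.0, 2.0], toy_table, threshold=0.5))
    print(next_input_embedding([10.0, 0.0, 0.0], toy_table, threshold=0.5))
```

With equal logits `[2.0, 2.0, 2.0]` the distribution is uniform (entropy ln 3, about 1.10 nats), so the step returns latent mode with an embedding of approximately `[0.5, 0.5]`, the average of the three rows. Sharply peaked logits such as `[10.0, 0.0, 0.0]` have entropy near 0.001 nats, fall below the threshold, and return the discrete embedding `[1.0, 0.0]` of token 0. In a full model the returned vector would replace the usual token embedding as the next input; a real implementation would operate on tensors rather than Python lists.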


Related papers