Fugu-MT Paper Translation (Abstract): Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence?

Paper overview: Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence?

arxiv url: http://arxiv.org/abs/2605.28778v1
Date: Wed, 27 May 2026 17:38:00 GMT
Status: Translation complete
Last updated in system: 2026-05-28 17:38:56.251843
Title: Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence?
Title (reference translation): Can LLMs use linguistic uncertainty markers to reliably reflect intrinsic confidence?
Authors: Gabrielle Kaili-May Liu, Arman Cohan
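Abstract summary: It remains unclear whether models can apply their own linguistic confidence framework to associate markers with specific confidence levels in a stable and generalizable way. We formalize _marker internal confidence_ (MIC) as the estimated intrinsic confidence a model associates with a specific epistemic marker in a given task domain. Applying our analysis framework to diverse models and tasks, we find that LLMs remain faithfully miscalibrated even under a model-centric interpretation of marker meanings.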
Reference score (independently computed attention score): 50.17126900886782
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLMs' linguistically expressed confidence should faithfully reflect their intrinsic uncertainty. While recent work shows LLMs struggle to use epistemic markers (e.g., "it is likely...") in a human-aligned fashion, it remains unclear whether models can apply their own linguistic confidence framework to associate markers with specific confidence levels in a stable and generalizable way, and how contextual features impact this ability. We conduct the first systematic study of this question, formalizing _marker internal confidence_ (MIC) as the estimated intrinsic confidence a model associates with a specific epistemic marker in a given task domain. We present 7 metrics to evaluate the stability of MICs within and across distributions. Applying our analysis framework to diverse models and tasks, we find that LLMs remain faithfully miscalibrated even under model-centric interpretation of marker meanings, struggling to differentiate markers by internal confidence across distributions despite preserving a somewhat consistent ranking order across tasks. This supplies critical, complementary evidence to existing work toward a holistic understanding of faithful calibration in LLMs, emphasizing the need for more aligned and stable marker use to improve trustworthiness and reliability.
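Abstract (reference translation): The confidence that LLMs express linguistically should faithfully reflect their intrinsic uncertainty. Recent work shows that LLMs struggle to use epistemic markers in a human-aligned way, but it remains unclear whether models can apply their own linguistic confidence framework to associate markers with specific confidence levels in a stable and generalizable way, and how contextual features affect this ability. This study formalizes _marker internal confidence_ (MIC) as the estimated intrinsic confidence a model associates with a specific epistemic marker in a given task domain, and presents 7 metrics for evaluating the stability of MICs within and across distributions. Applying the analysis framework to diverse models and tasks shows that LLMs remain faithfully miscalibrated even under a model-centric interpretation of marker meanings: they struggle to differentiate markers by internal confidence across distributions, despite preserving a somewhat consistent ranking order across tasks. This supplies critical, complementary evidence to existing work toward a holistic understanding of faithful calibration in LLMs and emphasizes the need for more aligned and stable marker use to improve trustworthiness and reliability.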

Related papers