Fugu-MT Paper Translation (Summary): Enhancing Uncertainty Estimation in LLMs with Expectation of Aggregated Internal Belief

Paper summary: Enhancing Uncertainty Estimation in LLMs with Expectation of Aggregated Internal Belief

arxiv url: http://arxiv.org/abs/2509.01564v1
Date: Mon, 01 Sep 2025 15:50:10 GMT
Status: Translation complete
Last updated in system: 2025-09-04 15:17:03.757306
Title: Enhancing Uncertainty Estimation in LLMs with Expectation of Aggregated Internal Belief
Authors: Zeguan Xiao, Diyang Dou, Boya Xiong, Yun Chen, Guanhua Chen
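Abstract summary: Large language models (LLMs) have achieved remarkable success across a wide range of natural language tasks, but they often exhibit overconfidence and produce plausible yet incorrect answers. This overconfidence poses significant challenges for reliable uncertainty estimation and safe deployment. In this work, we propose a self-evaluation-based calibration method that leverages the internal hidden states of LLMs.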
Reference score (site-computed attention score): 6.1929548590367505
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have achieved remarkable success across a wide range of natural language tasks, but often exhibit overconfidence and generate plausible yet incorrect answers. This overconfidence, especially in models that have undergone Reinforcement Learning from Human Feedback (RLHF), poses significant challenges for reliable uncertainty estimation and safe deployment. In this paper, we propose EAGLE (Expectation of AGgregated internaL bElief), a novel self-evaluation-based calibration method that leverages the internal hidden states of LLMs to derive more accurate confidence scores. Instead of relying on the model's final output, our approach extracts internal beliefs from multiple intermediate layers during self-evaluation. By aggregating these layer-wise beliefs and calculating the expectation over the resulting confidence score distribution, EAGLE produces a refined confidence score that more faithfully reflects the model's internal certainty. Extensive experiments on diverse datasets and LLMs demonstrate that EAGLE significantly improves calibration performance over existing baselines. We also provide an in-depth analysis of EAGLE, including a layer-wise examination of uncertainty patterns, a study of the impact of self-evaluation prompts, and an analysis of the effect of self-evaluation score range.
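The aggregation-and-expectation step described in the abstract can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes you already have, for each selected intermediate layer, the logits that layer assigns to the self-evaluation score tokens (for example, obtained by projecting the layer's hidden state through the model's output head). It also assumes that layer-wise beliefs are aggregated by averaging those logits, and that the final confidence is the expected score rescaled to [0, 1]. The function names `softmax` and `aggregated_expected_confidence` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregated_expected_confidence(
    layer_score_logits: Sequence[Sequence[float]],
    scores: Sequence[int],
) -> float:
    """Hypothetical EAGLE-style confidence.

    layer_score_logits[l][i] is the logit that intermediate layer l assigns
    to the token for self-evaluation score scores[i] (assumed input format).
    """
    if not layer_score_logits:
        raise ValueError("need logits from at least one layer")
    n = len(scores)
    if any(len(row) != n for row in layer_score_logits):
        raise ValueError("each layer must provide one logit per score")
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("score range must contain at least two distinct values")

    # Aggregate layer-wise beliefs: average the score logits across layers
    # (an assumed aggregation rule; the paper may use a different one).
    num_layers = len(layer_score_logits)
    aggregated = [
        sum(row[i] for row in layer_score_logits) / num_layers for i in range(n)
    ]

    # Turn the aggregated logits into a distribution over score values.
    probs = softmax(aggregated)

    # Expectation over the confidence score distribution.
    expected = sum(p * s for p, s in zip(probs, scores))

    # Rescale to [0, 1] so different score ranges are comparable (assumption).
    return (expected - lo) / (hi - lo)


if __name__ == "__main__":
    # Two hypothetical layers scoring an answer on a 1-3 scale.
    logits = [[0.5, 1.0, 2.0], [0.0, 1.5, 1.0]]
    print(aggregated_expected_confidence(logits, [1, 2, 3]))
```

Rescaling the expectation by the score range makes confidences from different self-evaluation scales (for instance 1 to 10 versus 0 to 100) directly comparable. Using the full expectation rather than the single most likely score token keeps the result sensitive to how probability mass is spread across scores. Whether EAGLE normalizes exactly this way is an assumption of this sketch.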

