Fugu-MT Paper Translation (Summary): Calibrating Verbalized Confidence with Self-Generated Distractors

Paper summary: Calibrating Verbalized Confidence with Self-Generated Distractors

arxiv url: http://arxiv.org/abs/2509.25532v1
Date: Mon, 29 Sep 2025 21:41:22 GMT
Status: Translation complete
Last updated (in system): 2025-10-01 17:09:04.337496
Title: Calibrating Verbalized Confidence with Self-Generated Distractors
Authors: Victor Wang, Elias Stengel-Eskin
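Abstract summary: We introduce DINCO (Distractor-Normalized Coherence), which estimates and accounts for an LLM's suggestibility bias by having the model verbalize its confidence independently across several self-generated distractors. We frame the popular approach of self-consistency as leveraging coherence across sampled generations, and normalized verbalized confidence as leveraging coherence across validations on incompatible claims.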
Reference score (attention score computed by the site): 24.56911906044891
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Calibrated confidence estimates are necessary for large language model (LLM) outputs to be trusted by human users. While LLMs can express their confidence in human-interpretable ways, verbalized LLM-generated confidence scores have empirically been found to be miscalibrated, reporting high confidence on instances with low accuracy and thereby harming trust and safety. We hypothesize that this overconfidence often stems from a given LLM's heightened suggestibility when faced with claims that it encodes little information about; we empirically validate this hypothesis, finding more suggestibility on lower-accuracy claims. Building on this finding, we introduce Distractor-Normalized Coherence (DINCO), which estimates and accounts for an LLM's suggestibility bias by having the model verbalize its confidence independently across several self-generated distractors (i.e. alternative claims), and normalizes by the total verbalized confidence. To further improve calibration, we leverage generator-validator disagreement, augmenting normalized validator confidence with a consistency-based estimate of generator confidence. Here, we frame the popular approach of self-consistency as leveraging coherence across sampled generations, and normalized verbalized confidence as leveraging coherence across validations on incompatible claims, allowing us to integrate these complementary dimensions of coherence into DINCO. Moreover, our analysis shows that DINCO provides less saturated -- and therefore more usable -- confidence estimates, and that further sampling alone cannot close the gap between DINCO and baselines, with DINCO at 10 inference calls outperforming self-consistency at 100.
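The two ingredients the abstract describes, distractor normalization of verbalized confidence and a consistency-based estimate of generator confidence, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each verbalized confidence has already been obtained from a separate, independent model call as a number in [0, 1]; it measures self-consistency by case-insensitive exact string match; and it combines the two signals with an equal-weight average, because the abstract does not say how they are combined. All function names and the `weight` parameter are hypothetical.

```python
from collections import Counter
from typing import Sequence


def normalized_verbalized_confidence(main_conf: float,
                                     distractor_confs: Sequence[float]) -> float:
    """Divide the main claim's verbalized confidence by the total verbalized
    confidence over the main claim and its self-generated distractors.

    Assumption: every score came from an independent validator call and
    lies in [0, 1].
    """
    confs = [main_conf, *distractor_confs]
    if any(not 0.0 <= c <= 1.0 for c in confs):
        raise ValueError("verbalized confidences must lie in [0, 1]")
    total = sum(confs)
    if total == 0.0:
        return 0.0  # the model rejected every claim, including the main one
    return main_conf / total


def self_consistency(main_answer: str, samples: Sequence[str]) -> float:
    """Fraction of sampled generations that match the main answer.

    Uses case-insensitive exact match; free-form answers would need a
    semantic-equivalence check instead (not shown here).
    """
    if not samples:
        raise ValueError("need at least one sampled generation")
    counts = Counter(s.strip().lower() for s in samples)
    return counts[main_answer.strip().lower()] / len(samples)


def combined_confidence(main_conf: float,
                        distractor_confs: Sequence[float],
                        main_answer: str,
                        samples: Sequence[str],
                        weight: float = 0.5) -> float:
    """Hypothetical combination: weighted average of the normalized
    validator confidence and the generator's self-consistency."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    validator = normalized_verbalized_confidence(main_conf, distractor_confs)
    generator = self_consistency(main_answer, samples)
    return weight * validator + (1.0 - weight) * generator


# A suggestible model gives 0.9 to the main answer and to each of three
# mutually exclusive distractors, so normalization divides 0.9 by 3.6.
print(normalized_verbalized_confidence(0.9, [0.9, 0.9, 0.9]))

# Three of the five samples agree with "Paris", so self-consistency is 0.6;
# the combined score averages that with the normalized 0.25.
print(combined_confidence(0.9, [0.9, 0.9, 0.9], "Paris",
                          ["Paris", "paris", "Lyon", "Paris", "Marseille"]))
```

In this sketch the normalized score drops as the model grants high confidence to more mutually exclusive claims, which is the suggestibility correction the abstract describes. Dividing by the plain total follows the abstract's wording; because it raises the score whenever the total is below 1, dividing by `max(total, 1.0)` is one alternative design choice (an assumption here, not something the abstract states).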

Related papers