Fugu-MT Paper Translation (Abstract): Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation

Paper abstract: Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation

arxiv url: http://arxiv.org/abs/2508.12040v1
Date: Sat, 16 Aug 2025 13:29:35 GMT
Status: Translation complete
Last updated in system: 2025-08-19 14:49:10.541837
Title: Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation
Title (reference translation): Thinking About the Generation Process: Fine-Grained Confidence Estimation During LLM Generation
Authors: Jinyi Han, Tingyun Li, Shisong Chen, Jie Shi, Xinyi Wang, Guanglei Yue, Jiaqing Liang, Xin Lin, Liqian Wen, Zulong Chen, Yanghua Xiao
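Abstract summary: Large language models (LLMs) exhibit overconfidence, assigning high confidence scores to incorrect predictions. This work introduces FineCE, a confidence estimation method that provides accurate, fine-grained confidence scores during text generation. The code and all baselines used in the paper are available on GitHub.
Reference score (independently computed attention score): 63.49409574310576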
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While large language models (LLMs) have demonstrated remarkable performance across diverse tasks, they fundamentally lack self-awareness and frequently exhibit overconfidence, assigning high confidence scores to incorrect predictions. Accurate confidence estimation is therefore critical for enhancing the trustworthiness and reliability of LLM-generated outputs. However, existing approaches suffer from coarse-grained scoring mechanisms that fail to provide fine-grained, continuous confidence estimates throughout the generation process. To address these limitations, we introduce FineCE, a novel confidence estimation method that delivers accurate, fine-grained confidence scores during text generation. Specifically, we first develop a comprehensive pipeline for constructing training data that effectively captures the underlying probabilistic distribution of LLM responses, and then train a model to predict confidence scores for arbitrary text sequences in a supervised manner. Furthermore, we propose a Backward Confidence Integration (BCI) strategy that leverages information from the subsequent text to enhance confidence estimation for the current sequence during inference. We also introduce three strategies for identifying optimal positions to perform confidence estimation within the generation process. Extensive experiments on multiple benchmark datasets demonstrate that FineCE consistently outperforms existing classical confidence estimation methods. Our code and all baselines used in the paper are available on GitHub.
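Abstract (reference translation): Large language models (LLMs) have shown remarkable performance across a wide range of tasks, but they fundamentally lack self-awareness and often exhibit overconfidence, assigning high confidence scores to incorrect predictions. Accurate confidence estimation is therefore important for improving the trustworthiness and reliability of LLM outputs. Existing approaches, however, suffer from coarse-grained scoring mechanisms that cannot provide fine-grained, continuous confidence estimates throughout the generation process. To address these limitations, we introduce FineCE, a new confidence estimation method that provides accurate, fine-grained confidence scores during text generation. Specifically, we first develop a comprehensive pipeline for constructing training data that effectively captures the probability distribution of LLM responses, and then train a model in a supervised manner to predict confidence scores for arbitrary text sequences. In addition, we propose a Backward Confidence Integration (BCI) strategy that uses information from the subsequent text to improve the confidence estimate for the current sequence during inference. We also introduce three strategies for identifying the optimal positions at which to estimate confidence within the generation process. Extensive experiments on multiple benchmark datasets show that FineCE consistently outperforms existing classical confidence estimation methods. The code and all baselines used in the paper are available on GitHub.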
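
The abstract names two ideas without giving their formulas: labelling partial text with a confidence that reflects the distribution of LLM responses, and refining a score using later text (BCI). The Python sketch below is only one plausible reading, not the paper's implementation. It assumes that the confidence of a prefix can be approximated by the fraction of sampled continuations that end in a correct answer, and that backward integration can be approximated by blending each score with the mean of the scores that follow it. The names `prefix_confidence`, `backward_integrate`, `sample_completion`, `is_correct`, `n_samples`, and `alpha` are all hypothetical.

```python
from typing import Callable, List, Sequence


def prefix_confidence(
    prefix: str,
    sample_completion: Callable[[str], str],  # hypothetical: draws one continuation from an LLM
    is_correct: Callable[[str], bool],        # hypothetical: checks the final answer
    n_samples: int = 20,
) -> float:
    """Monte Carlo label (assumption): share of sampled continuations that are correct."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    hits = sum(1 for _ in range(n_samples) if is_correct(sample_completion(prefix)))
    return hits / n_samples


def backward_integrate(scores: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Blend each score with the mean of all later scores (assumed form of BCI).

    alpha weights the current estimate; (1 - alpha) weights the later ones.
    The last position has no later text, so its score is left unchanged.
    """
    adjusted = []
    for i, score in enumerate(scores):
        later = scores[i + 1:]
        if later:
            score = alpha * score + (1 - alpha) * sum(later) / len(later)
        adjusted.append(score)
    return adjusted


if __name__ == "__main__":
    # Toy stand-ins for an LLM sampler and an answer checker.
    toy_sampler = lambda prefix: prefix + " 4"
    toy_checker = lambda text: text.endswith("4")
    print(prefix_confidence("2 + 2 =", toy_sampler, toy_checker, n_samples=5))
    print(backward_integrate([0.8, 0.4, 0.6]))
```

In this reading, the sampled labels would serve as supervised targets for a confidence model, and the blending step would be applied only at inference time; both choices are assumptions drawn from the abstract's wording.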


Related papers