Fugu-MT Paper Translation (Abstract): Generalized Correctness Models: Learning Calibrated and Model-Agnostic Correctness Predictors from Historical Patterns

Paper summary: Generalized Correctness Models: Learning Calibrated and Model-Agnostic Correctness Predictors from Historical Patterns

arxiv url: http://arxiv.org/abs/2509.24988v1
Date: Mon, 29 Sep 2025 16:19:01 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:20.121477
Title: Generalized Correctness Models: Learning Calibrated and Model-Agnostic Correctness Predictors from Historical Patterns
Authors: Hanqi Xiao, Vaidehi Patil, Hyunji Lee, Elias Stengel-Eskin, Mohit Bansal
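Abstract summary: This paper proposes the Generalized Correctness Model (GCM), which produces accurate and calibrated confidence estimates. It first shows that a GCM can be trained on correctness data from many LLMs. It then uses CMs as a lens for studying the source of correctness prediction ability and its generalization.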
Reference score (attention score computed independently by this site): 67.24756301536617
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generating accurate and calibrated confidence estimates is critical for deploying LLMs in high-stakes or user-facing applications, and remains an open challenge. Prior research has often framed confidence as a problem of eliciting a model's "self-knowledge", i.e., the ability of an LLM to judge whether its own answers are correct; this approach implicitly assumes that there is some privileged information about the answer's correctness that is accessible to the model itself. However, our experiments reveal that an LLM attempting to predict the correctness of its own outputs generally performs no better than an unrelated LLM. Moreover, we hypothesize that a key factor in building a "Correctness Model" (CM) is exposure to a target model's historical predictions. We propose multiple methods to inject this historical correctness information, creating a Generalized Correctness Model (GCM). We first show that GCMs can be trained on the correctness data from many LLMs and learn patterns for correctness prediction applicable across datasets and models. We then use CMs as a lens for studying the source of correctness prediction ability and its generalization, systematically controlling their training data and finding that answer phrasing is a strong predictor for correctness. We further explore alternative methods of injecting history without training an LLM, finding that including history as in-context examples can help improve correctness prediction, and post-hoc calibration can provide complementary reductions in calibration error. We evaluate GCMs based on Qwen3-8B across 5 model families and the MMLU and TriviaQA datasets, as well as on a downstream selective prediction task, finding that reliable LLM confidence estimation is a generalizable and model-agnostic skill learned by systematically encoding correctness history rather than a model-specific skill reliant on self-introspection.
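
The abstract refers to calibration error and to a downstream selective prediction task without defining either metric. As a minimal sketch, assuming the common equal-width-bin expected calibration error (ECE) and an accuracy-at-coverage measure (the paper's exact metrics, bin count, and evaluation protocol are not given on this page), the two quantities could be computed as follows. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumption: 10 equal-width bins, a common default
) -> float:
    """Weighted mean of |accuracy - mean confidence| over equal-width bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp the index so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


def selective_accuracy(
    confidences: Sequence[float],
    correct: Sequence[bool],
    coverage: float,
) -> float:
    """Accuracy on the most confident `coverage` fraction of answers."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")

    # Rank answers from most to least confident, then keep the top fraction.
    ranked = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    keep = max(1, int(coverage * len(ranked)))
    kept = ranked[:keep]
    return sum(1 for _, ok in kept if ok) / keep


if __name__ == "__main__":
    # Toy data: confidences from a hypothetical correctness model,
    # paired with whether the target LLM's answer was actually correct.
    toy_conf = [0.95, 0.85, 0.25, 0.15]
    toy_correct = [True, True, False, False]
    print("ECE:", expected_calibration_error(toy_conf, toy_correct))
    print("Accuracy at 50% coverage:", selective_accuracy(toy_conf, toy_correct, 0.5))
```

A lower ECE means the predicted confidences track empirical accuracy more closely. In selective prediction, a useful correctness model is one whose accuracy rises as coverage is reduced, because the answers it abstains on are the ones most likely to be wrong.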
