Fugu-MT Paper Translation (Abstract): Improving Metacognition and Uncertainty Communication in Language Models

Paper abstract: Improving Metacognition and Uncertainty Communication in Language Models

arxiv url: http://arxiv.org/abs/2510.05126v1
Date: Tue, 30 Sep 2025 19:50:02 GMT
Status: Translation complete
Last updated in system: 2025-10-08 17:57:07.828394
Title: Improving Metacognition and Uncertainty Communication in Language Models
Title (reference translation): Improving Metacognition and Uncertainty Communication in Language Models
Authors: Mark Steyvers, Catarina Belem, Padhraic Smyth
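Abstract summary: Large language models (LLMs) are increasingly used in decision-making contexts. Their explicit verbalized confidence is typically miscalibrated and discriminates poorly between correct and incorrect answers. This paper investigates whether supervised finetuning can improve models' ability to communicate uncertainty.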
Reference score (independently computed attention metric): 13.389881635116472
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are increasingly used in decision-making contexts, but when they present answers without signaling low confidence, users may unknowingly act on erroneous outputs. While prior work shows that LLMs maintain internal uncertainty signals, their explicit verbalized confidence is typically miscalibrated and poorly discriminates between correct and incorrect answers. Across two types of LLMs, we investigate whether supervised finetuning can improve models' ability to communicate uncertainty and whether such improvements generalize across tasks and domains. We finetune the LLMs on datasets spanning general knowledge, mathematics, and open-ended trivia, and evaluate two metacognitive tasks: (1) single-question confidence estimation, where the model assigns a numeric certainty to its answer, and (2) pairwise confidence comparison, where the model selects which of two answers it is more likely to have correct. We assess generalization to unseen domains, including medical and legal reasoning. Results show that finetuning improves calibration (alignment between stated confidence and accuracy) and discrimination (higher confidence for correct vs. incorrect responses) within and across domains, while leaving accuracy unchanged. However, improvements are task-specific: training on single-question calibration does not transfer to pairwise comparison, and vice versa. In contrast, multitask finetuning on both forms of metacognition yields broader gains, producing lower calibration error and stronger discrimination in out-of-domain evaluations. These results show that while uncertainty communication in LLMs is trainable and generalizable, different metacognitive skills do not naturally reinforce one another and must be developed together through multitask training.
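Abstract (reference translation): Large language models (LLMs) are increasingly used in decision-making contexts, but when they present answers without signaling low confidence, users may unknowingly act on erroneous outputs. Prior work shows that LLMs maintain internal uncertainty signals, yet their explicit verbalized confidence is typically miscalibrated and discriminates poorly between correct and incorrect answers. Across two types of LLMs, we investigate whether supervised finetuning improves models' ability to communicate uncertainty and whether such improvements generalize across tasks and domains. We finetune the LLMs on datasets spanning general knowledge, mathematics, and open-ended trivia, and evaluate two metacognitive tasks: (1) single-question confidence estimation, where the model assigns a numeric certainty to its answer, and (2) pairwise confidence comparison, where the model selects which of two answers it is more likely to have correct. We assess generalization to unseen domains, including medical and legal reasoning. The results show that finetuning improves calibration (alignment between stated confidence and accuracy) and discrimination (higher confidence for correct than for incorrect responses) within and across domains, while leaving accuracy unchanged. However, the improvements are task-specific: training on single-question calibration does not transfer to pairwise comparison, and vice versa. In contrast, multitask finetuning on both forms of metacognition yields broader gains, producing lower calibration error and stronger discrimination in out-of-domain evaluations. These results show that while uncertainty communication in LLMs is trainable and generalizable, different metacognitive skills do not naturally reinforce one another and must be developed together through multitask training.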
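To make the two evaluation notions in the abstract concrete, the following is a minimal sketch of how calibration and discrimination could be scored from a list of stated confidences and correctness labels. The abstract does not name the exact metrics used, so the choices here are assumptions: expected calibration error (ECE) with equal-width bins for calibration, and the area under the ROC curve (AUC) for discrimination. The function names `expected_calibration_error` and `discrimination_auc` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE (an assumed metric, not confirmed by the abstract).

    Returns the bin-size-weighted mean of |accuracy - mean confidence|.
    """
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, y in items if y) / len(items)
        ece += (len(items) / total) * abs(accuracy - mean_conf)
    return ece


def discrimination_auc(
    confidences: Sequence[float],
    correct: Sequence[bool],
) -> float:
    """AUC: probability that a randomly chosen correct answer receives higher
    confidence than a randomly chosen incorrect one (ties count as 0.5)."""
    pos = [c for c, y in zip(confidences, correct) if y]
    neg = [c for c, y in zip(confidences, correct) if not y]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative, made-up values; not data from the paper.
    conf = [0.9, 0.8, 0.6, 0.3]
    right = [True, True, False, False]
    print("ECE:", expected_calibration_error(conf, right))
    print("AUC:", discrimination_auc(conf, right))
```

Lower ECE indicates that stated confidence tracks accuracy more closely, while an AUC near 1.0 indicates that correct answers consistently receive higher confidence than incorrect ones. The two can move independently, which is why the abstract reports them separately.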

Related papers