Fugu-MT Paper Translation (Abstract): LLMs Know When They Know, but Do Not Act on It: A Metacognitive Harness for Test-time Scaling

Paper overview: LLMs Know When They Know, but Do Not Act on It: A Metacognitive Harness for Test-time Scaling

arxiv url: http://arxiv.org/abs/2605.14186v1
Date: Wed, 13 May 2026 23:09:25 GMT
Status: Translation complete
Last updated in system: 2026-05-15 21:45:34.53327
Title: LLMs Know When They Know, but Do Not Act on It: A Metacognitive Harness for Test-time Scaling
Title (reference translation): LLMs Know When They Know, but Do Not Act on It: A Metacognitive Harness for Test-time Scaling
Authors: Qi Cao, Yufan Wang, Peijia Qin, Shuhao Zhang, Pengtao Xie
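Abstract summary: We ask whether large language models (LLMs) possess latent metacognitive ability that can be turned into effective test-time control. Inspired by the Nelson-Narens theory from cognitive psychology, we propose a metacognitive harness that separates monitoring from reasoning.
Reference score (independently computed attention metric): 26.999207995495354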
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) often expose useful signals of self-monitoring: before solving a problem, they can estimate whether they are likely to succeed, and after solving it, they can judge whether their answer is likely to be correct. However, these signals are typically measured or elicited in isolation, rather than used to control inference. In this work, we ask whether LLMs possess latent metacognitive ability that can be turned into effective test-time control. Inspired by the Nelson-Narens theory from cognitive psychology, we propose a metacognitive harness that separates monitoring from reasoning. For each problem, the model first reports a pre-solve feeling-of-knowing (FOK) signal; after each solve attempt, it reports a post-solve judgment-of-learning (JOL) signal. Rather than treating these signals as passive confidence estimates, the harness turns them into an explicit control interface for reasoning: it decides when to trust the current solution, when to retry with compact metacognitive feedback, and when to pass multiple attempts to a final aggregator. Across text, code, and multimodal reasoning benchmarks, our harness substantially improves a fixed Claude Sonnet-4.6 base model without parameter updates or benchmark-specific fine-tuning. On the evaluated public benchmark snapshots, it raises pooled accuracy from 48.3 to 56.9 and exceeds the strongest listed leaderboard entries on the three primary evaluation settings: HLE-Verified, LiveCodeBench v6, and R-Bench-V. These results suggest that strong LLMs may already possess useful metacognitive ability, but require an explicit control harness to act on it during reasoning.
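Abstract (reference translation): Large language models (LLMs) often expose useful self-monitoring signals: before solving a problem, they can estimate how likely they are to succeed, and after solving it, they can judge whether their answer is likely to be correct. These signals, however, are usually measured or elicited in isolation rather than used to control inference. In this work, we ask whether LLMs possess latent metacognitive ability that can be converted into effective test-time control. Inspired by the Nelson-Narens theory from cognitive psychology, we propose a metacognitive harness that separates monitoring from reasoning. For each problem, the model first reports a pre-solve feeling-of-knowing (FOK) signal and then, after each solve attempt, a post-solve judgment-of-learning (JOL) signal. Instead of treating these signals as passive confidence estimates, the harness turns them into an explicit control interface for reasoning: it decides when to trust the current solution, when to retry with compact metacognitive feedback, and when to pass multiple attempts to a final aggregator. Across text, code, and multimodal reasoning benchmarks, the harness substantially improves a fixed Claude Sonnet-4.6 base model without parameter updates or benchmark-specific fine-tuning. On the evaluated public benchmark snapshots, pooled accuracy rises from 48.3 to 56.9, exceeding the strongest listed leaderboard entries on the three primary evaluation settings: HLE-Verified, LiveCodeBench v6, and R-Bench-V. These results suggest that strong LLMs may already possess useful metacognitive ability but need an explicit control harness to act on it during reasoning.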
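
The control loop described in the abstract (a pre-solve FOK report, a JOL report after each attempt, then a choice among trusting, retrying, and aggregating) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `metacognitive_harness`, the `Attempt` record, the `trust_threshold` and `max_attempts` values, the feedback wording, and the choice to forward the FOK value as feedback are all hypothetical, since the abstract does not specify them. The model calls are passed in as plain callables so the sketch runs without any LLM API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    answer: str
    jol: float  # post-solve judgment-of-learning, assumed to lie in [0, 1]


def metacognitive_harness(
    problem: str,
    report_fok: Callable[[str], float],       # pre-solve feeling-of-knowing
    solve: Callable[[str, str], str],         # (problem, feedback) -> answer
    report_jol: Callable[[str, str], float],  # (problem, answer) -> JOL
    aggregate: Callable[[str, List[Attempt]], str],
    trust_threshold: float = 0.8,  # hypothetical value, not from the paper
    max_attempts: int = 3,         # hypothetical value, not from the paper
) -> Tuple[str, List[Attempt]]:
    """Return the final answer and the list of attempts that were made."""
    # Monitoring step before any solving: ask for the FOK signal once.
    fok = report_fok(problem)
    feedback = f"Pre-solve FOK: {fok:.2f}."
    attempts: List[Attempt] = []

    for _ in range(max_attempts):
        # Reasoning step, kept separate from the monitoring calls.
        answer = solve(problem, feedback)
        jol = report_jol(problem, answer)
        attempts.append(Attempt(answer, jol))

        # Control decision 1: trust the current solution.
        if jol >= trust_threshold:
            return answer, attempts

        # Control decision 2: retry with compact metacognitive feedback.
        feedback = (
            f"Pre-solve FOK: {fok:.2f}. "
            f"Attempt {len(attempts)} had JOL {jol:.2f}; reconsider the approach."
        )

    # Control decision 3: hand all attempts to a final aggregator.
    return aggregate(problem, attempts), attempts
```

In practice `report_fok`, `solve`, `report_jol`, and `aggregate` would each wrap a prompt to the same fixed base model; with stub callables the loop can be exercised offline, for example using an aggregator that returns the attempt with the highest JOL.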

Related papers