Fugu-MT Paper Translation (Abstract): Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank

Paper overview: Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank

arxiv url: http://arxiv.org/abs/2510.24299v1
Date: Tue, 28 Oct 2025 11:01:10 GMT
Status: Translation complete
Last updated in system: 2025-10-29 15:35:37.086478
Title: Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank
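Title (reference translation): Verifying the Reasoning Paths of Large Language Models via Correlation Matrix Rank
Authors: Jiayu Liu, Wei Dai, Zhenya Huang, Ning Miao, Enhong Chen
Abstract summary: Large language models (LLMs) are prone to errors and hallucinations. How to check their outputs effectively and efficiently has become a critical problem for applications.
Reference score (independently computed attention measure): 71.09032766271493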
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the strong reasoning ability of large language models (LLMs), they are prone to errors and hallucinations. As a result, how to check their outputs effectively and efficiently has become a critical problem in their applications. Existing checking methods rely heavily on external resources, such as trained verifiers (e.g., process/outcome reward models) or elaborate prompts, which lead to high computational overhead and are only applicable to specific domains. In this paper, we investigate whether the internal behaviors of LLMs already imply the credibility of their reasoning paths. Specifically, we find that the rank of the correlation matrix between the input problem and the output reasoning path is a robust indicator of reasoning correctness. Unlike other correctness indicators for LLMs, the calculation of the correlation matrix relies only on the LLM itself, which avoids the hassle of training a separate model or designing complicated prompts. Building on this finding, we design a simple, plug-and-play Self-Indicator method to reweight candidate reasoning paths, which achieves significant performance improvements over other voting and verification methods with very little computational overhead. Our experiments across multiple LLMs of varying scales and model families further demonstrate the effectiveness of Self-Indicator. It achieves over 75% accuracy in distinguishing correct reasoning paths from incorrect ones and, in turn, improves accuracy on three reasoning benchmarks by more than 8%.
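Abstract (reference translation): Although large language models (LLMs) have strong reasoning ability, they are prone to errors and hallucinations. Consequently, how to check their outputs effectively and efficiently has become an important problem for applications. Existing checking methods depend heavily on external resources such as trained verifiers (for example, process/outcome reward models) or elaborate prompts, which bring high computational overhead and apply only to specific domains. This paper examines whether the internal behavior of LLMs already indicates the credibility of their reasoning paths. Specifically, the rank of the correlation matrix between the input problem and the output reasoning path turns out to be a robust indicator of reasoning correctness. Unlike other correctness indicators for LLMs, computing the correlation matrix depends only on the LLM itself, avoiding the trouble of training a separate model or designing complicated prompts. Building on this, the authors design a simple, plug-and-play Self-Indicator method that reweights candidate reasoning paths and achieves significant performance improvements over other voting and verification methods with little computational overhead. Experiments on LLMs of various scales and model families further show the effectiveness of Self-Indicator. It distinguishes correct reasoning paths from incorrect ones with over 75% accuracy and improves accuracy on three reasoning benchmarks by more than 8%.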
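
The abstract does not say how the correlation matrix is built, which internal representations it uses, or whether a higher or lower rank signals a correct path. The following is therefore only a minimal illustrative sketch of the general idea (score each candidate reasoning path by the rank of a problem/path correlation matrix, then use that score to weight a vote), not the authors' Self-Indicator implementation. Everything here is an assumption: the function names `cross_correlation`, `numerical_rank`, and `rank_weighted_vote` are hypothetical, token representations are assumed to be plain lists of floats, the correlation is assumed to be Pearson correlation between token vectors, and the direction of the rank signal is left as the parameter `higher_rank_is_better`.

```python
import math
from collections import defaultdict


def _standardize(vec):
    """Center a vector and scale it to unit length (left as zeros if constant)."""
    mean = sum(vec) / len(vec)
    centered = [x - mean for x in vec]
    norm = math.sqrt(sum(x * x for x in centered))
    return [x / norm for x in centered] if norm > 0 else centered


def cross_correlation(problem_states, path_states):
    """ASSUMPTION: Pearson correlation of every problem-token vector with
    every path-token vector; returns a len(problem) x len(path) matrix."""
    p = [_standardize(v) for v in problem_states]
    q = [_standardize(v) for v in path_states]
    return [[sum(a * b for a, b in zip(pv, qv)) for qv in q] for pv in p]


def numerical_rank(matrix, tol=1e-8):
    """Rank via Gaussian elimination with partial pivoting.
    Pivots with absolute value <= tol are treated as zero."""
    rows = [list(r) for r in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        if rank == len(rows):
            break
        pivot = max(range(rank, len(rows)), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


def rank_weighted_vote(problem_states, candidates, higher_rank_is_better=True):
    """candidates: list of (answer, path_states) pairs.
    ASSUMPTION: the weight is the rank itself, or 1 / (1 + rank) when a
    lower rank is taken to be the better signal."""
    scores = defaultdict(float)
    for answer, path_states in candidates:
        rank = numerical_rank(cross_correlation(problem_states, path_states))
        scores[answer] += float(rank) if higher_rank_is_better else 1.0 / (1.0 + rank)
    return max(scores, key=scores.get)
```

The sketch uses only the standard library so it stays dependency-free; with real hidden states one would more likely use a numerical library and an SVD-based rank. The toggle `higher_rank_is_better` is deliberate: the direction of the signal should be taken from the paper itself, not from this example.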

Related papers