Fugu-MT Paper Translation (Abstract): Soundness-Aware Level: A Microscopic Signature that Predicts LLM Reasoning Potential

Paper abstract: Soundness-Aware Level: A Microscopic Signature that Predicts LLM Reasoning Potential

arxiv url: http://arxiv.org/abs/2510.15216v2
Date: Mon, 20 Oct 2025 19:16:24 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:11.536221
Title: Soundness-Aware Level: A Microscopic Signature that Predicts LLM Reasoning Potential
Title (reference translation): Soundness-Aware Level: A Microscopic Signature that Predicts LLM Reasoning Potential
Authors: Xuansheng Wu, Xiaoman Pan, Wenlin Yao, Jianshu Chen
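Abstract summary: Reinforcement learning with verifiable rewards (RLVR) can elicit strong reasoning in large language models (LLMs). The key discovery is that high-potential models are inherently soundness-aware. The paper introduces the Soundness-Aware Level (SAL), a microscopic metric that uses the Jensen-Shannon Divergence to measure how far a model's internal probability distributions separate across rules' soundness levels.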
Reference score (independently computed attention score): 27.552392596027588
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Reinforcement learning with verifiable rewards (RLVR) can elicit strong reasoning in large language models (LLMs), while their performance after RLVR varies dramatically across different base models. This raises a fundamental question: what microscopic property of pre-trained models leads to this variation? To investigate, we formalize reasoning as chains of Horn clauses ("if-then" rules) built from features extracted from the LLM's latent space via cross-layer sparse autoencoders (SAEs). We estimate the transition probabilities between its features, and further categorize each rule by its semantic soundness level (e.g., strict, plausible, noisy) with an LLM. Our key discovery is that high-potential models are inherently soundness-aware: their internal probability distributions systematically shift across rules' soundness levels, becoming highly distinct for "strict" versus "noisy" rules. In contrast, weaker models are soundness-agnostic, collapsing to one distribution regardless of soundness levels. To quantify this, we introduce the Soundness-Aware Level (SAL), a microscopic metric using the Jensen-Shannon Divergence to measure the separation between these distributions. We show that SAL's predictions of post-RLVR reasoning performance follow a precise empirical law (R^2=0.87) across diverse model families (Qwen, Mistral, Llama, DeepSeek) and scales (0.5B-14B). This reveals that a model's reasoning potential is tied to its intrinsic, pre-trained ability to distinguish sound knowledge from unsound ones. These findings underscore the critical role of model pre-training in shaping reasoning and offer a practical metric grounded in the model's internal mechanisms for selecting/designing stronger base models.
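Abstract (reference translation): Reinforcement learning with verifiable rewards (RLVR) elicits strong reasoning in large language models (LLMs), but post-RLVR performance varies dramatically across base models. Which microscopic property of pre-trained models leads to this variation? This work formalizes reasoning as chains of Horn clauses ("if-then" rules) built from features extracted from the LLM's latent space via cross-layer sparse autoencoders (SAEs). Transition probabilities between features are estimated, and an LLM categorizes each rule by its semantic soundness level (e.g., strict, plausible, noisy). In high-potential models, the internal probability distributions shift systematically across rules' soundness levels and become highly distinct for "strict" versus "noisy" rules. In contrast, weaker models are soundness-agnostic and collapse to a single distribution regardless of soundness level. To quantify this, the authors introduce the Soundness-Aware Level (SAL), a microscopic metric that uses the Jensen-Shannon Divergence to measure the separation between these distributions. SAL's predictions of post-RLVR reasoning performance follow a precise empirical law (R^2=0.87) across diverse model families (Qwen, Mistral, Llama, DeepSeek) and scales (0.5B-14B). This shows that a model's reasoning potential is tied to its intrinsic, pre-trained ability to distinguish sound knowledge from unsound knowledge. These findings underscore the critical role of pre-training in shaping reasoning and offer a practical metric, grounded in the model's internal mechanisms, for selecting and designing stronger base models.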
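
The abstract states only that SAL uses the Jensen-Shannon Divergence to measure how far a model's probability distributions separate across soundness levels; it does not give the exact formula. The following is a minimal sketch of that idea under stated assumptions, not the paper's implementation: the binning scheme, the function names `histogram` and `jensen_shannon_divergence`, and the sample probabilities are all hypothetical and chosen purely for illustration.

```python
import math


def jensen_shannon_divergence(p, q):
    """Base-2 JSD between two discrete distributions; the result lies in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution M = (P + Q) / 2.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute 0 by convention.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def histogram(values, bins=10):
    """Normalized histogram of probabilities in [0, 1] (assumed binning scheme)."""
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * bins
    for v in values:
        # Clamp v == 1.0 into the last bin.
        counts[min(int(v * bins), bins - 1)] += 1
    return [c / len(values) for c in counts]


# Hypothetical transition probabilities for rules labeled "strict" and "noisy".
# These numbers are invented for illustration; they are not from the paper.
strict_probs = [0.92, 0.88, 0.95, 0.81, 0.90]
noisy_probs = [0.12, 0.30, 0.08, 0.22, 0.15]

# A "soundness-aware" pattern: the two distributions are far apart.
aware = jensen_shannon_divergence(histogram(strict_probs), histogram(noisy_probs))
# A "soundness-agnostic" pattern: the two distributions coincide.
agnostic = jensen_shannon_divergence(histogram(strict_probs), histogram(strict_probs))
print(aware, agnostic)
```

In this toy setup the strict and noisy histograms occupy disjoint bins, so the divergence reaches its base-2 maximum of 1, while identical histograms give 0. A real measurement would estimate the transition probabilities from SAE features, which this sketch does not attempt.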
