Fugu-MT Paper Translation (Summary): Mathematical Reasoning via Intervention-Based Time-Series Causal Discovery Using LLMs as Concept Mastery Simulators

Paper summary: Mathematical Reasoning via Intervention-Based Time-Series Causal Discovery Using LLMs as Concept Mastery Simulators

arxiv url: http://arxiv.org/abs/2605.07600v1
Date: Fri, 08 May 2026 11:19:01 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:39.013937
Title: Mathematical Reasoning via Intervention-Based Time-Series Causal Discovery Using LLMs as Concept Mastery Simulators
Title (reference translation): Mathematical reasoning via intervention-based time-series causal discovery using LLMs as concept mastery simulators
Authors: Tsuyoshi Okita
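Abstract summary: We propose CIKA (Causal Intervention for Knowledge Activation), which uses the LLM itself as an interventional simulator: a prompt sets the concept state to "mastered", and the resulting change in correctness estimates the causal effect. We formalize this quantity as an Interventional Capability Probe (ICP), which diagnoses whether the LLM can use a given concept.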
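The summary defines ICP as the change in correctness when a prompt sets a concept to "mastered". Below is a minimal Python sketch of that difference-in-means estimate. It is only an illustration under stated assumptions: the prompt template, the `solve` callable (a stand-in for sampling one LLM answer and checking it against the reference), `n_samples`, and the helper names `build_prompt`, `icp`, and `rank_concepts` are all hypothetical, since the paper's actual prompts and sampling scheme are not given here.

```python
from typing import Callable, List, Optional, Tuple


def build_prompt(problem: str, concept: Optional[str]) -> str:
    # Hypothetical intervention template: a leading statement sets the concept
    # state to "mastered". concept=None gives the unmodified baseline prompt.
    if concept is None:
        return problem
    return f"Assume you have fully mastered this concept: {concept}.\n\n{problem}"


def icp(problem: str, concept: str, solve: Callable[[str], bool],
        n_samples: int = 8) -> float:
    """Estimate P(correct | concept set to mastered) - P(correct | baseline).

    `solve` is a hypothetical stand-in for sampling one LLM answer to a prompt
    and checking it against the reference; it returns True when correct.
    """
    baseline_prompt = build_prompt(problem, None)
    intervened_prompt = build_prompt(problem, concept)
    # Sample each arm several times because LLM outputs are stochastic.
    baseline = sum(solve(baseline_prompt) for _ in range(n_samples))
    intervened = sum(solve(intervened_prompt) for _ in range(n_samples))
    return (intervened - baseline) / n_samples


def rank_concepts(problem: str, concepts: List[str],
                  solve: Callable[[str], bool],
                  n_samples: int = 8) -> List[Tuple[str, float]]:
    """Sort candidate concepts by estimated ICP, highest first."""
    scores = [(c, icp(problem, c, solve, n_samples)) for c in concepts]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With a toy solver that succeeds only when the prompt mentions algebra, `rank_concepts("Solve x^2 = 4", ["geometry", "algebra"], lambda p: "algebra" in p)` ranks `"algebra"` first with an estimated ICP of 1.0 and `"geometry"` second with 0.0.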
Reference score (in-house attention score): 2.0013177824602444
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recent methods for improving LLM mathematical reasoning, whether through MCTS-based test-time search or causal graph-guided knowledge injection, cannot identify which concepts causally contribute to a correct answer, as the observed association may be spurious, driven by confounders such as problem difficulty. We propose CIKA (Causal Intervention for Knowledge Activation), a framework that uses the LLM itself as an interventional simulator: a prompt sets the concept state to "mastered" and the correctness change estimates the causal effect. We formalize this quantity as an Interventional Capability Probe (ICP), which diagnoses whether the LLM can use a given concept, as distinct from merely possessing knowledge. Because the intervention exogenously sets the concept state independently of problem difficulty, ICP separates confounding that observational methods cannot. On 67 screened problems, the ICP of the top-ranked concept (+0.219) is significantly larger than that of the negative control (+0.039; paired t-test, p < 10^-6, Cohen's d = 0.86), confirming that the probe discriminates causally relevant concepts from irrelevant ones. Analysis of 601 Omni-MATH problems further shows that solved problems have 6.1× higher ATE than unsolved ones (0.338 vs. 0.055), confirming that ICP is predictive of problem-solving success. With a 7B-parameter LLM whose weights are entirely frozen, CIKA achieves 69.7% on the contamination-free Omni-MATH-Rule benchmark and 64.0% overall, compared to 60.5% for o1-mini, and 97.2% on GSM8K, 46–50% on AIME 2024–2026, and 46.2% on MathArena. The Causal Knowledge Activation component contributes 33.8% of correct answers on problems where the base model alone fails, demonstrating that the LLM already possessed but had not activated the requisite knowledge.
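The abstract compares per-problem ICP values of the top-ranked concept against a negative control with a paired t-test and Cohen's d. The sketch below computes both from paired samples using only the Python standard library. It assumes Cohen's d is the paired-samples variant d_z (mean of the differences divided by their standard deviation); the abstract does not state which variant was used. The function name `paired_effect` and any example values are illustrative, not data from the paper.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_effect(top: Sequence[float],
                  control: Sequence[float]) -> Tuple[float, float]:
    """Return (paired t statistic, Cohen's d_z) for per-problem ICP pairs.

    top[i] and control[i] must be ICP values measured on the same problem i.
    """
    if len(top) != len(control) or len(top) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [a - b for a, b in zip(top, control)]
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    if sd_d == 0:
        raise ValueError("differences have zero variance")
    d_z = mean_d / sd_d
    # Paired t = mean_d / (sd_d / sqrt(n)), which equals d_z * sqrt(n).
    t = d_z * math.sqrt(len(diffs))
    return t, d_z
```

For a two-sided p-value, SciPy's `scipy.stats.ttest_rel(top, control)` returns the same t statistic together with its p-value.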
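Abstract (reference translation): Neither MCTS-based test-time search nor causal-graph-guided knowledge injection can identify which concepts causally contribute to a correct answer. This paper proposes CIKA (Causal Intervention for Knowledge Activation), a framework that uses the LLM itself as an interventional simulator: a prompt sets the concept state to "mastered", and the change in correctness estimates the causal effect. We formalize this quantity as an Interventional Capability Probe (ICP), which diagnoses whether the LLM can use a given concept. Because the intervention sets the concept state independently of problem difficulty, ICP separates out confounding that observational methods cannot. On 67 screened problems, the ICP of the top-ranked concept (+0.219) is significantly larger than that of the negative control (+0.039; paired t-test, p < 10^-6, Cohen's d = 0.86), confirming that the probe distinguishes causally relevant concepts from irrelevant ones. An analysis of 601 Omni-MATH problems shows that solved problems have a 6.1× higher ATE than unsolved ones (0.338 vs. 0.055), confirming that ICP predicts problem-solving success. With a 7B-parameter LLM whose weights are entirely frozen, CIKA achieves 69.7% on the contamination-free Omni-MATH-Rule benchmark and 64.0% overall (vs. 60.5% for o1-mini), 97.2% on GSM8K, 46–50% on AIME 2024–2026, and 46.2% on MathArena. The Causal Knowledge Activation component contributes 33.8% of the correct answers on problems where the base model alone fails, showing that the LLM already possessed the requisite knowledge but had not activated it.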
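The 6.1× figure is the ratio of the two group means (0.338 / 0.055 ≈ 6.1). A small sketch of that comparison, assuming one ATE value per problem tagged with whether the problem was solved; the record layout and the function name `ate_by_outcome` are hypothetical, and the abstract does not describe how per-problem ATE values were aggregated.

```python
from statistics import fmean
from typing import Iterable, Tuple


def ate_by_outcome(
    records: Iterable[Tuple[float, bool]],
) -> Tuple[float, float, float]:
    """Return (mean ATE of solved, mean ATE of unsolved, solved/unsolved ratio).

    Each record is a hypothetical (ate, solved) pair for one problem.
    """
    records = list(records)
    solved = [ate for ate, ok in records if ok]
    unsolved = [ate for ate, ok in records if not ok]
    if not solved or not unsolved:
        raise ValueError("need at least one solved and one unsolved problem")
    mean_solved, mean_unsolved = fmean(solved), fmean(unsolved)
    if mean_unsolved == 0:
        raise ValueError("unsolved mean ATE is zero; ratio undefined")
    return mean_solved, mean_unsolved, mean_solved / mean_unsolved
```

Feeding it only the two reported group means, `ate_by_outcome([(0.338, True), (0.055, False)])` gives a ratio of about 6.15, which rounds to the 6.1× quoted above.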

Related papers