Fugu-MT Paper Translation (Abstract): DAG-Math: Graph-Guided Mathematical Reasoning in LLMs

Paper overview: DAG-Math: Graph-Guided Mathematical Reasoning in LLMs

arxiv url: http://arxiv.org/abs/2510.19842v1
Date: Sun, 19 Oct 2025 21:05:17 GMT
Status: Translation complete
System update date: 2025-10-25 03:08:16.323897
Title: DAG-Math: Graph-Guided Mathematical Reasoning in LLMs
Title (reference translation): DAG-Math: Graph-Guided Mathematical Reasoning in LLMs
Authors: Yuanhe Zhang, Ilja Kuzborskij, Jason D. Lee, Chenlei Leng, Fanghui Liu
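Abstract summary: Large language models (LLMs) show strong performance on mathematical problems when prompted with Chain-of-Thought (CoT). We propose modeling CoT as a certain rule-based process over directed acyclic graphs (DAGs). We introduce logical closeness, a metric that quantifies how well a model's CoT trajectory adheres to the DAG structure.
Reference score (independently computed attention score): 54.231935013127206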
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) demonstrate strong performance on mathematical problems when prompted with Chain-of-Thought (CoT), yet it remains unclear whether this success stems from search, rote procedures, or rule-consistent reasoning. To address this, we propose modeling CoT as a certain rule-based stochastic process over directed acyclic graphs (DAGs), where nodes represent intermediate derivation states and edges encode rule applications. Within this framework, we introduce logical closeness, a metric that quantifies how well a model's CoT trajectory (i.e., the LLM's final output) adheres to the DAG structure, providing evaluation beyond classical PASS@k metrics. Building on this, we introduce the DAG-MATH CoT format and construct a benchmark that guides LLMs to generate CoT trajectories in this format, thereby enabling the evaluation of their reasoning ability under our framework. Across standard mathematical reasoning datasets, our analysis uncovers statistically significant differences in reasoning fidelity among representative LLM families, even when PASS@k is comparable, highlighting gaps between final-answer accuracy and rule-consistent derivation. Our framework provides a balance between free-form CoT and formal proof systems, offering actionable diagnostics for LLM reasoning evaluation. Our benchmark and code are available at: https://github.com/YuanheZ/DAG-MATH-Formatted-CoT.
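Abstract (reference translation): Large language models (LLMs) show strong performance on mathematical problems when prompted with Chain-of-Thought (CoT), but it is unclear whether this success stems from search, rote procedures, or rule-consistent reasoning. We therefore propose modeling CoT as a rule-based stochastic process over directed acyclic graphs (DAGs), in which nodes represent intermediate derivation states and edges encode rule applications. Within this framework, we introduce logical closeness, a metric that quantifies how well a model's CoT trajectory (i.e., the LLM's final output) adheres to the DAG structure, going beyond classical PASS@k metrics. We then introduce the DAG-MATH CoT format and construct a benchmark that guides LLMs to generate CoT trajectories in this format. Across standard mathematical reasoning datasets, we find statistically significant differences in reasoning fidelity among representative LLM families, even when PASS@k is comparable, which highlights gaps between final-answer accuracy and rule-consistent derivation. Our framework offers a balance between free-form CoT and formal proof systems and provides actionable diagnostics for LLM reasoning evaluation. The benchmark and code are available at https://github.com/YuanheZ/DAG-MATH-Formatted-CoT.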


Related papers