Fugu-MT Paper Translation (Summary): Artificial Intelligence for Mathematical Reasoning: An Integrated Survey of Language Models, Neuro-symbolic Systems, and Verified Discovery

Paper Summary: Artificial Intelligence for Mathematical Reasoning: An Integrated Survey of Language Models, Neuro-symbolic Systems, and Verified Discovery

arxiv url: http://arxiv.org/abs/2606.08728v1
Date: Sun, 07 Jun 2026 16:50:07 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:06.418778
Title: Artificial Intelligence for Mathematical Reasoning: An Integrated Survey of Language Models, Neuro-symbolic Systems, and Verified Discovery
Authors: Syed Rifat Raiyan, Mohsinul Kabir, Hasan Mahmud, Md Kamrul Hasan
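Abstract summary: This survey provides a unified account of the field's evolution, covering (i) informal reasoning over text and diagrams, including MWP solving, multimodal geometry, and VLMs; (ii) formal reasoning in proof assistants, including autoformalization, tactic prediction, compiler-guided repair, and proof search; (iii) mathematical discovery, where systems propose constructions, improve bounds, and assist attacks on open problems; and (iv) inference- and training-time techniques such as CoT prompting, tool use, process reward models, and RLVR.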
Reference score (attention score computed by the site): 3.9943798586374784
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Mathematical reasoning has long served as a stringent test of machine intelligence; over the past decade, it has moved from a niche problem within NLP to one of the most consequential AI frontiers. This survey provides a unified account of the field's evolution, from early rule-based math word problem (MWP) solvers and template-driven geometry systems, through neural expression generation and LLM prompting, to contemporary reasoning models, multi-agent systems, neuro-symbolic theorem provers, and verified discovery workflows. We organize the landscape along four axes: (i) informal reasoning over text and diagrams, spanning MWP solving, multimodal geometry, and VLMs; (ii) formal reasoning in proof assistants, including autoformalization, tactic prediction, compiler-guided repair, and proof search; (iii) mathematical discovery, where systems propose constructions, improve bounds, or assist attacks on open problems; and (iv) the inference and training-time techniques, including CoT prompting, tool use, process reward models, and RLVR, that increasingly connect generation with verification. We catalog major benchmarks across grade-school arithmetic, competition mathematics, geometry, formal proving, multimodal and multilingual reasoning, and expert evaluation, and we examine benchmark saturation, contamination, reporting mismatches, and the distinction between pass@1, majority voting, and verifier-assisted pass@$k$. We critically assess failure modes: brittleness under perturbation, reward hacking, multimodal grounding failures, fragile formalization, and the energy cost of reasoning-scale inference. Drawing on recent perspectives from working mathematicians, we identify future directions centered on verified-discovery workflows, reasoning efficiency, and infrastructure to make AI-assisted formalization broadly usable. Companion materials: https://github.com/Starscream-11813/awesome-AI4Math.
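The abstract stresses that pass@1, majority voting, and verifier-assisted pass@$k$ are different evaluation protocols. The following is a minimal Python sketch, not code from the survey, assuming each sampled solution has already been reduced to a final-answer string. `pass_at_k` implements the widely used unbiased estimator $1 - \binom{n-c}{k}/\binom{n}{k}$; the helper names `majority_vote` and `verifier_best_of_k`, the toy samples, and the reward-model scores are hypothetical illustrations.

```python
from collections import Counter
from math import comb
from typing import Callable


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Probability that at least one of k samples drawn without replacement
    from the n is correct: 1 - C(n - c, k) / C(n, k).
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def majority_vote(answers: list[str]) -> str:
    """maj@n (self-consistency): the most frequent final answer wins.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the answer sampled earliest.
    """
    return Counter(answers).most_common(1)[0][0]


def verifier_best_of_k(
    answers: list[str], score: Callable[[str], float], k: int
) -> str:
    """Verifier-assisted selection: submit the top-scoring of the first k samples.

    `score` is a hypothetical stand-in for any verifier: an exact checker
    returning 1.0/0.0, or a learned (possibly miscalibrated) reward model.
    """
    return max(answers[:k], key=score)


if __name__ == "__main__":
    # Hypothetical toy data: 8 sampled final answers; the true answer is "15".
    samples = ["12", "15", "12", "7", "12", "15", "15", "15"]
    reference = "15"
    c = sum(a == reference for a in samples)

    print(f"pass@1 = {pass_at_k(len(samples), c, 1):.4f}")
    print(f"pass@4 = {pass_at_k(len(samples), c, 4):.4f}")
    print("maj@8 correct:", majority_vote(samples) == reference)

    # Hypothetical miscalibrated reward model that prefers "12".
    toy_scores = {"12": 0.9, "15": 0.6, "7": 0.1}
    picked = verifier_best_of_k(samples, lambda a: toy_scores[a], k=4)
    print("reward-model best-of-4 correct:", picked == reference)

    # An exact checker (as a proof assistant provides) reduces best-of-k
    # to "is any of the first k samples correct?".
    picked = verifier_best_of_k(samples, lambda a: float(a == reference), k=4)
    print("exact-checker best-of-4 correct:", picked == reference)
```

On the same eight samples this prints pass@1 = 0.5000 and pass@4 = 0.9857; majority voting and the exact checker recover the right answer, while the miscalibrated reward model does not. The gap between these protocols is why scores reported under different ones cannot be compared directly.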

Related Papers