Fugu-MT Paper Translation (Abstract): LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks

Paper overview: LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks

arxiv url: http://arxiv.org/abs/2606.03303v2
Date: Wed, 03 Jun 2026 06:16:05 GMT
Status: Translation complete
Last updated in system: 2026-06-04 13:59:43.535361
Title: LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks
Title (reference translation): LEAP: Supercharging LLMs for Formal Mathematics with an Agentic Framework
Authors: Po-Nien Kung, Linfeng Song, Dawsen Hwang, Jinsung Yoon, Chun-Liang Li, Simone Severini, Mirek Olšák, Edward Lockhart, Quoc V Le, Burak Gokturk, Thang Luong, Tomas Pfister, Nanyun Peng
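Abstract summary: Large language models (LLMs) show strong informal mathematical reasoning but struggle to generate verifiable proofs in formal languages such as Lean. This paper proposes LEAP, an agentic framework that enables general-purpose foundation models to achieve state-of-the-art performance in automated formal theorem proving.
Reference score (attention metric computed independently by this site): 85.86474267842907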
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) exhibit strong informal mathematical reasoning but struggle to generate mechanically verifiable proofs in formal languages like Lean. We present LEAP, an agentic framework that enables general-purpose foundation models to achieve state-of-the-art performance on automated formal theorem proving. LEAP leverages foundation model capabilities, such as informal reasoning, instruction following, and iterative self-refinement. By decomposing complex problems into smaller units, the system bridges formal proof construction with informal blueprints through continuous interaction with the Lean compiler. To provide a rigorous evaluation beyond increasingly saturated benchmarks, we introduce Lean-IMO-Bench, a benchmark of IMO-style problems formalized in Lean, with short statements yet highly non-routine and multi-step proofs across a wide range of difficulty levels. Empirically, on the latest 2025 Putnam Competition, an annual mathematics competition for undergraduate students in North America, LEAP solves all 12 problems, matching recent breakthroughs by frontier formal mathematical models. On Lean-IMO-Bench, LEAP boosts the one-shot formal solve rate of general-purpose LLMs from below 10% to 70%, notably surpassing the 48% benchmark set by a specialized, gold-medal-caliber IMO system. Furthermore, we demonstrate LEAP's research-level utility by autonomously formalizing complex proofs for open combinatorial challenges, including a verified proof for a key subproblem in Knuth's Hamiltonian decomposition of even-order Cayley graphs.
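Abstract (reference translation): Large language models (LLMs) show strong informal mathematical reasoning but struggle to generate mechanically verifiable proofs in formal languages such as Lean. This paper proposes LEAP, an agentic framework that enables general-purpose foundation models to achieve state-of-the-art performance in automated formal theorem proving. LEAP draws on foundation model capabilities such as informal reasoning, instruction following, and iterative self-refinement. By decomposing complex problems into smaller units, the system connects formal proof construction with informal blueprints through continuous interaction with the Lean compiler. To provide a rigorous evaluation beyond increasingly saturated benchmarks, the paper introduces Lean-IMO-Bench, a benchmark of IMO-style problems formalized in Lean. On the 2025 Putnam Competition, an annual mathematics competition for undergraduate students in North America, LEAP solves all 12 problems, matching recent breakthroughs by frontier formal mathematical models. On Lean-IMO-Bench, LEAP raises the one-shot formal solve rate of general-purpose LLMs from below 10% to 70%. Furthermore, the paper demonstrates LEAP's research-level utility by autonomously formalizing complex proofs for open combinatorial problems, including a verified proof for a key subproblem in Knuth's Hamiltonian decomposition of even-order Cayley graphs.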

Related papers