Fugu-MT Paper Translation (Abstract): Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs

Paper summary: Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs

arxiv url: http://arxiv.org/abs/2508.15878v1
Date: Thu, 21 Aug 2025 14:15:40 GMT
Status: Translation complete
Last updated in system: 2025-08-25 16:42:36.14257
Title: Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs
Title (reference translation): Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving in Formal-Informal Pairs
Authors: Terry Jingchen Zhang, Wenyuan Jiang, Rongchuan Liu, Yisong Wang, Junran Yang, Ning Wang, Nicole Ni, Yinya Huang, Mrinmaya Sachan
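Abstract summary: This paper proposes leveraging theoretical computer science (TCS) as a scalable source of rigorous proof problems. The approach is demonstrated on two TCS domains: Busy Beaver problems, which involve proving bounds on Turing machine halting behavior, and Mixed Boolean Arithmetic problems, which combine logical and arithmetic reasoning. The framework automatically synthesizes problems with parallel formal (Lean4) and informal (Markdown) specifications, creating a scalable pipeline for generating verified proof challenges.
Reference score (independently computed attention measure): 41.29431283264807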
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Formal theorem proving (FTP) has emerged as a critical foundation for evaluating the reasoning capabilities of large language models, enabling automated verification of mathematical proofs at scale. However, progress has been constrained by limited datasets due to the high cost of manual curation and the scarcity of challenging problems with verified formal-informal correspondences. We propose leveraging theoretical computer science (TCS) as a scalable source of rigorous proof problems, where algorithmic definitions enable automated generation of arbitrarily many challenging theorem-proof pairs. We demonstrate this approach on two TCS domains: Busy Beaver problems, which involve proving bounds on Turing machine halting behavior, and Mixed Boolean Arithmetic problems, which combine logical and arithmetic reasoning. Our framework automatically synthesizes problems with parallel formal (Lean4) and informal (Markdown) specifications, creating a scalable pipeline for generating verified proof challenges. Evaluation on frontier models reveals substantial gaps in automated theorem proving: while DeepSeekProver-V2-671B achieves 57.5% success on Busy Beaver problems, it manages only 12% on Mixed Boolean Arithmetic problems. These results highlight the difficulty of long-form proof generation even for problems that are computationally easy to verify, demonstrating the value of TCS domains for advancing automated reasoning research.
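Abstract (reference translation): Formal theorem proving (FTP) has emerged as an important foundation for evaluating the reasoning capabilities of large language models, enabling automated verification of mathematical proofs at scale. However, progress has been limited by small datasets, owing to the high cost of manual curation and the scarcity of challenging problems with verified formal-informal correspondences. We propose leveraging theoretical computer science (TCS) as a scalable source of rigorous proof problems. We demonstrate this on two TCS domains: Busy Beaver problems, which involve proving bounds on Turing machine halting behavior, and Mixed Boolean Arithmetic problems, which combine logical and arithmetic reasoning. Our framework automatically synthesizes problems with parallel formal (Lean4) and informal (Markdown) specifications, creating a scalable pipeline for generating verified proof challenges. DeepSeekProver-V2-671B achieves 57.5% success on Busy Beaver problems but only 12% on Mixed Boolean Arithmetic problems. These results highlight the difficulty of long-form proof generation even for problems that are computationally easy to verify, and demonstrate the value of TCS domains for advancing automated reasoning research.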
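
Illustration (not from the paper): the abstract describes problems that are cheap to verify by computation yet hard to prove, emitted as parallel Lean4 and Markdown statements. The sketch below shows what one such pair might look like for a textbook Mixed Boolean Arithmetic identity, x + y = (x XOR y) + 2(x AND y). The names `WIDTH`, `identity_holds`, and `make_pair`, the 8-bit width, and the choice of identity are all assumptions made for this example; the Lean statement is only produced as text and is not checked here, and nothing below should be read as the authors' actual generator.

```python
from itertools import product

WIDTH = 8                      # bit width chosen for illustration only
MASK = (1 << WIDTH) - 1        # arithmetic is taken modulo 2**WIDTH


def lhs(x: int, y: int) -> int:
    """Left-hand side of the identity: plain addition, wrapped to WIDTH bits."""
    return (x + y) & MASK


def rhs(x: int, y: int) -> int:
    """Right-hand side: XOR is the carry-free sum, AND marks the carry bits."""
    return ((x ^ y) + 2 * (x & y)) & MASK


def identity_holds() -> bool:
    """Exhaustively check the identity on every pair of WIDTH-bit inputs."""
    values = range(1 << WIDTH)
    return all(lhs(x, y) == rhs(x, y) for x, y in product(values, repeat=2))


def make_pair(name: str) -> dict:
    """Render one problem as a formal (Lean 4) and informal (Markdown) pair."""
    # Hypothetical Lean 4 statement; the proof is left as `sorry` for the prover.
    formal = (
        f"theorem {name} (x y : BitVec {WIDTH}) :\n"
        "    x + y = (x ^^^ y) + 2 * (x &&& y) := by\n"
        "  sorry"
    )
    informal = (
        f"**Problem.** Let $x$ and $y$ be {WIDTH}-bit integers. Show that "
        "$x + y = (x \\oplus y) + 2\\,(x \\wedge y)$ modulo "
        f"$2^{{{WIDTH}}}$."
    )
    return {"formal": formal, "informal": informal}


if __name__ == "__main__":
    # Only emit the pair once the identity has been confirmed by brute force.
    if identity_holds():
        pair = make_pair("mba_add")
        print(pair["formal"])
        print(pair["informal"])
```

The brute-force check covers all 65,536 input pairs at this width, which mirrors the abstract's point that verification can be computationally easy even when producing a formal proof is not.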
