Fugu-MT Paper Translation (Abstract): Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Paper abstract: Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

arxiv url: http://arxiv.org/abs/2508.03613v1
Date: Tue, 05 Aug 2025 16:28:22 GMT
Status: Translation complete
Last updated in system: 2025-08-06 18:18:56.073522
Title: Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
Title (reference translation): Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
Authors: Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, Jiayun Wu, Jiri Gesi, Ximing Lu, David Acuna, Kaiyu Yang, Hongzhou Lin, Yejin Choi, Danqi Chen, Sanjeev Arora, Chi Jin
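Abstract summary: Goedel-Prover-V2, a series of open-source language models, sets a new state of the art in automated theorem proving. We generate synthetic tasks of increasing difficulty to train the model to master increasingly complex theorems. Goedel-Prover-V2-32B achieves 88.1% on MiniF2F at pass@32 in standard mode and 90.4% in self-correction mode.
Reference score (independently computed attention score): 95.91743732150233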
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: We introduce Goedel-Prover-V2, a series of open-source language models that set a new state-of-the-art in automated theorem proving. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) Scaffolded data synthesis: We generate synthetic tasks of increasing difficulty to train the model to master increasingly complex theorems; (2) Verifier-guided self-correction: We enable the model to iteratively revise its proofs by leveraging feedback from the Lean compiler; (3) Model averaging: We merge model checkpoints to mitigate the decrease in model output diversity in later stages of training. Our small model, Goedel-Prover-V2-8B, reaches 84.6% pass@32 on MiniF2F and outperforms DeepSeek-Prover-V2-671B under the same metric, despite being 80X smaller. Our flagship model, Goedel-Prover-V2-32B, achieves 88.1% on MiniF2F at pass@32 in standard mode and 90.4% in self-correction mode, outperforming prior SOTA by a large margin. Additionally, our flagship model solves 86 problems on PutnamBench at pass@184, securing the first place among open-source models on the leaderboard, surpassing DeepSeek-Prover-V2-671B's record of solving 47 problems by pass@1024 with a significantly smaller model size and compute budget. At the time of its release (July-August 2025), Goedel-Prover-V2 achieves the strongest overall performance among all open-source theorem provers. It also ranks among the top-performing models--including closed-source systems with publicly reported performance--under a constrained test-time compute budget. Our models, code, and data are released at https://github.com/Goedel-LM/Goedel-Prover-V2.
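Abstract (reference translation): Goedel-Prover-V2 is a series of open-source language models that sets a new state of the art in automated theorem proving. Built on the standard expert iteration and reinforcement learning pipeline, the approach incorporates three key innovations: (1) scaffolded data synthesis: generating synthetic tasks of increasing difficulty to train the model to master increasingly complex theorems; (2) verifier-guided self-correction: enabling the model to iteratively revise its proofs using feedback from the Lean compiler; (3) model averaging: merging model checkpoints to mitigate the decrease in model output diversity in later stages of training. Our small model, Goedel-Prover-V2-8B, reaches 84.6% pass@32 on MiniF2F and outperforms DeepSeek-Prover-V2-671B under the same metric, despite being 80 times smaller. Our flagship model, Goedel-Prover-V2-32B, achieves 88.1% on MiniF2F at pass@32 in standard mode and 90.4% in self-correction mode, outperforming the prior SOTA by a large margin. Additionally, our flagship model solves 86 problems on PutnamBench at pass@184, taking first place among open-source models on the leaderboard and surpassing DeepSeek-Prover-V2-671B's record of 47 problems at pass@1024, with a significantly smaller model size and compute budget. At the time of its release (July-August 2025), Goedel-Prover-V2 achieves the strongest overall performance among all open-source theorem provers. Under a constrained test-time compute budget, it also ranks among the top-performing models, including closed-source systems with publicly reported performance. Our models, code, and data are released at https://github.com/Goedel-LM/Goedel-Prover-V2.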
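
The abstract names verifier-guided self-correction and model averaging without giving details, so the following is only a minimal Python sketch of the two ideas and not the authors' implementation. It assumes uniform averaging over checkpoints stored as plain dictionaries of float lists (the abstract only says checkpoints are merged), and it treats `generate` and `verify` as hypothetical callables standing in for the prover model and the Lean compiler.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a checkpoint is a mapping from parameter name to a flat list of floats.
Weights = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Weights]) -> Weights:
    """Uniformly average parameters that share a name across checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged: Weights = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / len(checkpoints) for col in columns]
    return averaged


def self_correct(
    generate: Callable[[str, Optional[str], Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    statement: str,
    max_rounds: int = 2,
) -> Optional[str]:
    """Return a verified proof, or None if every attempt fails.

    generate(statement, previous_proof, feedback) proposes a proof and
    verify(proof) returns (ok, compiler_message). Both are hypothetical
    stand-ins, not real library calls.
    """
    proof: Optional[str] = None
    feedback: Optional[str] = None
    for _ in range(max_rounds + 1):  # one initial attempt plus revisions
        proof = generate(statement, proof, feedback)
        ok, feedback = verify(proof)
        if ok:
            return proof
    return None
```

In this sketch the verifier's message is passed back unchanged into the next generation call, which is the simplest way to close the loop; how the feedback is actually formatted for the model is not stated in the abstract.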

Related papers