Fugu-MT Paper Translation (Abstract): Goedel-Code-Prover: Hierarchical Proof Search for Open State-of-the-Art Code Verification

Paper overview: Goedel-Code-Prover: Hierarchical Proof Search for Open State-of-the-Art Code Verification

arxiv url: http://arxiv.org/abs/2603.19329v1
Date: Wed, 18 Mar 2026 18:42:04 GMT
Status: Translation complete
System update date: 2026-03-23 19:48:38.802455
Title: Goedel-Code-Prover: Hierarchical Proof Search for Open State-of-the-Art Code Verification
Authors: Zenan Li, Ziran Yang, Deyuan He, Haoyu Zhao, Andrew Zhao, Shange Tang, Kaiyu Yang, Aarti Gupta, Zhendong Su, Chi Jin
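Abstract summary: Large language models (LLMs) can generate plausible code but offer limited guarantees of correctness. This paper proposes a hierarchical proof search framework for automated code verification in Lean 4. Goedel-Code-Prover-8B is a single unified policy for both decomposition and completion.
Reference score (independently computed attention metric): 34.98335927187393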
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) can generate plausible code but offer limited guarantees of correctness. Formally verifying that implementations satisfy specifications requires constructing machine-checkable proofs, a task that remains beyond current automation. We propose a hierarchical proof search framework for automated code verification in Lean 4 that decomposes complex verification goals into structurally simpler subgoals before attempting tactic-level proving. Central to our approach is a principled decomposition score that combines constructive justification with structural effectiveness. Crucially, this score serves as both the training reward and the inference-time ranking criterion, ensuring strict alignment between optimization and deployment. We train Goedel-Code-Prover-8B, a single unified policy for both decomposition and completion, via supervised initialization followed by hybrid reinforcement learning, where a continuous decomposition reward drives planning exploration while supervised replay stabilizes proof generation. On three Lean-based code verification benchmarks comprising 427 tasks, our 8B-parameter model achieves a 62.0% prove success rate, a 2.6× improvement over the strongest baseline, surpassing neural provers up to 84× larger. We further observe consistent inference-time scaling: success rates improve monotonically with search iterations and sampling budget, with our trained model achieving greater efficiency than frontier off-the-shelf models of comparable scale.
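Illustration (not from the paper): the abstract describes decomposing a verification goal into structurally simpler subgoals before tactic-level proving. The Lean 4 sketch below shows what such a decomposition might look like on a toy example. The function `double`, the two helper lemmas, and the top-level theorem are all hypothetical names chosen for illustration; they are not taken from the paper or its benchmarks, and the sketch says nothing about how the decomposition score is computed.

```lean
-- Hypothetical toy implementation to be verified (an assumption, not from the paper).
def double (n : Nat) : Nat := n + n

-- Subgoal 1: relate the implementation to a simpler arithmetic form.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega

-- Subgoal 2: a purely arithmetic fact that no longer mentions the implementation.
theorem two_mul_mod_two (n : Nat) : (2 * n) % 2 = 0 := by
  omega

-- Top-level specification: the output of `double` is always even.
-- Once the two subgoals are proved, the main proof is a short composition.
theorem double_even (n : Nat) : double n % 2 = 0 := by
  rw [double_eq_two_mul]
  exact two_mul_mod_two n
```

In this sketch each helper lemma is small enough for a single decision procedure (`omega`) to close, which is the sense in which the subgoals are structurally simpler than the original specification.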
