Fugu-MT Paper Translation (Abstract): Automating Formal Verification with Agent-Guided Tree Search

Paper overview: Automating Formal Verification with Agent-Guided Tree Search

arxiv url: http://arxiv.org/abs/2605.27485v1
Date: Tue, 26 May 2026 14:50:47 GMT
Status: Translation complete
Last updated in system: 2026-05-28 17:38:55.368628
Title: Automating Formal Verification with Agent-Guided Tree Search
Authors: Leo Yao
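Abstract summary: Formal verification offers a path to provably correct software, but writing verified code remains expensive enough that the technique is rarely used in production. Recent benchmarks measure the ability of large language models to translate specifications into code and machine-checked proofs of correctness. This thesis evaluates the state of LLM-driven verified-code generation in Lean and develops search-based methods for improving verification performance.
Reference score (independently computed attention score): 0.0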
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Formal verification offers a path to provably correct software, but writing verified code remains expensive enough that the technique is rarely used in production. Recent large language models can accelerate this work, and recent benchmarks measure their ability to translate specifications into code and machine-checked proofs of correctness. This thesis evaluates the state of such LLM-driven verified-code generation ("vericoding") in Lean and develops search-based methods for improving verification performance. We first reproduce a subset of the vericoding-benchmark Lean leaderboard on a current cross-vendor model pool, finding that non-reasoning performance remains roughly steady on US closed-source models while open-weight models have slightly improved. We update the iterative methodology of vericoding-benchmark with an agentic loop equipped with mathlib search, finding that model performance greatly improves and scales with agent budget. GPT-5.4 nearly saturates the benchmark at 95.0% on 423 specs with $K=50$ LLM calls. We then design two agent-directed tree-search formulations: a state-based orchestrator that branches on partial-proof states, and a context-based orchestrator that branches on full subagent contexts. Compared against the agent baseline, the context-based design solves a wider range of intermediate-difficulty specs at lower token cost, while the agent baseline retains an advantage on the hardest specs, where uninterrupted iteration matters most. We conclude that search structure has selective advantages over a strong agent baseline, and that more challenging benchmarks drawn from modern code are important to measure and drive further progress in automated formal verification. Code available upon request by contacting the author at leoy@mit.edu.
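The abstract describes an agentic loop whose success rate scales with a budget of $K$ LLM calls. The following is a minimal sketch of what such a budgeted propose-and-check loop could look like, not the thesis's implementation. The names `budgeted_verify_loop`, `propose`, and `check` are hypothetical: `propose` stands in for one LLM call, and `check` stands in for a machine check such as compiling a Lean file. The mathlib search tool and the tree-search orchestrators mentioned in the abstract are not modeled here.

```python
from typing import Callable, Optional, Tuple


def budgeted_verify_loop(
    propose: Callable[[str, str], str],
    check: Callable[[str], Tuple[bool, str]],
    spec: str,
    budget: int = 50,
) -> Tuple[Optional[str], int]:
    """Try up to `budget` proposals, feeding checker feedback into the next one.

    Returns (accepted_candidate_or_None, number_of_propose_calls_used).
    """
    feedback = ""
    for calls_used in range(1, budget + 1):
        candidate = propose(spec, feedback)  # hypothetical: one LLM call
        ok, feedback = check(candidate)      # hypothetical: machine check of the candidate
        if ok:
            return candidate, calls_used
    return None, budget


# Toy stand-ins (assumptions for illustration only, no real model or prover involved).
def toy_propose(spec: str, feedback: str) -> str:
    # The first attempt fails; once feedback is available, the "repair" succeeds.
    return "wrong" if feedback == "" else "right"


def toy_check(candidate: str) -> Tuple[bool, str]:
    if candidate == "right":
        return True, ""
    return False, "error: goal not closed"


if __name__ == "__main__":
    print(budgeted_verify_loop(toy_propose, toy_check, "toy spec", budget=50))
    print(budgeted_verify_loop(toy_propose, toy_check, "toy spec", budget=1))
```

With the toy stand-ins, a budget of 50 succeeds on the second call, while a budget of 1 is exhausted without an accepted candidate. This illustrates in miniature why a larger call budget can raise the solve rate.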


Related papers