Fugu-MT Paper Translation (Abstract): Neural Theorem Proving for Verification Conditions: A Real-World Benchmark

Paper overview: Neural Theorem Proving for Verification Conditions: A Real-World Benchmark

arxiv url: http://arxiv.org/abs/2601.18944v2
Date: Wed, 28 Jan 2026 18:25:21 GMT
Status: Translation complete
Last updated in system: 2026-01-29 15:46:06.579721
Title: Neural Theorem Proving for Verification Conditions: A Real-World Benchmark
Title (reference translation): Verification of Verification Conditions by Neural Networks: A Real-World Benchmark
Authors: Qiyuan Xu, Xiaokun Luan, Renxi Wang, Joshua Ong Jun Leang, Peixin Wang, Haonan Li, Wenda Li, Conrad Watt,
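Abstract summary: This work introduces Neural Theorem Proving for Verification Conditions (NTP4VC), presenting the first real-world multi-language benchmark for this task. We evaluate large language models (LLMs) on NTP4VC.
Reference score (independently computed attention level): 9.350519191460018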
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Theorem proving is fundamental to program verification, where the automated proof of Verification Conditions (VCs) remains a primary bottleneck. Real-world program verification frequently encounters hard VCs that existing Automated Theorem Provers (ATPs) cannot prove, leading to a critical need for extensive manual proofs that burden practical application. While Neural Theorem Proving (NTP) has achieved significant success in mathematical competitions, demonstrating the potential of machine learning approaches to formal reasoning, its application to program verification--particularly VC proving--remains largely unexplored. Despite existing work on annotation synthesis and verification-related theorem proving, no benchmark has specifically targeted this fundamental bottleneck: automated VC proving. This work introduces Neural Theorem Proving for Verification Conditions (NTP4VC), presenting the first real-world multi-language benchmark for this task. From real-world projects such as Linux and Contiki-OS kernel, our benchmark leverages industrial pipelines (Why3 and Frama-C) to generate semantically equivalent test cases across formal languages of Isabelle, Lean, and Rocq. We evaluate large language models (LLMs), both general-purpose and those fine-tuned for theorem proving, on NTP4VC. Results indicate that although LLMs show promise in VC proving, significant challenges remain for program verification, highlighting a large gap and opportunity for future research.
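Abstract (reference translation): Theorem proving is fundamental to program verification, where the automated proof of verification conditions (VCs) is a primary bottleneck. Real-world program verification frequently encounters hard VCs that existing Automated Theorem Provers (ATPs) cannot prove. While Neural Theorem Proving (NTP) has achieved great success in mathematical competitions, showing the potential of machine learning approaches to formal reasoning, its application to program verification, particularly VC proving, remains largely unexplored. Despite existing work on annotation synthesis and verification-related theorem proving, no benchmark has specifically targeted this fundamental bottleneck. This work introduces Neural Theorem Proving for Verification Conditions (NTP4VC), presenting the first real-world multi-language benchmark for this task. Drawing on real-world projects such as the Linux and Contiki-OS kernels, our benchmark uses industrial pipelines (Why3 and Frama-C) to generate semantically equivalent test cases in the formal languages Isabelle, Lean, and Rocq. We evaluate large language models (LLMs) on NTP4VC. The results show that although LLMs are promising for VC proving, significant challenges remain for program verification, revealing a large gap and opportunity for future research.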


Related papers