Fugu-MT 論文翻訳(概要): VeruSAGE: A Study of Agent-Based Verification for Rust Systems

論文の概要: VeruSAGE: A Study of Agent-Based Verification for Rust Systems

arxiv url: http://arxiv.org/abs/2512.18436v1
Date: Sat, 20 Dec 2025 17:22:52 GMT
ステータス: 翻訳完了
システム内更新日: 2025-12-23 18:54:32.33283
Title: VeruSAGE: A Study of Agent-Based Verification for Rust Systems
Title（参考訳）: VeruSAGE: Rustシステムのエージェントベースの検証に関する研究
Authors: Chenyuan Yang, Natalie Neamtu, Chris Hawblitzel, Jacob R. Lorch, Shan Lu,
Abstract要約: 本稿では,Rustで記述されたシステムソフトウェアに対する正当性証明を開発するために,大規模言語モデルの能力に関する総合的研究を行う。 VeruSAGE-Benchは8つのオープンソースのVerus検証Rustシステムから抽出された849の証明タスクで構成される。本研究は,VeruSAGE-Benchにおけるシステム検証作業の80%以上をLLM-agentの組み合わせで行う。
参考スコア（独自算出の注目度）: 3.268641886777319
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have shown impressive capability to understand and develop code. However, their capability to rigorously reason about and prove code correctness remains in question. This paper offers a comprehensive study of LLMs' capability to develop correctness proofs for system software written in Rust. We curate a new system-verification benchmark suite, VeruSAGE-Bench, which consists of 849 proof tasks extracted from eight open-source Verus-verified Rust systems. Furthermore, we design different agent systems to match the strengths and weaknesses of different LLMs (o4-mini, GPT-5, Sonnet 4, and Sonnet 4.5). Our study shows that different tools and agent settings are needed to stimulate the system-verification capability of different types of LLMs. The best LLM-agent combination in our study completes over 80% of system-verification tasks in VeruSAGE-Bench. It also completes over 90% of a set of system proof tasks not part of VeruSAGE-Bench because they had not yet been finished by human experts. This result shows the great potential for LLM-assisted development of verified system software.
Abstract（参考訳）: 大規模言語モデル(LLM)は、コードを理解して開発する素晴らしい能力を示している。しかしながら、コードの正確性について厳格に推論し、証明する能力については、疑問が残る。本稿では,Rustで記述されたシステムソフトウェアの正当性証明を開発するLLMの能力を総合的に研究する。 VeruSAGE-Benchは8つのオープンソースのVerus検証Rustシステムから抽出された849の証明タスクで構成される。さらに,異なるLLM(o4-mini, GPT-5, Sonnet 4, Sonnet 4.5)の強度と弱点に適合するエージェントシステムを設計した。本研究は,異なるタイプのLLMのシステム検証能力を刺激するために,異なるツールとエージェント設定が必要であることを示す。本研究は,VeruSAGE-Benchにおけるシステム検証作業の80%以上をLLM-agentの組み合わせで行う。また、VeruSAGE-Bench以外の一連のシステム証明タスクの90%以上は、まだ人間の専門家によって完了していないため完了している。この結果から,LLMを利用したシステムソフトウェアの開発の可能性が示唆された。

論文の概要: VeruSAGE: A Study of Agent-Based Verification for Rust Systems

関連論文リスト