Fugu-MT paper translation (summary): Reducing the Costs of Proof Synthesis on Rust Systems by Scaling Up a Seed Training Set

Paper summary: Reducing the Costs of Proof Synthesis on Rust Systems by Scaling Up a Seed Training Set

arxiv url: http://arxiv.org/abs/2602.04910v1
Date: Wed, 04 Feb 2026 01:04:56 GMT
Status: translation complete
Last updated in system: 2026-02-06 18:49:08.532745
Title: Reducing the Costs of Proof Synthesis on Rust Systems by Scaling Up a Seed Training Set
Title (reference translation): Reducing the Cost of Proof Synthesis on Rust Systems by Scaling Up a Seed Training Set
Authors: Nongyu Di, Tianyu Chen, Shan Lu, Shuai Lu, Yeyun Gong, Peng Cheng, Jacob R. Lorch, Yuan Yao, Xiaoxing Ma
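Abstract summary: This paper introduces VeruSyn, a data synthesis pipeline for Verus, a verification tool for system software written in Rust. It synthesizes the largest set of Verus-verified programs: 6.9 million Rust programs, each with a formal specification and a proof that it meets that specification. This dataset makes it possible to create a fine-tuned Qwen2.5-Coder-32B-Instruct model with an appealing cost-proof tradeoff.
Reference score (independently computed attention score): 40.85677634306877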
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are widely used for code generation. However, the correctness of code generated by LLMs remains a concern. A potential remedy to this concern is to have LLMs generate formal correctness proofs along with such code. However, compared with code generation, code-proof generation requires much higher reasoning capability and has much less existing data to learn from. In this paper, we present VeruSyn, a data synthesis pipeline for Verus, a state-of-the-art verification tool for system software written in Rust. Through self-synthesis and tutorial-based synthesis, VeruSyn achieves much larger scale and Verus-feature coverage than previous data-synthesis techniques designed for Verus; VeruSyn also supplements its dataset with long-chain-of-thought (CoT) data through agent trajectory synthesis. With VeruSyn, we synthesize the largest set of Verus-verified programs: 6.9 million Rust programs, each with a formal specification and a proof that it meets that specification. This dataset lets us create a fine-tuned Qwen2.5-Coder-32B-Instruct model with an appealing cost-proof tradeoff compared with state-of-the-art commercial models like Claude Sonnet 4.5. It also significantly outperforms models like o4-mini and previously proposed research models.
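Abstract (reference translation): Large language models (LLMs) are widely used for code generation. However, the correctness of LLM-generated code remains a concern. A potential remedy is to have LLMs generate formal correctness proofs along with the code. Compared with code generation, however, code-proof generation requires much stronger reasoning ability and has far less existing data to learn from. This paper introduces VeruSyn, a data synthesis pipeline for Verus, a state-of-the-art verification tool for system software written in Rust. Through self-synthesis and tutorial-based synthesis, VeruSyn achieves much larger scale and Verus-feature coverage than previous data-synthesis techniques designed for Verus. With VeruSyn, we synthesize the largest set of verified programs: 6.9 million Rust programs. This dataset makes it possible to create a fine-tuned Qwen2.5-Coder-32B-Instruct model with an appealing cost-proof tradeoff compared with state-of-the-art commercial models such as Claude Sonnet 4.5. It also significantly outperforms models such as o4-mini and previously proposed research models.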


Related papers