Fugu-MT paper translation (abstract): Neuro-Symbolic Software Verification: Hyper-charging Local Language Models with Symbolic Reasoning at Scale

Paper overview: Neuro-Symbolic Software Verification: Hyper-charging Local Language Models with Symbolic Reasoning at Scale

arxiv url: http://arxiv.org/abs/2606.16886v1
Date: Mon, 15 Jun 2026 15:59:10 GMT
Status: Translation complete
System update date: 2026-06-16 16:21:34.74377
Title: Neuro-Symbolic Software Verification: Hyper-charging Local Language Models with Symbolic Reasoning at Scale
Authors: Muhammad A. A. Pirzada, Julian Parsert, Weiqi Wang, Konstantin Korovin, Lucas C. Cordeiro
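Abstract summary: We present VerIbmc, a neuro-symbolic pipeline that combines locally deployable open-weight language models with symbolic invariant generation. All inference runs entirely on a single local machine using open-weight models, with no cloud API or proprietary model required.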
Reference score (independently computed attention score): 11.088680803534785
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Loop invariant synthesis remains a central bottleneck in formal software verification. Recent LLM-based neuro-symbolic tools have achieved impressive solve rates. However, these tools rely on proprietary, often expensive cloud APIs, which are a hurdle for privacy-sensitive industrial deployments where the source code cannot leave the organisation or where cost is a factor. We present VerIbmc, a neuro-symbolic pipeline that pairs symbolic invariant generation and locally deployable open-weight language models with the ESBMC verification tool. Our pipeline combines a deterministic symbolic invariant synthesis phase with an iterative LLM refinement loop driven by structured verifier feedback. In addition, we provide two pipeline variants that differ in their prompting strategy: Chain-of-Thought vs. Tree-of-Thought. We conduct an extensive experimental evaluation with five open-weight models (ranging from 7B to 120B parameters) across five benchmark families comprising 520 problems (499 after excluding 21 with unavoidable overflow). The best single configuration (GPT-OSS-120B) solves 431 of 499 problems (86.4%). Additionally, on the four benchmark suites shared with the strongest cloud-API tools, VerIbmc is competitive while running only on a single local machine. The evaluation shows that symbolic invariant synthesis solves 75 problems without any LLM call and yields up to 35 additional solved problems for the weakest model. Importantly, all inference runs entirely on a single local machine using open-weight models; no cloud API or proprietary model is required. Overall, we demonstrate that a neuro-symbolic approach based on LLMs can be used effectively for invariant synthesis in a privacy-preserving and energy-efficient manner, without having to resort to expensive proprietary frontier models locked behind APIs.
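Illustrative sketch (not from the paper): the two-phase structure described in the abstract, a deterministic symbolic phase followed by an LLM refinement loop driven by verifier feedback, might be outlined roughly as follows. Every name here (`synthesize_invariant`, `VerifierResult`, the `toy_*` stubs, `max_rounds`) is a hypothetical placeholder. The verifier and the language model are stubbed out, so this does not call ESBMC or any real model and should not be read as VerIbmc's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VerifierResult:
    verified: bool
    feedback: str  # structured feedback, e.g. which check failed


def synthesize_invariant(
    program: str,
    symbolic_candidates: Callable[[str], List[str]],
    verify: Callable[[str, str], VerifierResult],
    propose: Callable[[str, str, str], str],
    max_rounds: int = 3,  # assumed refinement budget, not from the paper
) -> Optional[str]:
    last_candidate, last_feedback = "", ""

    # Phase 1: deterministic symbolic candidates, no LLM call.
    for candidate in symbolic_candidates(program):
        result = verify(program, candidate)
        if result.verified:
            return candidate
        last_candidate, last_feedback = candidate, result.feedback

    # Phase 2: iterative refinement, feeding verifier feedback to the model.
    for _ in range(max_rounds):
        candidate = propose(program, last_candidate, last_feedback)
        result = verify(program, candidate)
        if result.verified:
            return candidate
        last_candidate, last_feedback = candidate, result.feedback

    return None  # no invariant found within the budget


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def toy_symbolic(program: str) -> List[str]:
    return ["i >= 0"]


def toy_verify(program: str, invariant: str) -> VerifierResult:
    if invariant == "i >= 0 && i <= n":
        return VerifierResult(True, "")
    return VerifierResult(False, "invariant too weak to prove the assertion")


def toy_llm(program: str, previous: str, feedback: str) -> str:
    return previous + " && i <= n" if previous else "i <= n"


found = synthesize_invariant(
    "for (i = 0; i < n; i++) {}", toy_symbolic, toy_verify, toy_llm
)
print(found)
```

In this toy run the symbolic candidate alone fails, and one refinement round strengthens it into an invariant that the stub verifier accepts. A real pipeline would replace the stubs with a call to the verification tool and a locally hosted model.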
