Fugu-MT Paper Translation (Abstract): Towards Verified Code Reasoning by LLMs

Paper summary: Towards Verified Code Reasoning by LLMs

arxiv url: http://arxiv.org/abs/2509.26546v1
Date: Tue, 30 Sep 2025 17:17:51 GMT
Status: Translation complete
Last updated in system: 2025-10-01 17:09:04.635078
Title: Towards Verified Code Reasoning by LLMs
Title (reference translation): Towards Verification of Code Reasoning by LLMs
Authors: Meghana Sistla, Gogul Balakrishnan, Pat Rondon, José Cambronero, Michele Tufano, Satish Chandra
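Abstract summary: This paper describes a method to automatically validate the answers provided by a code reasoning agent. The method extracts a formal representation of the agent's response and then uses formal verification and program analysis tools to verify the agent's reasoning steps.
Reference score (independently computed attention measure): 6.973151264926856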
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. This prevents the agent from being useful in situations where high precision is desired: (1) helping a software engineer understand a new code base, (2) helping a software engineer during code review sessions, and (3) ensuring that the code generated by an automated code generation system meets certain requirements (e.g. fixes a bug, improves readability, implements a feature). As a result of this lack of trustworthiness, the agent's answers need to be manually verified before they can be trusted. Manually confirming responses from a code reasoning agent requires human effort and can result in slower developer productivity, which weakens the assistance benefits of the agent. In this paper, we describe a method to automatically validate the answers provided by a code reasoning agent by verifying its reasoning steps. At a very high level, the method consists of extracting a formal representation of the agent's response and, subsequently, using formal verification and program analysis tools to verify the agent's reasoning steps. We applied this approach to a benchmark set of 20 uninitialized variable errors detected by sanitizers and 20 program equivalence queries. For the uninitialized variable errors, the formal verification step was able to validate the agent's reasoning on 13/20 examples, and for the program equivalence queries, the formal verification step successfully caught 6/8 incorrect judgments made by the agent.
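Abstract (reference translation): While LLM-based agents can tackle a wide variety of code reasoning questions, their answers are not always correct. This prevents the agent from being useful in situations where high precision is desired: (1) helping a software engineer understand a new code base, (2) helping a software engineer during code review sessions, and (3) ensuring that the code generated by an automated code generation system meets certain requirements (for example, fixing a bug, improving readability, or implementing a feature). Because of this lack of trustworthiness, the agent's answers must be verified manually before they can be trusted. Manually confirming responses from a code reasoning agent takes human effort and can lower developer productivity, which weakens the benefit of the agent's assistance. This paper describes a method to automatically validate the answers provided by a code reasoning agent by verifying its reasoning steps. At a very high level, the method extracts a formal representation of the agent's response and then uses formal verification and program analysis tools to verify the agent's reasoning steps. The method was applied to a benchmark set of 20 uninitialized variable errors detected by sanitizers and 20 program equivalence queries. For the uninitialized variable errors, the formal verification step validated the agent's reasoning on 13/20 examples, and for the program equivalence queries, it caught 6/8 incorrect judgments made by the agent.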
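The abstract describes the pipeline only at a high level: extract a formal representation of the agent's answer, then check it with program analysis. The sketch below is a toy illustration of that idea, not the authors' implementation. It assumes a hypothetical control-flow graph format (`Block`), treats an agent's claim as a `(block, variable, verdict)` tuple, and swaps in a simple definite-assignment dataflow analysis in place of whatever verification tooling the paper uses. The names `Block`, `definitely_assigned`, and `check_claim` are all invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One basic block of a toy control-flow graph (hypothetical format)."""
    defs: set = field(default_factory=set)     # variables assigned in this block
    uses: set = field(default_factory=set)     # variables read in this block
    succs: list = field(default_factory=list)  # names of successor blocks


def definitely_assigned(cfg, entry):
    """Forward must-analysis: for each block, the variables that are
    assigned on every path from the entry to the start of that block."""
    all_vars = set().union(*(b.defs | b.uses for b in cfg.values()))
    # Start optimistic (everything assigned) except at the entry, then shrink.
    in_sets = {name: set(all_vars) for name in cfg}
    in_sets[entry] = set()
    changed = True
    while changed:
        changed = False
        for name, block in cfg.items():
            out = in_sets[name] | block.defs
            for succ in block.succs:
                new_in = in_sets[succ] & out
                if new_in != in_sets[succ]:
                    in_sets[succ] = new_in
                    changed = True
    return in_sets


def check_claim(cfg, entry, claim):
    """Return True if the analysis agrees with the agent's claim.

    Assumption: "initialized" means assigned on every path to the block,
    and "uninitialized" means unassigned on at least one path.
    """
    block, var, verdict = claim
    assigned = var in definitely_assigned(cfg, entry)[block]
    return assigned == (verdict == "initialized")


# Toy program: `x` is assigned only in the `then` branch, then read at `join`.
cfg = {
    "entry": Block(succs=["then", "join"]),
    "then": Block(defs={"x"}, succs=["join"]),
    "join": Block(uses={"x"}),
}

print(check_claim(cfg, "entry", ("join", "x", "uninitialized")))  # True
print(check_claim(cfg, "entry", ("join", "x", "initialized")))    # False
```

In this toy setting the checker accepts the claim that `x` may be uninitialized at `join` and rejects the opposite claim. A real system would have to extract such claims from free-form agent output and reason about real program semantics, which this sketch does not attempt.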

Related papers