Fugu-MT Paper Translation (Summary): Nexus: Execution-Grounded Multi-Agent Test Oracle Synthesis

Paper summary: Nexus: Execution-Grounded Multi-Agent Test Oracle Synthesis

arxiv url: http://arxiv.org/abs/2510.26423v1
Date: Thu, 30 Oct 2025 12:20:25 GMT
Status: Translation complete
Last updated in system: 2025-10-31 16:05:09.803948
Title: Nexus: Execution-Grounded Multi-Agent Test Oracle Synthesis
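Title (reference translation): Nexus: Execution-Grounded Multi-Agent Test Oracle Synthesis
Authors: Dong Huang, Mingzhe Du, Jie M. Zhang, Zheng Lin, Meng Luo, Qianru Zhang, See-Kiong Ng
Abstract summary: Test oracle generation in non-regression testing is a longstanding challenge in software engineering. We introduce Nexus, a novel multi-agent framework to address this challenge.
Reference score (independently computed attention score): 57.40527331817245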
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Test oracle generation in non-regression testing is a longstanding challenge in software engineering, where the goal is to produce oracles that can accurately determine whether a function under test (FUT) behaves as intended for a given input. In this paper, we introduce Nexus, a novel multi-agent framework to address this challenge. Nexus generates test oracles by leveraging a diverse set of specialized agents that synthesize test oracles through a structured process of deliberation, validation, and iterative self-refinement. During the deliberation phase, a panel of four specialist agents, each embodying a distinct testing philosophy, collaboratively critiques and refines an initial set of test oracles. Then, in the validation phase, Nexus generates a plausible candidate implementation of the FUT and executes the proposed oracles against it in a secure sandbox. For any oracle that fails this execution-based check, Nexus activates an automated self-refinement loop, using the specific runtime error to debug and correct the oracle before re-validation. Our extensive evaluation on seven diverse benchmarks demonstrates that Nexus consistently and substantially outperforms state-of-the-art baselines. For instance, Nexus improves the test-level oracle accuracy on LiveCodeBench from 46.30% to 57.73% for GPT-4.1-Mini. The improved accuracy also significantly enhances downstream tasks: the bug detection rate of GPT-4.1-Mini generated test oracles on HumanEval increases from 90.91% to 95.45% for Nexus compared to baselines, and the success rate of automated program repair improves from 35.23% to 69.32%.
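Abstract (reference translation): Test oracle generation in non-regression testing is a longstanding challenge in software engineering; its aim is to produce oracles that accurately judge whether the function under test behaves as intended for a given input. This paper introduces Nexus, a new multi-agent framework that addresses this challenge. Nexus generates test oracles by drawing on a varied set of specialized agents that synthesize test oracles through a structured process of deliberation, validation, and iterative self-refinement. In the deliberation phase, a panel of four specialist agents, each embodying a different testing philosophy, jointly critiques and refines an initial set of test oracles. Then, in the validation phase, Nexus generates a plausible candidate implementation of the FUT and executes the proposed oracles against it in a secure sandbox. For any oracle that fails this execution-based check, Nexus starts an automated self-refinement loop that uses the specific runtime error to debug and correct the oracle before re-validation. Our extensive evaluation on seven diverse benchmarks shows that Nexus consistently and substantially outperforms state-of-the-art baselines. For example, with GPT-4.1-Mini, Nexus improves test-level oracle accuracy on LiveCodeBench from 46.30% to 57.73%. The bug detection rate of test oracles generated by GPT-4.1-Mini on HumanEval rises from 90.91% for the baselines to 95.45% with Nexus, and the success rate of automated program repair improves from 35.23% to 69.32%.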
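The validation and self-refinement steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `run_oracle`, `validate_and_refine`, and `toy_refine` are hypothetical, the oracles are assumed to be plain `assert` statements, the refinement step is a hard-coded stand-in for what would presumably be an LLM call, and plain `exec` is used only for illustration (it is not a secure sandbox).

```python
from typing import Callable, List, Optional, Tuple


def run_oracle(oracle: str, candidate_src: str) -> Optional[str]:
    """Run one assert-style oracle against a candidate implementation.

    Returns None on success, or a short error string on failure.
    NOTE: plain exec() is NOT a sandbox; it is used here only as a sketch.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)  # define the candidate function
        exec(oracle, namespace)         # evaluate the oracle against it
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


def validate_and_refine(
    oracles: List[str],
    candidate_src: str,
    refine: Callable[[str, str], str],
    max_rounds: int = 2,
) -> Tuple[List[str], List[str]]:
    """Keep oracles that pass; try to repair failing ones using the error text."""
    accepted: List[str] = []
    rejected: List[str] = []
    for oracle in oracles:
        current = oracle
        error = run_oracle(current, candidate_src)
        rounds = 0
        while error is not None and rounds < max_rounds:
            current = refine(current, error)  # assumed repair step
            error = run_oracle(current, candidate_src)
            rounds += 1
        (accepted if error is None else rejected).append(current)
    return accepted, rejected


def toy_refine(oracle: str, error: str) -> str:
    """Hypothetical stand-in for a model-driven repair: fixes one known case."""
    return oracle.replace("add(2, 2) == 5", "add(2, 2) == 4")


# Hypothetical candidate implementation of the function under test.
CANDIDATE_SRC = "def add(a, b):\n    return a + b\n"

ORACLES = [
    "assert add(2, 3) == 5",  # passes immediately
    "assert add(2, 2) == 5",  # fails, then repaired by toy_refine
    "assert add(1, 1) == 3",  # fails and cannot be repaired here
]

accepted, rejected = validate_and_refine(ORACLES, CANDIDATE_SRC, toy_refine)
print("accepted:", accepted)
print("rejected:", rejected)
```

In this toy run the first oracle is accepted as written, the second is accepted after one repair round, and the third is rejected once the round limit is reached. The round limit is an assumption of the sketch; the abstract does not state how many refinement iterations Nexus allows.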
