Fugu-MT Paper Translation (Abstract): Testing and Enhancing Multi-Agent Systems for Robust Code Generation

Paper overview: Testing and Enhancing Multi-Agent Systems for Robust Code Generation

arxiv url: http://arxiv.org/abs/2510.10460v1
Date: Sun, 12 Oct 2025 05:45:04 GMT
Status: Translation complete
Last updated in system: 2025-10-14 18:06:29.950763
Title: Testing and Enhancing Multi-Agent Systems for Robust Code Generation
Title (reference translation): Testing and Strengthening Multi-Agent Systems for Robust Code Generation
Authors: Zongyi Lyu, Songqiang Chen, Zhenlan Ji, Liwen Wang, Shuai Wang, Daoyuan Wu, Wenxuan Wang, Shing-Chi Cheung
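Abstract summary: Multi-agent systems (MASs) have emerged as a promising paradigm for automated code generation. Despite their rapid development and adoption, their robustness remains under-explored. This paper presents the first comprehensive study of the robustness of MASs for code generation, using a fuzzing-based testing approach.
Reference score (independently computed attention score): 21.38351747327572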
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-agent systems (MASs) have emerged as a promising paradigm for automated code generation, demonstrating impressive performance on established benchmarks by decomposing complex coding tasks across specialized agents with different roles. Despite their prosperous development and adoption, their robustness remains pressingly under-explored, raising critical concerns for real-world deployment. This paper presents the first comprehensive study examining the robustness of MASs for code generation through a fuzzing-based testing approach. By designing a fuzzing pipeline incorporating semantic-preserving mutation operators and a novel fitness function, we assess mainstream MASs across multiple datasets and LLMs. Our findings reveal substantial robustness flaws of various popular MASs: they fail to solve 7.9%-83.3% of problems they initially resolved successfully after applying the semantic-preserving mutations. Through comprehensive failure analysis, we identify a common yet largely overlooked cause of the robustness issue: miscommunications between planning and coding agents, where plans lack sufficient detail and coding agents misinterpret intricate logic, aligning with the challenges inherent in a multi-stage information transformation process. Accordingly, we also propose a repairing method that encompasses multi-prompt generation and introduces a new monitor agent to address this issue. Evaluation shows that our repairing method effectively enhances the robustness of MASs by solving 40.0%-88.9% of identified failures. Our work uncovers critical robustness flaws in MASs and provides effective mitigation strategies, contributing essential insights for developing more reliable MASs for code generation.
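Abstract (reference translation): Multi-agent systems (MASs) have emerged as a promising paradigm for automated code generation, showing impressive performance on established benchmarks by decomposing complex coding tasks across specialized agents with different roles. Although their development and adoption are progressing well, their robustness remains under-explored, which raises serious concerns for real-world deployment. This paper presents the first comprehensive study of the robustness of MASs for code generation, using fuzzing-based testing. A fuzzing pipeline that combines semantic-preserving mutation operators with a new fitness function is used to evaluate mainstream MASs across multiple datasets and LLMs. The study finds substantial robustness flaws in various popular MASs: after semantic-preserving mutations are applied, they fail to solve 7.9%-83.3% of the problems they originally solved. Failure analysis identifies a common yet largely overlooked cause: miscommunication between planning and coding agents, where plans lack sufficient detail and coding agents misinterpret intricate logic, consistent with the challenges inherent in multi-stage information transformation. The paper therefore proposes a repair method that includes multi-prompt generation and introduces a new monitor agent to address this issue. The evaluation shows that the method improves MAS robustness by resolving 40.0%-88.9% of the identified failures. The work uncovers critical robustness flaws in MASs and provides effective mitigation strategies, offering essential insights for developing more reliable MASs for code generation.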
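
The robustness metric quoted above (the share of initially solved problems that break after semantic-preserving mutation) can be illustrated with a minimal sketch. This is not the paper's pipeline: the three mutation operators, the function names, and the `solves` callback are all hypothetical, and uniform random operator selection stands in for the paper's fitness function, which the abstract does not specify.

```python
import random
from typing import Callable, List, Optional

# Hypothetical semantic-preserving operators (assumptions, not from the paper):
# each rewrites a task description without changing what is being asked.
def add_politeness(prompt: str) -> str:
    if not prompt:
        return prompt
    return "Please " + prompt[0].lower() + prompt[1:]

def normalize_whitespace(prompt: str) -> str:
    return " ".join(prompt.split())

def append_restatement(prompt: str) -> str:
    return prompt + " Return the result as described above."

MUTATIONS: List[Callable[[str], str]] = [
    add_politeness,
    normalize_whitespace,
    append_restatement,
]

def fuzz_problem(
    prompt: str,
    solves: Callable[[str], bool],
    rounds: int = 10,
    seed: int = 0,
) -> Optional[str]:
    """Stack random mutations; return the first mutated prompt that fails, else None."""
    rng = random.Random(seed)
    current = prompt
    for _ in range(rounds):
        current = rng.choice(MUTATIONS)(current)
        if not solves(current):
            return current
    return None

def failure_rate(
    prompts: List[str],
    solves: Callable[[str], bool],
    rounds: int = 10,
) -> float:
    """Fraction of initially solved prompts that fail after mutation."""
    solved = [p for p in prompts if solves(p)]
    if not solved:
        return 0.0
    broken = [p for p in solved if fuzz_problem(p, solves, rounds) is not None]
    return len(broken) / len(solved)
```

Here `solves` is a placeholder for running the whole multi-agent system on a prompt and checking the generated code against the benchmark's tests. Only problems that pass before mutation enter the denominator, which matches the abstract's wording "problems they initially resolved successfully".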

Related papers