Fugu-MT Paper Translation (Abstract): A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System

Paper overview: A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System

arxiv url: http://arxiv.org/abs/2510.09721v2
Date: Thu, 16 Oct 2025 08:15:02 GMT
Status: Translation complete
Last updated in system: 2025-10-17 14:17:28.082329
Title: A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System
Title (reference translation): A Comprehensive Survey on Benchmarks and Solutions in the Software Engineering of LLM-Empowered Agentic Systems
Authors: Jiale Guo, Suizhi Huang, Mei Li, Dong Huang, Xingsheng Chen, Regina Zhang, Zhijiang Guo, Han Yu, Siu-Ming Yiu, Christian Jensen, Pietro Lio, Kwok-Yan Lam
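Abstract summary: This survey provides the first holistic analysis of software engineering with Large Language Models. It reviews over 150 recent papers and proposes a taxonomy along two key dimensions: (1) solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) benchmarks, including tasks such as code generation, translation, and repair.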
Reference score (independently computed attention measure): 54.933911409697714
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The integration of Large Language Models (LLMs) into software engineering has driven a transition from traditional rule-based systems to autonomous agentic systems capable of solving complex problems. However, systematic progress is hindered by a lack of comprehensive understanding of how benchmarks and solutions interconnect. This survey addresses this gap by providing the first holistic analysis of LLM-powered software engineering, offering insights into evaluation methodologies and solution paradigms. We review over 150 recent papers and propose a taxonomy along two key dimensions: (1) Solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) Benchmarks, including tasks such as code generation, translation, and repair. Our analysis highlights the evolution from simple prompt engineering to sophisticated agentic systems incorporating capabilities like planning, reasoning, memory mechanisms, and tool augmentation. To contextualize this progress, we present a unified pipeline illustrating the workflow from task specification to deliverables, detailing how different solution paradigms address various complexity levels. Unlike prior surveys that focus narrowly on specific aspects, this work connects 50+ benchmarks to their corresponding solution strategies, enabling researchers to identify optimal approaches for diverse evaluation criteria. We also identify critical research gaps and propose future directions, including multi-agent collaboration, self-evolving systems, and formal verification integration. This survey serves as a foundational guide for advancing LLM-driven software engineering. We maintain a GitHub repository that continuously updates the reviewed and related papers at https://github.com/lisaGuojl/LLM-Agent-SE-Survey.
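Abstract (reference translation): The integration of Large Language Models (LLMs) into software engineering has prompted a shift from traditional rule-based systems to autonomous agentic systems capable of solving complex problems. However, systematic progress is hampered by the lack of a comprehensive understanding of how benchmarks and solutions interconnect. This survey addresses that gap by providing the first holistic analysis of LLM-powered software engineering, with insights into evaluation methodologies and solution paradigms. It reviews over 150 recent papers and proposes a taxonomy along two key dimensions: (1) solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) benchmarks, including tasks such as code generation, translation, and repair. The analysis highlights the evolution from simple prompt engineering to sophisticated agentic systems that incorporate capabilities such as planning, reasoning, memory mechanisms, and tool augmentation. To put this progress in context, the survey presents a unified pipeline that illustrates the workflow from task specification to deliverables and details how different solution paradigms address different levels of complexity. Unlike prior surveys that focus narrowly on specific aspects, this work connects more than 50 benchmarks to their corresponding solution strategies, enabling researchers to identify the best approach for diverse evaluation criteria. It also identifies critical research gaps and proposes future directions, including multi-agent collaboration, self-evolving systems, and formal verification integration. The survey serves as a foundational guide for advancing LLM-driven software engineering. The authors maintain a GitHub repository that continuously updates the reviewed and related papers at https://github.com/lisaGuojl/LLM-Agent-SE-Survey.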

Related papers