A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System
- URL: http://arxiv.org/abs/2510.09721v1
- Date: Fri, 10 Oct 2025 06:56:50 GMT
- Title: A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System
- Authors: Jiale Guo, Suizhi Huang, Mei Li, Dong Huang, Xingsheng Chen, Regina Zhang, Zhijiang Guo, Han Yu, Siu-Ming Yiu, Christian Jensen, Pietro Lio, Kwok-Yan Lam,
- Abstract summary: This survey presents the first holistic analysis of LLM-empowered software engineering.<n>We analyze 150+ recent papers and organize them into a comprehensive taxonomy spanning two major dimensions.<n>Our analysis reveals how the field has evolved from simple prompt engineering to complex agentic systems.
- Score: 54.933911409697714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of LLMs into software engineering has catalyzed a paradigm shift from traditional rule-based systems to sophisticated agentic systems capable of autonomous problem-solving. Despite this transformation, the field lacks a comprehensive understanding of how benchmarks and solutions interconnect, hindering systematic progress and evaluation. This survey presents the first holistic analysis of LLM-empowered software engineering, bridging the critical gap between evaluation and solution approaches. We analyze 150+ recent papers and organize them into a comprehensive taxonomy spanning two major dimensions: (1) Solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) Benchmarks, covering code generation, translation, repair, and other tasks. Our analysis reveals how the field has evolved from simple prompt engineering to complex agentic systems incorporating planning and decomposition, reasoning and self-refinement, memory mechanisms, and tool augmentation. We present a unified pipeline that illustrates the complete workflow from task specification to final deliverables, demonstrating how different solution paradigms address varying complexity levels across software engineering tasks. Unlike existing surveys that focus on isolated aspects, we provide full-spectrum coverage connecting 50+ benchmarks with their corresponding solution strategies, enabling researchers to identify optimal approaches for specific evaluation criteria. Furthermore, we identify critical research gaps and propose actionable future directions, including multi-agent collaboration frameworks, self-evolving code generation systems, and integration of formal verification with LLM-based methods. This survey serves as a foundational resource for researchers and practitioners seeking to understand, evaluate, and advance LLM-empowered software engineering systems.
Related papers
- Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey [59.3507264893654]
Issue resolution is a complex Software Engineering task integral to real-world development.<n> benchmarks like SWE-bench revealed this task as profoundly difficult for large language models.<n>This paper presents a systematic survey of this emerging domain.
arXiv Detail & Related papers (2026-01-15T18:55:03Z) - LLM-Based Agentic Systems for Software Engineering: Challenges and Opportunities [0.03437656066916039]
This concept paper systematically reviews the emerging paradigm of LLM-based multi-agent systems.<n>We delve into a wide range of topics such as language model selection, SE evaluation benchmarks, state-of-the-art agentic frameworks and communication protocols.
arXiv Detail & Related papers (2026-01-14T19:28:30Z) - Agentic Software Issue Resolution with Large Language Models: A Survey [9.583478737157531]
Software issue resolution aims to address real-world issues in software repositories based on natural language descriptions provided by users.<n>Large language models (LLMs) in reasoning and generative capabilities have made significant progress in automated software issue resolution.<n>Recently, LLM-based agentic systems have become mainstream for software issue resolution.
arXiv Detail & Related papers (2025-12-24T08:05:10Z) - Barbarians at the Gate: How AI is Upending Systems Research [58.95406995634148]
We argue that systems research, long focused on designing and evaluating new performance-oriented algorithms, is particularly well-suited for AI-driven solution discovery.<n>We term this approach as AI-Driven Research for Systems ( ADRS), which iteratively generates, evaluates, and refines solutions.<n>Our results highlight both the disruptive potential and the urgent need to adapt systems research practices in the age of AI.
arXiv Detail & Related papers (2025-10-07T17:49:24Z) - A Survey on Code Generation with LLM-based Agents [61.474191493322415]
Code generation agents powered by large language models (LLMs) are revolutionizing the software development paradigm.<n>LLMs are characterized by three core features.<n>This paper presents a systematic survey of the field of LLM-based code generation agents.
arXiv Detail & Related papers (2025-07-31T18:17:36Z) - Deep Research Agents: A Systematic Examination And Roadmap [109.53237992384872]
Deep Research (DR) agents are designed to tackle complex, multi-turn informational research tasks.<n>In this paper, we conduct a detailed analysis of the foundational technologies and architectural components that constitute DR agents.
arXiv Detail & Related papers (2025-06-22T16:52:48Z) - Towards AI Search Paradigm [42.62890561623222]
We introduce the AI Search Paradigm, a blueprint for next-generation search systems capable of emulating human information processing and decision-making.<n>The paradigm employs a modular architecture of four LLM-powered agents that dynamically adapt to the full spectrum of information needs.<n>By providing an in-depth guide to these components, this work aims to inform the development of trustworthy, adaptive, and scalable AI search systems.
arXiv Detail & Related papers (2025-06-20T17:42:13Z) - From Standalone LLMs to Integrated Intelligence: A Survey of Compound Al Systems [6.284317913684068]
Compound Al Systems (CAIS) is an emerging paradigm that integrates large language models (LLMs) with external components, such as retrievers, agents, tools, and orchestrators.<n>Despite growing adoption in both academia and industry, the CAIS landscape remains fragmented, lacking a unified framework for analysis, taxonomy, and evaluation.<n>This survey aims to provide researchers and practitioners with a comprehensive foundation for understanding, developing, and advancing the next generation of system-level artificial intelligence.
arXiv Detail & Related papers (2025-06-05T02:34:43Z) - A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems [93.8285345915925]
Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making.<n>With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems.<n>We categorize existing methods along two dimensions: (1) Regimes, which define the stage at which reasoning is achieved; and (2) Architectures, which determine the components involved in the reasoning process.
arXiv Detail & Related papers (2025-04-12T01:27:49Z) - Designing Algorithms Empowered by Language Models: An Analytical Framework, Case Studies, and Insights [86.06371692309972]
This work presents an analytical framework for the design and analysis of large language models (LLMs)-based algorithms.<n>Our proposed framework serves as an attempt to mitigate such headaches.
arXiv Detail & Related papers (2024-07-20T07:39:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.