Related papers: Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration

URL: http://arxiv.org/abs/2406.01422v2
Date: Wed, 26 Mar 2025 03:26:09 GMT
Title: Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration
Authors: Yingwei Ma, Qingping Yang, Rongyu Cao, Binhua Li, Fei Huang, Yongbin Li
Abstract summary: This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution. Our approach introduces a top-down method to condense critical repository information into a knowledge graph, reducing complexity, and employs a Monte Carlo tree search based strategy. In production deployment and evaluation at Alibaba Cloud, LingmaAgent automatically resolved 16.9% of in-house issues faced by development engineers, and solved 43.3% of problems after manual intervention.
Score: 64.19431011897515
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution. Deployed in TONGYI Lingma, an IDE-based coding assistant developed by Alibaba Cloud, LingmaAgent addresses the limitations of existing LLM-based agents that primarily focus on local code information. Our approach introduces a top-down method to condense critical repository information into a knowledge graph, reducing complexity, and employs a Monte Carlo tree search based strategy enabling agents to explore and understand entire repositories. We guide agents to summarize, analyze, and plan using repository-level knowledge, allowing them to dynamically acquire information and generate patches for real-world GitHub issues. In extensive experiments, LingmaAgent demonstrated significant improvements, achieving an 18.5% relative improvement on the SWE-bench Lite benchmark compared to SWE-agent. In production deployment and evaluation at Alibaba Cloud, LingmaAgent automatically resolved 16.9% of in-house issues faced by development engineers, and solved 43.3% of problems after manual intervention. Additionally, we have open-sourced a Python prototype of LingmaAgent for reference by other industrial developers at https://github.com/RepoUnderstander/RepoUnderstander. LingmaAgent has since been used as a reference by many subsequent agents.
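
Illustrative sketch (not the authors' implementation): the abstract mentions a Monte Carlo tree search based strategy for exploring a repository. The Python below shows what a generic textbook UCT search over a toy repository tree could look like. The `REPO` layout, the keyword-overlap `relevance` reward, the exploration constant, and all function names are assumptions made up for this example; the paper's actual reward and graph structure are not described in this listing.

```python
import math
import random

# Hypothetical toy repository: package -> module -> function names.
REPO = {
    "pkg": {
        "io.py": {"read_csv": {}, "write_csv": {}},
        "parse.py": {"parse_date": {}, "parse_int": {}},
    }
}


class Node:
    def __init__(self, name, spec, parent=None):
        self.name = name
        self.spec = spec          # nested dict of not-yet-expanded children
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

    def path(self):
        parts, node = [], self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


def relevance(name, issue):
    """Assumed reward: fraction of the name's tokens found in the issue text."""
    tokens = name.replace(".py", "").split("_")
    return sum(t in issue.lower() for t in tokens) / len(tokens)


def uct(child, parent_visits, c=1.4):
    """Standard UCT score; unvisited children are tried first."""
    if child.visits == 0:
        return math.inf
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def rollout(node, rng):
    """Simulation: walk randomly down the tree spec to a leaf name."""
    name, spec = node.name, node.spec
    while spec:
        name = rng.choice(sorted(spec))
        spec = spec[name]
    return name


def search(issue, iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node("repo", REPO)
    for _ in range(iterations):
        node = root
        # Selection: follow the best UCT child through expanded nodes.
        while node.children:
            node = max(node.children, key=lambda ch: uct(ch, node.visits))
        # Expansion: create children of a non-leaf node, pick one at random.
        if node.spec:
            node.children = [Node(n, s, node) for n, s in node.spec.items()]
            node = rng.choice(node.children)
        # Simulation and backpropagation.
        reward = relevance(rollout(node, rng), issue)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Report the most-visited path from the root to a leaf.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.path()


print(search("parse_date fails on ISO strings"))
```

In this toy setup the search concentrates its visits on the function whose name best matches the issue text. A real system would need a far richer reward signal than keyword overlap.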

Related papers

Iterative Trajectory Exploration for Multimodal Agents [69.32855772335624]
We propose an online self-exploration method for multimodal agents, namely SPORT. SPORT operates through four iterative components: task synthesis, step sampling, step verification, and preference tuning. Evaluation on the GTA and GAIA benchmarks shows that the SPORT Agent achieves 6.41% and 3.64% improvements, respectively.
arXiv Detail & Related papers (2025-04-30T12:01:27Z)
RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code [7.156224931977546]
We introduce RefactorBench, a benchmark consisting of 100 large handcrafted multi-file tasks in popular open-source repositories. Baselines reveal that current LM agents struggle with simple compositional tasks, solving only 22% of tasks with base instructions. By adapting a baseline agent to condition on representations of state, we achieve a 43.9% improvement in solving RefactorBench tasks.
arXiv Detail & Related papers (2025-03-10T20:23:24Z)
Evaluating Software Development Agents: Patch Patterns, Code Quality, and Issue Complexity in Real-World GitHub Scenarios [13.949319911378826]
This study evaluated 4,892 patches from 10 top-ranked agents on 500 real-world GitHub issues. No single agent dominated, and 170 issues remained unresolved, indicating room for improvement. Most agents maintained code reliability and security, avoiding new bugs or vulnerabilities. While some agents increased code complexity, many reduced code duplication and minimized code smells.
arXiv Detail & Related papers (2024-10-16T11:33:57Z)
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions. RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z)
Agentless: Demystifying LLM-based Software Engineering Agents [12.19683999553113]
We build Agentless -- an agentless approach to automatically solve software development problems. Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simplistic three-phase process of localization, repair, and patch validation. Our results on the popular SWE-bench Lite benchmark show that, surprisingly, the simplistic Agentless achieves both the highest performance and low cost.
arXiv Detail & Related papers (2024-07-01T17:24:45Z)
On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present a novel benchmark designed to evaluate repository-level code generation. We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
arXiv Detail & Related papers (2024-06-17T10:45:22Z)
VillagerAgent: A Graph-Based Multi-Agent Framework for Coordinating Complex Task Dependencies in Minecraft [46.19145184507293]
We introduce a Directed Acyclic Graph Multi-Agent Framework VillagerAgent to resolve complex inter-agent dependencies. Our empirical evaluation on VillagerBench demonstrates that VillagerAgent outperforms the existing AgentVerse model.
arXiv Detail & Related papers (2024-06-09T10:21:47Z)
On The Importance of Reasoning for Context Retrieval in Repository-Level Code Editing [82.96523584351314]
We decouple the task of context retrieval from the other components of the repository-level code editing pipelines. We conclude that while the reasoning helps to improve the precision of the gathered context, it still lacks the ability to identify its sufficiency.
arXiv Detail & Related papers (2024-06-06T19:44:17Z)
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [116.97648507802926]
Large language models (LLMs) are considered a promising foundation for building generally-capable agents. We take the first step towards building generally-capable LLM-based agents with self-evolution ability. We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
arXiv Detail & Related papers (2024-06-06T15:15:41Z)
DepsRAG: Towards Agentic Reasoning and Planning for Software Dependency Management [2.9860252315941618]
DepsRAG is a multi-agent framework designed to assist developers in reasoning about software dependencies. Developers can interact with DepsRAG through a conversational interface, posing queries about the dependencies. We evaluated DepsRAG using GPT-4-Turbo and Llama-3 on three multi-step reasoning tasks, observing a threefold increase in accuracy with the integration of the Critic-Agent mechanism.
arXiv Detail & Related papers (2024-05-30T20:05:44Z)
Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository [4.767858874370881]
We introduce RepoClassBench, a benchmark designed to rigorously evaluate LLMs in generating class-level code within real-world repositories. RepoClassBench includes "Natural Language to Class generation" tasks across Java, Python & C# from a selection of repositories. We introduce Retrieve-Repotools-Reflect (RRR), a novel approach that equips LLMs with static analysis tools to iteratively navigate & reason about repository-level context.
arXiv Detail & Related papers (2024-04-22T03:52:54Z)
RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation [79.83270415843857]
We introduce RepoAgent, a large language model powered open-source framework aimed at proactively generating, maintaining, and updating code documentation. We have validated the effectiveness of our approach, showing that RepoAgent excels in generating high-quality repository-level documentation.
arXiv Detail & Related papers (2024-02-26T15:39:52Z)
Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation. We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques. We empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs.
arXiv Detail & Related papers (2023-12-29T14:25:22Z)
GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension [81.44231422624055]
A growing area of research focuses on Large Language Models (LLMs) equipped with external tools capable of performing diverse tasks. In this paper, we introduce GitAgent, an agent capable of autonomous tool extension from GitHub.
arXiv Detail & Related papers (2023-12-28T15:47:30Z)
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process. It incorporates a similarity-based retriever and a pre-trained code language model. It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z)
Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation [6.09506921406322]
We propose a novel architecture for multi-agent reinforcement learning (MARL) which uses local information intelligently to compute paths for all the agents in a decentralized manner. InforMARL aggregates information about the local neighborhood of agents for both the actor and the critic using a graph neural network and can be used in conjunction with any standard MARL algorithm.
arXiv Detail & Related papers (2022-11-03T20:02:45Z)
Repo2Vec: A Comprehensive Embedding Approach for Determining Repository Similarity [2.095199622772379]
Repo2Vec is a comprehensive embedding approach to represent a repository as a distributed vector. We evaluate our method with two real datasets from GitHub for a combined 1013 repositories.
arXiv Detail & Related papers (2021-07-11T18:57:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.