CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges
- URL: http://arxiv.org/abs/2401.07339v1
- Date: Sun, 14 Jan 2024 18:12:03 GMT
- Title: CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges
- Authors: Kechi Zhang, Jia Li, Ge Li, Xianjie Shi, Zhi Jin
- Abstract summary: Large Language Models (LLMs) have shown promise in automated code generation but typically excel only at simpler tasks.
Our research pivots towards evaluating LLMs in a more realistic setting -- real-world repo-level code generation.
We present CodeAgent, a novel LLM-based agent framework that employs external tools for effective repo-level code generation.
- Score: 44.028079593225584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown promise in automated code generation
but typically excel only at simpler tasks such as generating standalone code
units. Real-world software development, however, often involves complex code
repositories (repos) with intricate dependencies and extensive
documentation. To fill this gap, our research pivots towards evaluating LLMs in
a more realistic setting -- real-world repo-level code generation. We introduce
CodeAgentBench, a manually curated benchmark for repo-level code generation.
This benchmark comprises five high-quality Python projects, encompassing a
total of 101 samples. We assess nine leading LLMs on repo-level tasks and
observe a decline in their performance. To tackle this, we present CodeAgent, a
novel LLM-based agent framework that employs external tools for effective
repo-level code generation. CodeAgent integrates five programming tools,
enabling interaction with software artifacts for information retrieval, code
symbol navigation, and code testing. We implement four agent strategies to
optimize these tools' usage. Our experiments on CodeAgentBench show that
CodeAgent enhances LLM performance significantly, with improvements ranging
from 18.1% to 250%. Further tests on the HumanEval benchmark confirm
CodeAgent's adaptability and efficacy across various code generation tasks.
Notably, CodeAgent outperforms commercial products like GitHub Copilot,
showcasing superior accuracy and efficiency. These results demonstrate
CodeAgent's robust capabilities in code generation, highlighting its potential
for real-world repo-level coding challenges.
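The abstract names CodeAgent's tool categories (information retrieval, code symbol navigation, code testing) without showing what a tool call returns. As an illustration only, here is a minimal sketch of a symbol-navigation tool, assuming Python's standard `ast` module and a repository held in memory as a mapping from file path to source text. The function `find_symbol` and the sample repository are hypothetical and are not taken from CodeAgent's implementation.

```python
import ast
from typing import Dict, List, Tuple

def find_symbol(files: Dict[str, str], name: str) -> List[Tuple[str, int, str]]:
    """Return sorted (path, line, kind) tuples for every def/class named `name`."""
    hits: List[Tuple[str, int, str]] = []
    for path, source in files.items():
        tree = ast.parse(source, filename=path)
        # ast.walk visits nested nodes too, so methods inside classes are found.
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
                hits.append((path, node.lineno, "function"))
            elif isinstance(node, ast.ClassDef) and node.name == name:
                hits.append((path, node.lineno, "class"))
    return sorted(hits)

# Hypothetical two-file repository held in memory.
repo = {
    "pkg/utils.py": "def load(path):\n    return open(path).read()\n",
    "pkg/model.py": "class Model:\n    def load(self, path):\n        pass\n",
}

print(find_symbol(repo, "load"))
```

An agent would call a tool like this when the model meets an unfamiliar name in the repo, then read the code at the returned location before generating. Methods are reported alongside top-level functions because `ast.walk` descends into class bodies.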
Related papers
- CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents [52.83132876539399]
CRAB is the first benchmark framework designed to support cross-environment tasks.
Our framework supports multiple devices and can be easily extended to any environment with a Python interface.
The experimental results demonstrate that the single agent with GPT-4o achieves the best completion ratio of 35.
Date: 2024-07-01T17:55:04Z
- Code Agents are State of the Art Software Testers [10.730852617039451]
We investigate the capability of LLM-based Code Agents for formalizing user issues into test cases.
We propose a novel benchmark based on popular GitHub repositories, containing real-world issues, ground-truth patches, and golden tests.
We find that LLMs generally perform surprisingly well at generating relevant test cases with Code Agents designed for code repair.
Date: 2024-06-18T14:54:37Z
- RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation [79.83270415843857]
We introduce RepoAgent, a large language model powered open-source framework aimed at proactively generating, maintaining, and updating code documentation.
We have validated the effectiveness of our approach, showing that RepoAgent excels in generating high-quality repository-level documentation.
Date: 2024-02-26T15:39:52Z
- CodeAgent: Collaborative Agents for Software Engineering [11.476666454138021]
Code review aims at ensuring the overall quality and reliability of software.
Existing automated methods rely on single input-output generative models.
This work introduces CodeAgent, a novel multi-agent Large Language Model (LLM) system for code review automation.
Date: 2024-02-03T14:43:14Z
- CodePori: Large Scale Model for Autonomous Software Development by Using Multi-Agents [3.8066447473175304]
Large Language Models (LLMs) and Generative Pre-trained Transformers (GPTs) are reshaping the field of Software Engineering (SE).
This paper introduces CodePori, a novel model designed to automate code generation for extensive and complex software projects based on natural language prompts.
We show in the paper that CodePori is able to generate running code for large-scale projects, completing the entire software development process in minutes rather than hours, and at a cost of a few dollars.
Date: 2024-02-02T13:42:50Z
- Executable Code Actions Elicit Better LLM Agents [76.95566120678787]
This work proposes to use Python code to consolidate Large Language Model (LLM) agents' actions into a unified action space (CodeAct).
Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions.
The encouraging performance of CodeAct motivates us to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language.
Date: 2024-02-01T21:38:58Z
- GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension [81.44231422624055]
A growing area of research focuses on Large Language Models (LLMs) equipped with external tools capable of performing diverse tasks.
In this paper, we introduce GitAgent, an agent capable of achieving autonomous tool extension from GitHub.
Date: 2023-12-28T15:47:30Z
- AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation [11.155351560550853]
This paper introduces Multi-Agent Assistant Code Generation (AgentCoder).
AgentCoder is a novel solution comprising a multi-agent framework with specialized agents: the programmer agent, the test designer agent, and the test executor agent.
Our experiments on 9 code generation models and 12 enhancement approaches showcase AgentCoder's superior performance over existing code generation models.
Date: 2023-12-20T13:22:41Z
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
Date: 2023-11-16T12:03:21Z
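Several entries above (CodeAct's multi-turn code execution, AgentCoder's test executor agent), as well as CodeAgent's own code-testing tool, rely on the same basic loop: generate code, execute it, and feed the observed error back to the model. The following is a minimal, hypothetical sketch of that loop, not the implementation of any of these papers. The names `run_candidate`, `refine`, and `fake_llm` are invented for illustration, and the model is replaced by a fixed list of drafts so the example runs offline.

```python
from typing import Callable, List, Optional, Tuple

def run_candidate(code: str, tests: str) -> Optional[str]:
    """Run candidate code, then the tests, in a fresh namespace.

    Returns None if everything passes, otherwise a short error observation.
    NOTE: exec() is not a sandbox; only run trusted code this way.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(tests, namespace)
    except Exception as exc:  # SyntaxError, AssertionError, NameError, ...
        msg = str(exc)
        return type(exc).__name__ + (f": {msg}" if msg else "")
    return None

def refine(
    propose: Callable[[Optional[str]], str], tests: str, max_turns: int = 3
) -> Tuple[Optional[str], List[str]]:
    """Ask for code, execute it, and feed any error back until the tests pass."""
    observations: List[str] = []
    feedback: Optional[str] = None
    for _ in range(max_turns):
        code = propose(feedback)  # the proposer sees the previous error, if any
        feedback = run_candidate(code, tests)
        if feedback is None:
            return code, observations
        observations.append(feedback)
    return None, observations

# Hypothetical stand-in for an LLM: a buggy draft first, then a fixed one.
_drafts = iter([
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return a + b\n",
])

def fake_llm(feedback: Optional[str]) -> str:
    return next(_drafts)

tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
code, obs = refine(fake_llm, tests)
print(obs)  # ['AssertionError']
```

Returning the error text rather than a bare pass/fail flag is what allows the next proposal to be conditioned on what went wrong. A real agent would execute candidates in an isolated process or container (ML-Agent-Bench, for example, uses a Linux sandbox) rather than calling `exec()` in the host interpreter.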