SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
- URL: http://arxiv.org/abs/2405.15793v2
- Date: Thu, 30 May 2024 19:09:01 GMT
- Title: SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
- Authors: John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press,
- Abstract summary: SWE-agent is a system that facilitates LM agents to autonomously use computers to solve software engineering tasks.
SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs.
We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with a pass@1 rate of 12.5% and 87.7%, respectively.
- Score: 79.07755560048388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like software engineering, we posit that LM agents represent a new category of end users with their own needs and abilities, and would benefit from specially-built interfaces to the software they use. We investigate how interface design affects the performance of language model agents. As a result of this exploration, we introduce SWE-agent: a system that facilitates LM agents to autonomously use computers to solve software engineering tasks. SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs. We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with a pass@1 rate of 12.5% and 87.7%, respectively, far exceeding the previous state-of-the-art achieved with non-interactive LMs. Finally, we provide insight on how the design of the ACI can impact agents' behavior and performance.
Related papers
- Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence [79.5316642687565]
Existing multi-agent frameworks often struggle with integrating diverse capable third-party agents.
We propose the Internet of Agents (IoA), a novel framework that addresses these limitations.
IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control.
arXiv Detail & Related papers (2024-07-09T17:33:24Z) - Agentless: Demystifying LLM-based Software Engineering Agents [12.19683999553113]
We build Agentless -- an agentless approach to automatically solve software development problems.
Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simplistic two-phase process of localization followed by repair.
Our results on the popular SWE-bench Lite benchmark show that surprisingly the simplistic Agentless is able to achieve both the highest performance (27.33%) and lowest cost ($0.34) compared with all existing open-source software agents!
arXiv Detail & Related papers (2024-07-01T17:24:45Z) - CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only [21.054681757006385]
Large Language Models (LLMs) with advanced reasoning capabilities have set the stage for agents to undertake more complex and previously unseen tasks.
We propose an agent that functions solely on the basis of screenshots for recognizing environments.
We achieve a success rate of 94.4% on 67types of MiniWoB++ problems, utilizing only 1.48demonstrations per problem type.
arXiv Detail & Related papers (2024-06-11T05:21:20Z) - From Language Models to Practical Self-Improving Computer Agents [0.8547032097715571]
We develop a methodology to create AI computer agents that can carry out diverse computer tasks and self-improve.
We prompt an LLM agent to augment itself with retrieval, internet search, web navigation, and text editor capabilities.
The agent effectively uses these various tools to solve problems including automated software development and web-based tasks.
arXiv Detail & Related papers (2024-04-18T07:50:10Z) - AgentScope: A Flexible yet Robust Multi-Agent Platform [66.64116117163755]
AgentScope is a developer-centric multi-agent platform with message exchange as its core communication mechanism.
The abundant syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitor, zero-code programming workstation, and automatic prompt tuning mechanism significantly lower the barriers to both development and deployment.
arXiv Detail & Related papers (2024-02-21T04:11:28Z) - MobileAgent: enhancing mobile control via human-machine interaction and
SOP integration [0.0]
Large Language Models (LLMs) are now capable of automating mobile device operations for users.
Privacy concerns related to personalized user data arise during mobile operations, requiring user confirmation.
We have designed interactive tasks between agents and humans to identify sensitive information and align with personalized user needs.
Our approach is evaluated on the new device control benchmark AitW, which encompasses 30K unique instructions across multi-step tasks.
arXiv Detail & Related papers (2024-01-04T03:44:42Z) - ProAgent: From Robotic Process Automation to Agentic Process Automation [87.0555252338361]
Large Language Models (LLMs) have emerged human-like intelligence.
This paper introduces Agentic Process Automation (APA), a groundbreaking automation paradigm using LLM-based agents for advanced automation.
We then instantiate ProAgent, an agent designed to craft from human instructions and make intricate decisions by coordinating specialized agents.
arXiv Detail & Related papers (2023-11-02T14:32:16Z) - Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models [31.509994889286183]
We introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of language models (LMs) in reasoning, acting, and planning.
A key feature of our approach is the incorporation of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism.
LATS achieves state-of-the-art pass@1 accuracy (92.7%) for programming on HumanEval with GPT-4 and demonstrates gradient-free performance (average score of 75.9) comparable to gradient-based fine-tuning for web navigation on WebShop with GPT
arXiv Detail & Related papers (2023-10-06T17:55:11Z) - Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with
Agent Team Optimization [59.39113350538332]
Large language model (LLM) agents have been shown effective on a wide range of tasks, and by ensembling multiple LLM agents, their performances could be further improved.
Existing approaches employ a fixed set of agents to interact with each other in a static architecture.
We build a framework named Dynamic LLM-Agent Network ($textbfDyLAN$) for LLM-agent collaboration on complicated tasks like reasoning and code generation.
arXiv Detail & Related papers (2023-10-03T16:05:48Z) - Agents: An Open-source Framework for Autonomous Language Agents [98.91085725608917]
We consider language agents as a promising direction towards artificial general intelligence.
We release Agents, an open-source library with the goal of opening up these advances to a wider non-specialist audience.
arXiv Detail & Related papers (2023-09-14T17:18:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.