Beyond pip install: Evaluating LLM Agents for the Automated Installation of Python Projects
- URL: http://arxiv.org/abs/2412.06294v1
- Date: Mon, 09 Dec 2024 08:37:06 GMT
- Title: Beyond pip install: Evaluating LLM Agents for the Automated Installation of Python Projects
- Authors: Louis Milliken, Sungmin Kang, Shin Yoo
- Abstract summary: Large Language Model (LLM) based agents have been proposed for performing `repository level' tasks. We argue that one important task is missing, which is to fulfil project-level dependencies by installing other repositories. We introduce a benchmark of repository installation tasks curated from 40 open source Python projects, which includes a ground truth installation process for each target repository. Experiments reveal that 55% of the studied repositories can be automatically installed by our agent at least one out of ten times.
- Score: 11.418182511485032
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Many works have recently proposed the use of Large Language Model (LLM) based agents for performing `repository level' tasks, loosely defined as a set of tasks whose scopes are greater than a single file. This has led to speculation that the orchestration of these repository-level tasks could lead to software engineering agents capable of performing almost independently of human intervention. However, of the suite of tasks that would need to be performed by this autonomous software engineering agent, we argue that one important task is missing, which is to fulfil project-level dependencies by installing other repositories. To investigate the feasibility of this repository level installation task, we introduce a benchmark of repository installation tasks curated from 40 open source Python projects, which includes a ground truth installation process for each target repository. Further, we propose Installamatic, an agent which aims to perform and verify the installation of a given repository by searching for relevant instructions from documentation in the repository. Empirical experiments reveal that 55% of the studied repositories can be automatically installed by our agent at least one out of ten times. Through further analysis, we identify the common causes for our agent's inability to install a repository, discuss the challenges faced in the design and implementation of such an agent and consider the implications that such an agent could have for developers.
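The abstract describes the agent's workflow only in prose: locate installation instructions in the repository's documentation, execute candidate commands, and verify the result. As a rough illustration (not the authors' Installamatic implementation), a minimal doc-driven install-and-verify loop could look like the sketch below; the file list, the pip fallback in propose_commands, and the pytest-based verify step are all illustrative assumptions.

```python
import subprocess
from pathlib import Path

# Hypothetical sketch of a doc-driven install-and-verify loop, loosely
# following the workflow described in the abstract. Command extraction is
# stubbed out; a real agent would prompt an LLM with the gathered docs.

DOC_CANDIDATES = ["README.md", "INSTALL.md", "CONTRIBUTING.md", "docs/installation.rst"]

def find_install_docs(repo: Path) -> list[str]:
    """Collect documentation files that commonly contain install instructions."""
    return [p.read_text(errors="ignore") for name in DOC_CANDIDATES
            if (p := repo / name).exists()]

def propose_commands(doc_text: str) -> list[str]:
    """Placeholder for the LLM step that extracts shell commands from docs.
    Here it simply falls back to a generic editable pip install."""
    return ["pip install -e ."]

def run(cmd: str, cwd: Path) -> bool:
    """Run a shell command inside the repository and report success."""
    return subprocess.run(cmd, shell=True, cwd=cwd).returncode == 0

def verify(repo: Path) -> bool:
    """Naive verification: check that the (assumed pytest) test suite at least collects."""
    return run("python -m pytest --collect-only -q", repo)

def try_install(repo: Path, attempts: int = 10) -> bool:
    """Return True if any of `attempts` tries yields a verified installation."""
    docs = "\n".join(find_install_docs(repo))
    for _ in range(attempts):
        if all(run(c, repo) for c in propose_commands(docs)) and verify(repo):
            return True
    return False
```

Under this reading, the paper's headline figure corresponds to try_install-style success on 22 of the 40 benchmark repositories (55%) within ten attempts per repository.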
Related papers
- EnvBench: A Benchmark for Automated Environment Setup [76.02998475135824]
Large Language Models have enabled researchers to focus on practical repository-level tasks in the software engineering domain.
Existing studies on environment setup introduce innovative agentic strategies, but their evaluation is often based on small datasets.
To address this gap, we introduce EnvBench, a comprehensive environment setup benchmark.
arXiv Detail & Related papers (2025-03-18T17:19:12Z)
- Automated Benchmark Generation for Repository-Level Coding Tasks [7.305342793164905]
SetUpAgent is a fully automated system capable of historically accurate dependency setup, test execution, and result parsing.
We generate two new datasets: (i) SWEE-Bench, an extended version of SWE-Bench encompassing hundreds of repositories, and (ii) SWA-Bench, a benchmark focusing on applications rather than libraries.
arXiv Detail & Related papers (2025-03-10T17:42:49Z)
- LLM Agents Making Agent Tools [2.5529148902034637]
Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks.
We propose ToolMaker, a novel agentic framework that autonomously transforms papers with code into LLM-compatible tools.
Given a short task description and a repository URL, ToolMaker autonomously installs required dependencies and generates code to perform the task.
arXiv Detail & Related papers (2025-02-17T11:44:11Z)
- HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale [12.173834895070827]
Large Language Models (LLMs) have revolutionized software engineering (SE).
Despite recent advancements, these systems are typically designed for specific SE functions.
We introduce HyperAgent, an innovative generalist multi-agent system designed to tackle a wide range of SE tasks.
arXiv Detail & Related papers (2024-09-09T19:35:34Z)
- Agentless: Demystifying LLM-based Software Engineering Agents [12.19683999553113]
We build Agentless -- an agentless approach to automatically solve software development problems.
Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simplistic three-phase process of localization, repair, and patch validation.
Our results on the popular SWE-bench Lite benchmark show that, surprisingly, the simplistic Agentless achieves both the highest performance and a low cost.
arXiv Detail & Related papers (2024-07-01T17:24:45Z)
- How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE).
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [79.07755560048388]
SWE-agent is a system that enables LM agents to autonomously use computers to solve software engineering tasks.
SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs.
We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both, with pass@1 rates of 12.5% and 87.7%, respectively.
arXiv Detail & Related papers (2024-05-06T17:41:33Z)
- AgentScope: A Flexible yet Robust Multi-Agent Platform [66.64116117163755]
AgentScope is a developer-centric multi-agent platform with message exchange as its core communication mechanism.
The abundant syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitoring, zero-code programming workstation, and automatic prompt tuning mechanism significantly lower the barriers to both development and deployment.
arXiv Detail & Related papers (2024-02-21T04:11:28Z)
- GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension [81.44231422624055]
A growing area of research focuses on Large Language Models (LLMs) equipped with external tools capable of performing diverse tasks.
In this paper, we introduce GitAgent, an agent capable of autonomous tool extension from GitHub.
arXiv Detail & Related papers (2023-12-28T15:47:30Z)
- TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
- A Zero-Shot Language Agent for Computer Control with Structured Reflection [19.526676887048662]
Large language models (LLMs) have shown increasing capacity at planning and executing a high-level goal in a live computer environment.
To perform a task, recent works often require a model to learn from trace examples of the task via either supervised learning or few/many-shot prompting.
We approach this problem with a zero-shot agent that requires no given expert traces.
arXiv Detail & Related papers (2023-10-12T21:53:37Z)