RepoLaunch: Automating Build&Test Pipeline of Code Repositories on ANY Language and ANY Platform
- URL: http://arxiv.org/abs/2603.05026v1
- Date: Thu, 05 Mar 2026 10:15:13 GMT
- Title: RepoLaunch: Automating Build&Test Pipeline of Code Repositories on ANY Language and ANY Platform
- Authors: Kenan Li, Rongzhi Li, Linghao Zhang, Qirui Jin, Liao Zhu, Xiaosong Huang, Geng Zhang, Yikai Zhang, Shilin He, Chengxing Xie, Xin Zhang, Zijian Jin, Bowen Li, Chaoyun Zhang, Yu Kang, Yufan Huang, Elsie Nallipogu, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang,
- Abstract summary: We introduce RepoLaunch, the first agent capable of automatically resolving dependencies, compiling source code, and extracting test results for repositories across arbitrary programming languages and operating systems. RepoLaunch automates the remaining steps, enabling scalable benchmarking and training of coding agents and LLMs.
- Score: 49.43594274832262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building software repositories typically requires significant manual effort. Recent advances in large language model (LLM) agents have accelerated automation in software engineering (SWE). We introduce RepoLaunch, the first agent capable of automatically resolving dependencies, compiling source code, and extracting test results for repositories across arbitrary programming languages and operating systems. To demonstrate its utility, we further propose a fully automated pipeline for SWE dataset creation, where task design is the only human intervention. RepoLaunch automates the remaining steps, enabling scalable benchmarking and training of coding agents and LLMs. Notably, several works on agentic benchmarking and training have recently adopted RepoLaunch for automated task generation.
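The abstract describes a loop of resolving dependencies, building, and running tests across arbitrary languages. The paper does not publish its interface, but the idea can be illustrated with a minimal sketch; the command table, function names, and retry strategy below are all assumptions for illustration, not RepoLaunch's actual implementation.

```python
# Hypothetical sketch of a RepoLaunch-style build-and-test loop.
# The agent tries candidate (build, test) command pairs for the detected
# language, falling back to the next candidate when a step fails.

from dataclasses import dataclass, field

# Illustrative mapping from detected language to candidate command pairs
# (a real agent would generate and revise these dynamically).
CANDIDATE_COMMANDS = {
    "python": [("pip install -e .", "pytest")],
    "rust": [("cargo build", "cargo test")],
    "go": [("go build ./...", "go test ./...")],
}

@dataclass
class LaunchResult:
    built: bool = False
    tests_ran: bool = False
    log: list = field(default_factory=list)

def launch(language: str, run) -> LaunchResult:
    """Try each candidate (build, test) pair until one succeeds.

    `run` is a callable that executes a shell command in the repository's
    container and returns True on success.
    """
    result = LaunchResult()
    for build_cmd, test_cmd in CANDIDATE_COMMANDS.get(language, []):
        result.log.append(build_cmd)
        if not run(build_cmd):
            continue  # build failed: try the next candidate plan
        result.built = True
        result.log.append(test_cmd)
        result.tests_ran = run(test_cmd)
        if result.tests_ran:
            break  # tests executed; results can now be extracted
    return result
```

In practice the hard part, and the paper's contribution, is generating and repairing these plans with an LLM agent rather than a static table.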
Related papers
- TimeMachine-bench: A Benchmark for Evaluating Model Capabilities in Repository-Level Migration Tasks [12.573674060643787]
TimeMachine-bench is a benchmark designed to evaluate software migration in real-world Python projects. Our benchmark consists of GitHub repositories whose tests begin to fail in response to dependency updates.
arXiv Detail & Related papers (2026-01-30T05:42:45Z) - Beyond pip install: Evaluating LLM Agents for the Automated Installation of Python Projects [11.418182511485032]
Large Language Model (LLM) based agents have been proposed for performing repository-level tasks. We argue that one important task is missing: fulfilling project-level dependencies by installing other repositories. We introduce a benchmark of repository installation tasks curated from 40 open-source Python projects, which includes a ground-truth installation process for each target repository. Experiments reveal that 55% of the studied repositories can be automatically installed by our agent in at least one out of ten attempts.
arXiv Detail & Related papers (2024-12-09T08:37:06Z) - AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline. Recent works have started exploiting large language models (LLMs) to lessen this burden. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z) - HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale [16.716880943539376]
Large Language Models (LLMs) have revolutionized software engineering (SE). Despite recent advancements, these systems are typically designed for specific SE functions. We introduce HyperAgent, an innovative generalist multi-agent system designed to tackle a wide range of SE tasks.
arXiv Detail & Related papers (2024-09-09T19:35:34Z) - Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z) - Agentless: Demystifying LLM-based Software Engineering Agents [12.19683999553113]
We build Agentless -- an agentless approach to automatically solve software development problems.
Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simplistic three-phase process of localization, repair, and patch validation.
Our results on the popular SWE-bench Lite benchmark show that, surprisingly, the simple Agentless approach achieves both the highest performance and low cost.
arXiv Detail & Related papers (2024-07-01T17:24:45Z) - Automated User Story Generation with Test Case Specification Using Large Language Model [0.0]
We developed a tool "GeneUS" to automatically create user stories from requirements documents.
The output is provided in a structured format, leaving open the possibility of downstream integration with popular project management tools.
arXiv Detail & Related papers (2024-04-02T01:45:57Z) - AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data or synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z) - TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z) - ProAgent: From Robotic Process Automation to Agentic Process Automation [87.0555252338361]
Large Language Models (LLMs) have exhibited human-like intelligence.
This paper introduces Agentic Process Automation (APA), a groundbreaking automation paradigm using LLM-based agents for advanced automation.
We then instantiate ProAgent, an agent designed to craft workflows from human instructions and make intricate decisions by coordinating specialized agents.
arXiv Detail & Related papers (2023-11-02T14:32:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.