An LLM-based Agent for Reliable Docker Environment Configuration
- URL: http://arxiv.org/abs/2502.13681v2
- Date: Thu, 06 Mar 2025 07:17:09 GMT
- Title: An LLM-based Agent for Reliable Docker Environment Configuration
- Authors: Ruida Hu, Chao Peng, Xinchen Wang, Cuiyun Gao
- Abstract summary: Repo2Run is an agent designed to fully automate environment configuration and generate executable Dockerfiles for arbitrary Python repositories. It addresses two major challenges: (1) enabling the LLM agent to configure environments within isolated Docker containers, and (2) ensuring the successful configuration process is recorded and accurately transferred to a Dockerfile without error. On a proposed benchmark of 420 recent Python repositories with unit tests, Repo2Run achieves an 86.0% success rate, outperforming the best baseline by 63.9%.
- Score: 9.436480907117415
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Environment configuration is a critical yet time-consuming step in software development, especially when dealing with unfamiliar code repositories. While Large Language Models (LLMs) demonstrate the potential to accomplish software engineering tasks, existing methods for environment configuration often rely on manual efforts or fragile scripts, leading to inefficiencies and unreliable outcomes. We introduce Repo2Run, the first LLM-based agent designed to fully automate environment configuration and generate executable Dockerfiles for arbitrary Python repositories. We address two major challenges: (1) enabling the LLM agent to configure environments within isolated Docker containers, and (2) ensuring the successful configuration process is recorded and accurately transferred to a Dockerfile without error. To achieve this, we propose atomic configuration synthesis, featuring a dual-environment architecture (internal and external environment) with a rollback mechanism to prevent environment "pollution" from failed commands, guaranteeing atomic execution (execute fully or not at all) and a Dockerfile generator to transfer successful configuration steps into runnable Dockerfiles. We evaluate Repo2Run on our proposed benchmark of 420 recent Python repositories with unit tests, where it achieves an 86.0% success rate, outperforming the best baseline by 63.9%. Repo2Run is available at https://github.com/bytedance/Repo2Run.
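The atomic configuration synthesis described in the abstract can be sketched in a few lines of Python. This is a simplified illustration, not the paper's implementation: `AtomicConfigurator` and its methods are hypothetical names, the rollback is reduced to simply not recording a failed command, and commands run on the host rather than inside the dual-container architecture.

```python
import subprocess

class AtomicConfigurator:
    """Toy sketch of atomic configuration synthesis: each command either
    fully succeeds (and is recorded for the Dockerfile) or leaves no
    trace, so failed commands cannot "pollute" the environment."""

    def __init__(self):
        self.recorded = []  # successful steps, in execution order

    def run(self, command):
        # In the real system this executes inside an isolated Docker
        # container with a snapshot taken beforehand; here we just run
        # the command and inspect its exit status.
        result = subprocess.run(command, shell=True,
                                capture_output=True, text=True)
        if result.returncode == 0:
            self.recorded.append(command)  # keep the successful step
            return True
        # Failure: the rollback mechanism would restore the container
        # snapshot here, so the step is simply not recorded.
        return False

    def to_dockerfile(self, base_image="python:3.11-slim"):
        # Replay only the successful steps as RUN instructions.
        lines = [f"FROM {base_image}"]
        lines += [f"RUN {cmd}" for cmd in self.recorded]
        return "\n".join(lines)
```

Only commands that exited cleanly survive into the generated Dockerfile, which is what makes the transfer "without error" in the paper's sense.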
Related papers
- Doctor: Optimizing Container Rebuild Efficiency by Instruction Re-Orchestration [11.027705516378875]
We present Doctor, a method for improving Dockerfile build efficiency through instruction re-ordering.
We developed a dependency taxonomy based on Dockerfile syntax and a historical modification analysis to prioritize frequently modified instructions.
Experiments show Doctor improves 92.75% of Dockerfiles, reducing rebuild time by an average of 26.5%, with 12.82% of files achieving over a 50% reduction.
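Doctor's instruction re-orchestration can be illustrated with a toy re-ordering pass (hypothetical function names; the real Doctor additionally respects a dependency taxonomy derived from Dockerfile syntax): rarely modified RUN steps are moved earlier so that Docker's layer cache invalidates fewer layers on rebuild.

```python
def reorder_instructions(instructions, change_freq):
    """Toy sketch of cache-aware Dockerfile re-ordering. Assumes the
    RUN steps are independent, which Doctor verifies and this sketch
    does not."""
    # Header instructions (FROM, WORKDIR, ...) must keep their place.
    header = [i for i in instructions if not i.startswith("RUN")]
    runs = [i for i in instructions if i.startswith("RUN")]
    # Stable sort: rarely changed steps first, so they stay cached
    # across rebuilds while churn-heavy steps invalidate fewer layers.
    runs.sort(key=lambda i: change_freq.get(i, 0))
    return header + runs
```

A frequently edited `pip install` step would thus sink below a stable `apt-get` step, preserving the latter's cached layer.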
arXiv Detail & Related papers (2025-04-02T13:53:35Z) - EnvBench: A Benchmark for Automated Environment Setup [76.02998475135824]
Large Language Models have enabled researchers to focus on practical repository-level tasks in the software engineering domain.
Existing studies on environment setup introduce innovative agentic strategies, but their evaluation is often based on small datasets.
To address this gap, we introduce EnvBench, a comprehensive environment setup benchmark.
arXiv Detail & Related papers (2025-03-18T17:19:12Z) - Refactoring for Dockerfile Quality: A Dive into Developer Practices and Automation Potential [0.0]
This paper explores the utility and practicality of automated Dockerfile refactoring using 600 Dockerfiles from 358 open-source projects. Our approach leads to an average reduction of 32% in image size and a 6% decrease in build duration, with improvements in understandability and maintainability observed in 77% and 91% of cases, respectively.
arXiv Detail & Related papers (2025-01-23T23:10:47Z) - ExecRepoBench: Multi-level Executable Code Completion Evaluation [45.963424627710765]
We introduce a novel framework for enhancing code completion in software development through the creation of a repository-level benchmark, ExecRepoBench. We present a multi-level grammar-based completion methodology conditioned on the abstract syntax tree to mask code fragments at various logical units. Then, we fine-tune the open-source LLM with 7B parameters on Repo-Instruct to produce a strong code completion baseline model, Qwen2.5-Coder-Instruct-C.
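The grammar-based masking idea can be sketched with Python's standard `ast` module (a toy, not the benchmark's tooling; `mask_logical_unit` is a hypothetical name): instead of blanking an arbitrary character span, the source is parsed and a complete logical unit, here the first function body, is replaced by a mask token.

```python
import ast

def mask_logical_unit(source, mask="<MASK>"):
    """Toy sketch of grammar-based masking: parse the code into an
    AST and blank out one logical unit (the first function body)
    rather than a random character span."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            lines = source.splitlines()
            # AST line numbers are 1-indexed; the body spans the
            # first through the last statement of the function.
            start = node.body[0].lineno - 1
            end = node.body[-1].end_lineno
            return "\n".join(lines[:start] + ["    " + mask] + lines[end:])
    return source  # no function found: nothing to mask
```

Masking at AST boundaries guarantees the completion target is a syntactically meaningful unit, which is what makes the resulting completions executable and testable.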
arXiv Detail & Related papers (2024-12-16T17:14:35Z) - Beyond pip install: Evaluating LLM Agents for the Automated Installation of Python Projects [11.418182511485032]
Large Language Model (LLM) based agents have been proposed for performing repository-level tasks. We argue that one important task is missing: fulfilling project-level dependencies by installing other repositories. We introduce a benchmark of repository installation tasks curated from 40 open-source Python projects, which includes a ground-truth installation process for each target repository. Experiments reveal that 55% of the studied repositories can be automatically installed by our agent at least one out of ten times.
arXiv Detail & Related papers (2024-12-09T08:37:06Z) - CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents [49.68117560675367]
Crab is the first benchmark framework designed to support cross-environment tasks.
Our framework supports multiple devices and can be easily extended to any environment with a Python interface.
The experimental results demonstrate that the single agent with GPT-4o achieves the best completion ratio of 38.01%.
arXiv Detail & Related papers (2024-07-01T17:55:04Z) - Arbitrarily Scalable Environment Generators via Neural Cellular Automata [55.150593161240444]
We show that NCA environment generators maintain consistent, regularized patterns regardless of environment size.
Our method scales a single-agent reinforcement learning policy to arbitrarily large environments with similar patterns.
arXiv Detail & Related papers (2023-10-28T07:30:09Z) - L2MAC: Large Language Model Automatic Computer for Extensive Code Generation [52.81694565226513]
Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture.
This paper presents L2MAC, the first practical LLM-based general-purpose stored-program automatic computer (von Neumann architecture) framework, for long and consistent output generation.
arXiv Detail & Related papers (2023-10-02T16:55:19Z) - EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine [69.47822647770542]
Parallel environment execution is often the slowest part of the whole system but receives little attention.
With a curated design for paralleling RL environments, we have improved the RL environment simulation speed across different hardware setups.
On a high-end machine, EnvPool achieves 1 million frames per second for the environment execution on Atari environments and 3 million frames per second on MuJoCo environments.
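EnvPool's batched execution model can be sketched in Python (illustrative only; the real engine is a C++ thread pool with asynchronous batching, and `EnvPoolSketch`/`ToyEnv` are hypothetical names): many environments are stepped concurrently and their results are returned as one batch, instead of looping over environments in Python.

```python
from concurrent.futures import ThreadPoolExecutor

class ToyEnv:
    """Trivial counter environment used to illustrate batched stepping."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        self.state += action
        return self.state

class EnvPoolSketch:
    """Toy sketch of EnvPool-style execution: dispatch every env's
    step to a worker thread and gather the observations as a batch."""
    def __init__(self, num_envs):
        self.envs = [ToyEnv() for _ in range(num_envs)]
        self.pool = ThreadPoolExecutor(max_workers=num_envs)

    def step(self, actions):
        # Step all environments concurrently; return one batched list.
        return list(self.pool.map(lambda ea: ea[0].step(ea[1]),
                                  zip(self.envs, actions)))
```

The batched interface is what lets the RL training loop consume observations in one call, which is where the reported throughput gains come from.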
arXiv Detail & Related papers (2022-06-21T17:36:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.