EvoConfig: Self-Evolving Multi-Agent Systems for Efficient Autonomous Environment Configuration
- URL: http://arxiv.org/abs/2601.16489v1
- Date: Fri, 23 Jan 2026 06:33:01 GMT
- Title: EvoConfig: Self-Evolving Multi-Agent Systems for Efficient Autonomous Environment Configuration
- Authors: Xinshuai Guo, Jiayi Kuang, Linyue Pan, Yinghui Li, Yangning Li, Hai-Tao Zheng, Ying Shen, Di Yin, Xing Sun,
- Abstract summary: EvoConfig is an efficient environment configuration framework that optimizes multi-agent collaboration to build correct runtime environments. It features an expert diagnosis module for fine-grained post-execution analysis and a self-evolving mechanism that lets expert agents give themselves feedback and dynamically adjust error-fixing priorities. EvoConfig matches the previous state-of-the-art Repo2Run on Repo2Run's 420 repositories, while delivering clear gains on harder cases.
- Score: 44.95469898974659
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A reliable executable environment is the foundation for ensuring that large language models solve software engineering tasks. Because the construction process is complex and tedious, large-scale configuration is relatively inefficient. Moreover, most existing methods overlook fine-grained analysis of the actions performed by the agent, making it difficult to handle complex errors and resulting in configuration failures. To address this bottleneck, we propose EvoConfig, an efficient environment configuration framework that optimizes multi-agent collaboration to build correct runtime environments. EvoConfig features an expert diagnosis module for fine-grained post-execution analysis, and a self-evolving mechanism that lets expert agents give themselves feedback and dynamically adjust error-fixing priorities in real time. Empirically, EvoConfig matches the previous state-of-the-art Repo2Run on Repo2Run's 420 repositories, while delivering clear gains on harder cases: on the more challenging Envbench, EvoConfig achieves a 78.1% success rate, outperforming Repo2Run by 7.1%. Beyond end-to-end success, EvoConfig also demonstrates stronger debugging competence, achieving higher accuracy in error identification and producing more effective repair recommendations than existing methods.
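The abstract describes expert agents that use post-execution feedback to reweight which error categories get fixed first. The paper does not give the update rule, so the following is a minimal illustrative sketch of one plausible design; the class and function names (`ExpertAgent`, `rank_experts`) and the exponential-update weights are assumptions, not EvoConfig's actual mechanism.

```python
class ExpertAgent:
    """A hypothetical expert that diagnoses one error category and
    tracks its own fix success rate (illustrative only)."""

    def __init__(self, name, priority=1.0):
        self.name = name
        self.priority = priority   # weight used to order error-fixing attempts
        self.attempts = 0
        self.fixes = 0

    def record(self, fixed):
        """Self-feedback: fold a post-execution outcome into the priority."""
        self.attempts += 1
        if fixed:
            self.fixes += 1
        # Shift the priority toward the observed fix rate
        # (a simple exponential moving update; weights are arbitrary).
        rate = self.fixes / self.attempts
        self.priority = 0.7 * self.priority + 0.3 * rate


def rank_experts(experts):
    """Order experts so higher-priority fixes are attempted first."""
    return sorted(experts, key=lambda e: e.priority, reverse=True)
```

Under this sketch, an expert whose fixes keep failing loses priority, so the collaboration naturally tries more promising repair strategies first on the next round.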
Related papers
- HerAgent: Rethinking the Automated Environment Deployment via Hierarchical Test Pyramid [15.944450159856602]
We argue that environment setup success should be evaluated through executable evidence rather than a single binary signal. We propose HerAgent, an automated environment setup approach that incrementally constructs executable environments.
arXiv Detail & Related papers (2026-02-08T08:57:05Z) - Evolutionary Generation of Multi-Agent Systems [49.47969796873096]
Large language model (LLM)-based multi-agent systems (MAS) show strong promise for complex reasoning, planning, and tool-augmented tasks. EvoMAS formulates MAS generation as structured configuration generation. EvoMAS consistently improves task performance over both human-designed MAS and prior automatic MAS generation methods.
arXiv Detail & Related papers (2026-02-06T09:01:35Z) - Towards Efficient Agents: A Co-Design of Inference Architecture and System [66.59916327634639]
This paper presents AgentInfer, a unified framework for end-to-end agent acceleration. We decompose the problem into four synergistic components: AgentCollab, AgentSched, AgentSAM, and AgentCompress. Experiments on the BrowseComp-zh and DeepDiver benchmarks demonstrate that through the synergistic collaboration of these methods, AgentInfer reduces ineffective token consumption by over 50%.
arXiv Detail & Related papers (2025-12-20T12:06:13Z) - SCOPE: Prompt Evolution for Enhancing Agent Effectiveness [53.75986399936395]
Large Language Model (LLM) agents are increasingly deployed in environments that generate massive, dynamic contexts. While agents have access to this context, their static prompts lack the mechanisms to manage it effectively. We introduce SCOPE (Self-evolving Context Optimization via Prompt Evolution). We propose a Dual-Stream mechanism that balances tactical specificity (resolving immediate errors) with strategic generality (evolving long-term principles).
arXiv Detail & Related papers (2025-12-17T12:25:05Z) - Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents [71.85020581835042]
Large language model-based agents show promise for software engineering, but environment configuration remains a bottleneck. Existing benchmarks assess only end-to-end build/test success, obscuring where and why agents succeed or fail. We introduce Enconda-bench, which provides process-level trajectory assessment of fine-grained agent capabilities during environment setup planning.
arXiv Detail & Related papers (2025-10-29T16:59:07Z) - DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision [50.89715397781075]
Agentic Retrieval-Augmented Generation (Agentic RAG) enhances the processing capability for complex tasks. We propose DecEx-RAG, which models RAG as a Markov Decision Process (MDP) incorporating decision-making and execution. We show that DecEx-RAG achieves an average absolute performance improvement of 6.2% across six datasets.
arXiv Detail & Related papers (2025-10-07T08:49:22Z) - CoE-Ops: Collaboration of LLM-based Experts for AIOps Question-Answering [10.093542296324845]
This paper first proposes a collaboration-of-experts framework (CoE-Ops) incorporating a general-purpose large language model task classifier. A retrieval-augmented generation mechanism is introduced to improve the framework's capability in handling question-answering tasks at both high level (code, build, test, etc.) and low level (fault analysis, anomaly detection, etc.). Experimental results demonstrate that CoE-Ops achieves up to 8% accuracy enhancement for high-level AIOps tasks compared to existing CoE methods.
arXiv Detail & Related papers (2025-07-25T06:17:11Z) - EvoAgentX: An Automated Framework for Evolving Agentic Workflows [21.464686605154792]
We present EvoAgentX, an open-source platform that automates the generation, execution, and evolutionary optimization of multi-agent systems. We evaluate EvoAgentX on HotPotQA, MBPP, and MATH for multi-hop reasoning, code generation, and mathematical problem solving, respectively, and further assess it on real-world tasks using GAIA.
arXiv Detail & Related papers (2025-07-04T14:43:10Z) - On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI systems (which autonomously plan and act) are becoming widespread, yet their success rate on complex tasks remains low. Inference-time alignment relies on three components: sampling, evaluation, and feedback. We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly inserts feedback extracted from different forms of critiques.
arXiv Detail & Related papers (2025-04-02T17:40:47Z) - Repo2Run: Automated Building Executable Environment for Code Repository at Scale [10.143091612327602]
We introduce Repo2Run, an agent aiming to automate the building of executable test environments for any repository at scale. Repo2Run iteratively builds the Docker image, runs unit tests based on the feedback from the build, and synthesizes the Dockerfile. The resulting Dockerfile can then be used to create Docker container environments for running code and tests.
arXiv Detail & Related papers (2025-02-19T12:51:35Z)
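The Repo2Run summary above describes an iterative loop: build the Docker image, run unit tests, and resynthesize the Dockerfile from the failure feedback. A minimal sketch of that control loop follows, with the build/test/repair steps injected as callables; Repo2Run itself drives Docker and an LLM agent, so these names and signatures are illustrative assumptions, not its API.

```python
def configure_environment(dockerfile, build, run_tests, repair, max_rounds=5):
    """Iteratively build, test, and patch a Dockerfile (illustrative sketch).

    Assumed callable shapes:
      build(dockerfile)        -> (ok, log)   e.g. wraps `docker build`
      run_tests(dockerfile)    -> (ok, log)   e.g. runs unit tests in the container
      repair(dockerfile, log)  -> dockerfile  e.g. an LLM-proposed fix
    """
    for _ in range(max_rounds):
        ok, log = build(dockerfile)
        if ok:
            ok, log = run_tests(dockerfile)
            if ok:
                return dockerfile  # environment builds and tests pass
        # Feed the failure log back to synthesize a patched Dockerfile.
        dockerfile = repair(dockerfile, log)
    return None  # configuration failed within the round budget
```

The loop terminates either with a Dockerfile whose image builds and passes the tests, or with `None` once the round budget is exhausted, mirroring the build-test-synthesize cycle described in the summary.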
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.