EvoAgentX: An Automated Framework for Evolving Agentic Workflows
- URL: http://arxiv.org/abs/2507.03616v2
- Date: Tue, 23 Sep 2025 14:00:26 GMT
- Title: EvoAgentX: An Automated Framework for Evolving Agentic Workflows
- Authors: Yingxu Wang, Siwei Liu, Jinyuan Fang, Zaiqiao Meng,
- Abstract summary: We present EvoAgentX, an open-source platform that automates the generation, execution, and evolutionary optimization of multi-agent systems. We evaluate EvoAgentX on HotPotQA, MBPP, and MATH for multi-hop reasoning, code generation, and mathematical problem solving, respectively, and further assess it on real-world tasks using GAIA.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent systems (MAS) have emerged as a powerful paradigm for orchestrating large language models (LLMs) and specialized tools to collaboratively address complex tasks. However, existing MAS frameworks often require manual workflow configuration and lack native support for dynamic evolution and performance optimization. In addition, many MAS optimization algorithms are not integrated into a unified framework. In this paper, we present EvoAgentX, an open-source platform that automates the generation, execution, and evolutionary optimization of multi-agent workflows. EvoAgentX employs a modular architecture consisting of five core layers: the basic components, agent, workflow, evolving, and evaluation layers. Specifically, within the evolving layer, EvoAgentX integrates three MAS optimization algorithms, TextGrad, AFlow, and MIPRO, to iteratively refine agent prompts, tool configurations, and workflow topologies. We evaluate EvoAgentX on HotPotQA, MBPP, and MATH for multi-hop reasoning, code generation, and mathematical problem solving, respectively, and further assess it on real-world tasks using GAIA. Experimental results show that EvoAgentX consistently achieves significant performance improvements, including a 7.44% increase in HotPotQA F1, a 10.00% improvement in MBPP pass@1, a 10.00% gain in MATH solve accuracy, and an overall accuracy improvement of up to 20.00% on GAIA. The source code is available at: https://github.com/EvoAgentX/EvoAgentX
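The abstract describes an evolving layer that iteratively refines agent prompts, tool configurations, and workflow topologies against an evaluation set. The following is a minimal, hypothetical sketch of such an evolutionary refinement loop; the function names, the configuration shape, and the toy fitness function are all illustrative assumptions, not the actual EvoAgentX API or its TextGrad/AFlow/MIPRO optimizers.

```python
# Hypothetical sketch of an evolving-layer loop: mutate a workflow
# configuration (here, just an agent prompt), score each variant on an
# evaluation set, and greedily keep the best one. Illustrative only.

def evaluate(config, required_fragments):
    """Toy fitness: fraction of required fragments present in the prompt."""
    prompt = config["prompt"]
    return sum(1 for frag in required_fragments if frag in prompt) / len(required_fragments)

def mutate(config, instruction):
    """Produce a variant by appending a candidate instruction to the prompt."""
    variant = dict(config)
    variant["prompt"] = config["prompt"] + " " + instruction
    return variant

def evolve(config, required_fragments, candidate_instructions, generations=5):
    """Greedy hill-climbing: each generation, try every candidate mutation
    and accept the best one only if it improves the evaluation score."""
    best, best_score = config, evaluate(config, required_fragments)
    for _ in range(generations):
        variants = [mutate(best, instr) for instr in candidate_instructions]
        top = max(variants, key=lambda v: evaluate(v, required_fragments))
        top_score = evaluate(top, required_fragments)
        if top_score > best_score:
            best, best_score = top, top_score
        else:
            break  # no candidate improves the score; stop early
    return best, best_score

if __name__ == "__main__":
    cfg = {"prompt": "Answer the question."}
    required = ["step-by-step", "cite sources"]          # toy evaluation set
    candidates = ["Think step-by-step.", "Always cite sources.", "Be concise."]
    best_cfg, score = evolve(cfg, required, candidates)
    print(score)  # → 1.0 (both required fragments reached via two mutations)
```

Real systems in this space replace the toy fitness with benchmark accuracy (e.g. F1 on HotPotQA) and the string-append mutation with LLM-driven prompt rewriting or topology edits, but the select-mutate-evaluate skeleton is the same.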
Related papers
- Evolutionary Generation of Multi-Agent Systems [49.47969796873096]
Large language model (LLM)-based multi-agent systems (MAS) show strong promise for complex reasoning, planning, and tool-augmented tasks. EvoMAS formulates MAS generation as structured configuration generation. EvoMAS consistently improves task performance over both human-designed MAS and prior automatic MAS generation methods.
arXiv Detail & Related papers (2026-02-06T09:01:35Z) - MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering [54.236614097082395]
We introduce MEnvAgent, a framework for automated environment construction. MEnvAgent employs a multi-agent Planning-Execution-Verification architecture to autonomously resolve construction failures. MEnvData-SWE is the largest open-source polyglot dataset of realistic, verifiable Docker environments to date.
arXiv Detail & Related papers (2026-01-30T11:36:10Z) - Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization [37.17893162265247]
Youtu-Agent is a framework designed for the automated generation and continuous evolution of Large Language Model (LLM) agents. Youtu-Agent features a structured configuration system that decouples execution environments, toolkits, and context management. Experiments demonstrate that Youtu-Agent achieves state-of-the-art performance on WebWalkerQA (71.47%) and GAIA (72.8%) using open-weight models.
arXiv Detail & Related papers (2025-12-31T04:17:36Z) - Evolving Excellence: Automated Optimization of LLM-based Agents [33.81822162934331]
We present ARTEMIS, a no-code evolutionary optimization platform that jointly optimizes agent configurations through semantically aware genetic operators. We evaluate ARTEMIS on four representative agent systems, including the ALE Agent for competitive programming on the AtCoder Heuristic Contest, achieving a 13.6% improvement in acceptance rate. We also evaluate the MathTales-Teacher Agent, powered by a smaller open-source model (Qwen2.5-7B), on GSM8K primary-level mathematics problems.
arXiv Detail & Related papers (2025-12-09T20:48:45Z) - xLLM Technical Report [57.13120905321185]
We introduce xLLM, an intelligent and efficient Large Language Model (LLM) inference framework. xLLM builds a novel decoupled service-engine architecture. xLLM-Engine co-optimizes system and algorithm designs to fully saturate computing resources.
arXiv Detail & Related papers (2025-10-16T13:53:47Z) - Foam-Agent 2.0: An End-to-End Composable Multi-Agent Framework for Automating CFD Simulation in OpenFOAM [3.8790078849619074]
We introduce Foam-Agent, a multi-agent framework that automates the entire end-to-end OpenFOAM workflow from a single natural language prompt. The framework uses Model Context Protocol (MCP) to expose its core functions as discrete, callable tools. Evaluated on a benchmark of 110 simulation tasks, Foam-Agent achieves an 88.2% success rate with Claude 3.5 Sonnet.
arXiv Detail & Related papers (2025-09-17T23:44:18Z) - AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need [35.88121318813734]
Large language model-based multi-agent systems have demonstrated significant potential in social simulation and complex task resolution domains. We introduce AgentGroupChat-V2, a novel framework addressing these challenges through three core innovations. Experiments demonstrate AgentGroupChat-V2's superior performance across diverse domains.
arXiv Detail & Related papers (2025-06-18T13:24:04Z) - R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution [60.80016554091364]
R&D-Agent is a dual-agent framework for iterative exploration. The Researcher agent uses performance feedback to generate ideas, while the Developer agent refines code based on error feedback. R&D-Agent is evaluated on MLE-Bench and emerges as the top-performing machine learning engineering agent.
arXiv Detail & Related papers (2025-05-20T06:07:00Z) - Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning [29.580108004844856]
Multi-agent systems (MAS) built on large language models (LLMs) offer a promising path toward solving complex, real-world tasks. Recent advancements in test-time scaling (TTS) have significantly improved single-agent performance on challenging reasoning tasks. We introduce an adaptive multi-agent framework designed to enhance collaborative reasoning through both model-level training and system-level coordination.
arXiv Detail & Related papers (2025-04-14T00:27:45Z) - ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation [71.31634636156384]
We introduce ComfyGPT, the first self-optimizing multi-agent system designed to automatically generate ComfyUI workflows from task descriptions. ComfyGPT comprises four specialized agents: ReformatAgent, FlowAgent, RefineAgent, and ExecuteAgent. FlowDataset is a large-scale dataset containing 13,571 workflow-description pairs, and FlowBench is a benchmark for evaluating workflow generation systems.
arXiv Detail & Related papers (2025-03-22T06:48:50Z) - EvoFlow: Evolving Diverse Agentic Workflows On The Fly [21.82515160298748]
EvoFlow is a niching evolutionary algorithm-based framework that automatically searches a population of heterogeneous agentic workflows of varying complexity. We show that EvoFlow can evolve a population of workflows ranging from simple I/O tasks to complex multi-turn interactions.
arXiv Detail & Related papers (2025-02-11T08:48:46Z) - A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops [3.729242965449096]
This paper introduces a framework for autonomously optimizing Agentic AI solutions across industries. The framework achieves optimal performance without human input by autonomously generating and testing hypotheses. Case studies show significant improvements in output quality, relevance, and actionability.
arXiv Detail & Related papers (2024-12-22T20:08:04Z) - ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems [80.69865295743149]
This work studies the use of LLM-based agents to autonomously design collaborative AI systems. Based on ComfyBench, we develop ComfyAgent, a framework that empowers agents to autonomously design collaborative AI systems by generating workflows. While ComfyAgent achieves a resolve rate comparable to o1-preview and significantly surpasses other agents on ComfyBench, ComfyAgent has resolved only 15% of creative tasks.
arXiv Detail & Related papers (2024-09-02T17:44:10Z) - Very Large-Scale Multi-Agent Simulation in AgentScope [112.98986800070581]
We develop new features and components for AgentScope, a user-friendly multi-agent platform.
We propose an actor-based distributed mechanism towards great scalability and high efficiency.
We also provide a web-based interface for conveniently monitoring and managing a large number of agents.
arXiv Detail & Related papers (2024-07-25T05:50:46Z) - CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents [49.68117560675367]
Crab is the first benchmark framework designed to support cross-environment tasks. Our framework supports multiple devices and can be easily extended to any environment with a Python interface. The experimental results demonstrate that a single agent with GPT-4o achieves the best completion ratio of 38.01%.
arXiv Detail & Related papers (2024-07-01T17:55:04Z) - EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms [55.77492625524141]
EvoAgent is a generic method to automatically extend specialized agents to multi-agent systems. We show that EvoAgent can significantly enhance the task-solving capability of LLM-based agents.
arXiv Detail & Related papers (2024-06-20T11:49:23Z) - SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents [50.82665351100067]
FlowGen is a code generation framework that emulates software process models based on multiple Large Language Model (LLM) agents.
We evaluate FlowGenScrum on four benchmarks: HumanEval, HumanEval-ET, MBPP, and MBPP-ET.
arXiv Detail & Related papers (2024-03-23T14:04:48Z) - A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration [55.35849138235116]
We propose automatically selecting a team of agents from candidates to collaborate in a dynamic communication structure toward different tasks and domains.
Specifically, we build a framework named Dynamic LLM-Powered Agent Network (DyLAN) for LLM-powered agent collaboration.
We demonstrate that DyLAN outperforms strong baselines in code generation, decision-making, general reasoning, and arithmetic reasoning tasks with moderate computational cost.
arXiv Detail & Related papers (2023-10-03T16:05:48Z) - EvoX: A Distributed GPU-accelerated Framework for Scalable Evolutionary Computation [40.71953374838183]
EvoX is a computing framework tailored for automated, distributed, and heterogeneous execution of EC algorithms.
At the core of EvoX lies a unique programming model to streamline the development of parallelizable EC algorithms.
EvoX offers comprehensive support for a diverse set of benchmark problems, ranging from dozens of numerical test functions to hundreds of reinforcement learning tasks.
arXiv Detail & Related papers (2023-01-29T15:00:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.