SEW: Self-Evolving Agentic Workflows for Automated Code Generation
- URL: http://arxiv.org/abs/2505.18646v1
- Date: Sat, 24 May 2025 11:12:14 GMT
- Title: SEW: Self-Evolving Agentic Workflows for Automated Code Generation
- Authors: Siwei Liu, Jinyuan Fang, Han Zhou, Yingxu Wang, Zaiqiao Meng,
- Abstract summary: We propose \textbf{S}elf-\textbf{E}volving \textbf{W}orkflow (\textbf{SEW}), a novel framework that automatically generates and optimises multi-agent workflows. Our SEW can automatically design agentic workflows and optimise them through self-evolution, bringing up to 33% improvement on LiveCodeBench.
- Score: 24.16770109875788
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated effectiveness in code generation tasks. To enable LLMs to address more complex coding challenges, existing research has focused on crafting multi-agent systems with agentic workflows, where complex coding tasks are decomposed into sub-tasks, assigned to specialized agents. Despite their effectiveness, current approaches heavily rely on hand-crafted agentic workflows, with both agent topologies and prompts manually designed, which limits their ability to automatically adapt to different types of coding problems. To address these limitations and enable automated workflow design, we propose \textbf{S}elf-\textbf{E}volving \textbf{W}orkflow (\textbf{SEW}), a novel self-evolving framework that automatically generates and optimises multi-agent workflows. Extensive experiments on three coding benchmark datasets, including the challenging LiveCodeBench, demonstrate that our SEW can automatically design agentic workflows and optimise them through self-evolution, bringing up to 33\% improvement on LiveCodeBench compared to using the backbone LLM only. Furthermore, by investigating different representation schemes of workflow, we provide insights into the optimal way to encode workflow information with text.
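The abstract mentions encoding agentic workflows as text and optimising them through self-evolution. Below is a minimal, hypothetical Python sketch of what such a text-encoded workflow and a single evolution step could look like; the class names, the YAML-like encoding, and the random mutation strategy are illustrative assumptions, not the paper's actual representation scheme or algorithm.

```python
# Hypothetical sketch only: NOT the SEW implementation. Names, the textual
# encoding, and the mutation strategy are assumptions for illustration.
from dataclasses import dataclass, field
import random


@dataclass
class AgentSpec:
    role: str    # e.g. "planner", "coder", "tester"
    prompt: str  # system prompt for this agent


@dataclass
class Workflow:
    agents: list[AgentSpec] = field(default_factory=list)
    edges: list[tuple[str, str]] = field(default_factory=list)  # (from_role, to_role)

    def to_text(self) -> str:
        """One possible way to encode the workflow topology as plain text."""
        lines = ["agents:"]
        for a in self.agents:
            lines.append(f"  - role: {a.role}")
            lines.append(f"    prompt: {a.prompt}")
        lines.append("edges:")
        for src, dst in self.edges:
            lines.append(f"  - {src} -> {dst}")
        return "\n".join(lines)


def mutate(workflow: Workflow, candidates: list[AgentSpec]) -> Workflow:
    """Toy self-evolution step: append a randomly chosen agent to the chain."""
    new_agent = random.choice(candidates)
    agents = workflow.agents + [new_agent]
    edges = workflow.edges + [(workflow.agents[-1].role, new_agent.role)]
    return Workflow(agents=agents, edges=edges)


if __name__ == "__main__":
    wf = Workflow(
        agents=[AgentSpec("planner", "Decompose the coding task into steps."),
                AgentSpec("coder", "Implement each step in Python.")],
        edges=[("planner", "coder")],
    )
    print(wf.to_text())
    evolved = mutate(wf, [AgentSpec("tester", "Write unit tests and report failures.")])
    print(evolved.to_text())
```

In a full system, candidate workflows would be scored on a benchmark (e.g. pass rate on coding tasks) and the best-performing variants kept for further evolution; the sketch above only shows the encoding and mutation pieces.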
Related papers
- Polymath: A Self-Optimizing Agent with Dynamic Hierarchical Workflow [6.636150750052998]
Large language models (LLMs) excel at solving complex tasks by executing agentic workflows composed of detailed instructions and structured operations. Many researchers have sought to automate the generation and optimization of these workflows through code-based representations. Existing methods often rely on labeled datasets to train and optimize workflows, making them ineffective and inflexible for solving real-world, dynamic problems.
arXiv Detail & Related papers (2025-08-04T23:50:02Z) - Flow: Modularized Agentic Workflow Automation [53.073598156915615]
Multi-agent frameworks powered by large language models (LLMs) have demonstrated great success in automated planning and task execution. However, the effective adjustment of agentic workflows during execution has not been well studied. In this paper, we define an activity-on-vertex (AOV) graph, which allows continuous workflow refinement by agents. Our proposed multi-agent framework achieves efficient concurrent execution of subtasks, effective goal achievement, and enhanced error tolerance.
arXiv Detail & Related papers (2025-01-14T04:35:37Z) - WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models [105.46456444315693]
We present WorkflowLLM, a data-centric framework to enhance the capability of large language models in workflow orchestration.
It first constructs a large-scale fine-tuning dataset, WorkflowBench, with 106,763 samples, covering 1,503 APIs from 83 applications across 28 categories.
WorkflowLlama demonstrates a strong capacity to orchestrate complex APIs, while also achieving notable generalization performance.
arXiv Detail & Related papers (2024-11-08T09:58:02Z) - Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. We also present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms. We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z) - AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline. Recent works have started exploiting large language models (LLMs) to lessen this burden. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z) - ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation [87.39861573270173]
We introduce the novel task of prompt-adaptive workflow generation, where the goal is to automatically tailor a workflow to each user prompt.
We propose two LLM-based approaches to tackle this task: a tuning-based method that learns from user-preference data, and a training-free method that uses the LLM to select existing flows.
Our work shows that prompt-dependent flow prediction offers a new pathway to improving text-to-image generation quality, complementing existing research directions in the field.
arXiv Detail & Related papers (2024-10-02T16:43:24Z) - AutoFlow: Automated Workflow Generation for Large Language Model Agents [39.72700864347576]
Large Language Models (LLMs) have shown significant progress in understanding complex natural language.
To ensure LLM agents follow an effective and reliable procedure to solve the given task, manually designed workflows are usually used.
We propose AutoFlow, a framework designed to automatically generate workflows for agents to solve complex tasks.
arXiv Detail & Related papers (2024-07-01T21:05:02Z) - TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)