Rethinking the Value of Multi-Agent Workflow: A Strong Single Agent Baseline
- URL: http://arxiv.org/abs/2601.12307v1
- Date: Sun, 18 Jan 2026 08:16:09 GMT
- Title: Rethinking the Value of Multi-Agent Workflow: A Strong Single Agent Baseline
- Authors: Jiawei Xu, Arief Koesdwiady, Sisong Bei, Yan Han, Baixiang Huang, Dakuo Wang, Yutong Chen, Zheshen Wang, Peihao Wang, Pan Li, Ying Ding
- Abstract summary: We show that a single agent can reach the performance of homogeneous workflows with an efficiency advantage from KV cache reuse. We propose an algorithm that automatically tailors workflows for single-agent execution, reducing inference costs.
- Score: 38.16649115214312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in LLM-based multi-agent systems (MAS) show that workflows composed of multiple LLM agents with distinct roles, tools, and communication patterns can outperform single-LLM baselines on complex tasks. However, most frameworks are homogeneous, where all agents share the same base LLM and differ only in prompts, tools, and positions in the workflow. This raises the question of whether such workflows can be simulated by a single agent through multi-turn conversations. We investigate this across seven benchmarks spanning coding, mathematics, general question answering, domain-specific reasoning, and real-world planning and tool use. Our results show that a single agent can reach the performance of homogeneous workflows with an efficiency advantage from KV cache reuse, and can even match the performance of an automatically optimized heterogeneous workflow. Building on this finding, we propose \textbf{OneFlow}, an algorithm that automatically tailors workflows for single-agent execution, reducing inference costs compared to existing automatic multi-agent design frameworks without trading off accuracy. These results position the single-LLM implementation of multi-agent workflows as a strong baseline for MAS research. We also note that single-LLM methods cannot capture heterogeneous workflows due to the lack of KV cache sharing across different LLMs, highlighting future opportunities in developing \textit{truly} heterogeneous multi-agent systems.
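The abstract's efficiency claim rests on a simple accounting argument: when every agent shares the same base LLM, replaying the workflow as one multi-turn conversation lets each turn extend a shared prefix, so earlier turns hit the KV cache instead of being re-encoded per agent. A minimal sketch of that intuition, using a toy one-word-per-token tokenizer and illustrative function names of our own (`run_as_separate_agents`, `run_as_single_agent` are not from the paper):

```python
def tokens(text: str) -> int:
    """Toy tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def run_as_separate_agents(task: str, role_prompts: list[str]) -> int:
    """Each agent re-encodes the task plus the growing transcript from scratch."""
    transcript = task
    prefill = 0
    for role in role_prompts:
        context = role + " " + transcript
        prefill += tokens(context)          # no KV cache shared across agents
        transcript += f" [{role} output]"   # stand-in for the LLM's reply
    return prefill

def run_as_single_agent(task: str, role_prompts: list[str]) -> int:
    """One conversation: only the new role instruction is prefilled each turn."""
    prefill = tokens(task)                  # task encoded once
    for role in role_prompts:
        prefill += tokens(role)             # all prior turns hit the KV cache
    return prefill

roles = ["You are the planner.", "You are the coder.", "You are the reviewer."]
task = "Implement binary search and verify it on sorted inputs ."
multi = run_as_separate_agents(task, roles)
single = run_as_single_agent(task, roles)
print(multi, single)  # single-agent prefill is strictly smaller
```

This also makes the paper's closing caveat concrete: the prefix-sharing trick only works when all turns run on the same model, which is why heterogeneous workflows cannot be captured this way.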
Related papers
- Do We Always Need Query-Level Workflows? Rethinking Agentic Workflow Generation for Multi-Agent Systems [72.3575737073235]
Multi-Agent Systems (MAS) solve complex tasks by coordinating multiple agents through workflows. Existing approaches generate workflows either at the task level or the query level, but their relative costs and benefits remain unclear. We show that query-level workflow generation is not always necessary, since a small set of top-K best task-level workflows together already covers equivalent or even more queries.
arXiv Detail & Related papers (2026-01-16T10:05:51Z) - Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks. Existing implementations typically rely on a single agent, which suffers from limited context length and noisy tool responses. No existing methods support effective reinforcement-learning post-training of tool-integrated multi-agent frameworks.
arXiv Detail & Related papers (2025-10-06T10:44:04Z) - InfiAgent: Self-Evolving Pyramid Agent Framework for Infinite Scenarios [28.65914611521654]
InfiAgent is a pyramid-like, DAG-based multi-agent framework that can be applied to infinite scenarios. InfiAgent achieves 9.9% higher performance than ADAS, a comparable auto-generated agent framework.
arXiv Detail & Related papers (2025-09-26T15:44:09Z) - Difficulty-Aware Agentic Orchestration for Query-Specific Multi-Agent Workflows [5.376980683850103]
We propose Difficulty-Aware Agentic Orchestration (DAAO), which generates query-specific multi-agent workflows guided by predicted difficulty. DAAO surpasses prior multi-agent systems in both accuracy and inference efficiency.
arXiv Detail & Related papers (2025-09-14T03:57:43Z) - Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration [63.90193684394165]
We introduce multi-agent cross-task experiential learning (MAEL), a novel framework that endows LLM-driven agents with explicit cross-task learning and experience accumulation. During the experiential learning phase, we quantify the quality of each step in the task-solving workflow and store the resulting rewards. During inference, agents retrieve high-reward, task-relevant experiences as few-shot examples to enhance the effectiveness of each reasoning step.
arXiv Detail & Related papers (2025-05-29T07:24:37Z) - X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs [38.8226073406788]
This paper explores the paradigm of heterogeneous LLM-driven multi-agent systems (MAS). We introduce X-MAS-Bench, a comprehensive testbed designed to evaluate the performance of various LLMs across different domains and MAS-related functions. We demonstrate that transitioning from homogeneous to heterogeneous MAS can significantly enhance system performance without requiring structural redesign.
arXiv Detail & Related papers (2025-05-22T17:56:39Z) - Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language Models. Controlled decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture-of-agents decoding strategy that leverages existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z) - Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. We also present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms. We observe that the generated workflows can enhance downstream tasks, enabling agents to achieve superior performance with less inference time.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.