Optimizing Sequential Multi-Step Tasks with Parallel LLM Agents
- URL: http://arxiv.org/abs/2507.08944v1
- Date: Fri, 11 Jul 2025 18:09:22 GMT
- Title: Optimizing Sequential Multi-Step Tasks with Parallel LLM Agents
- Authors: Enhao Zhang, Erkang Zhu, Gagan Bansal, Adam Fourney, Hussein Mozannar, Jack Gerrits,
- Abstract summary: M1-Parallel is a framework that concurrently runs multiple multi-agent teams in parallel to uncover distinct solution paths. We show that M1-Parallel with early termination achieves up to $2.2\times$ speedup while preserving accuracy. We further investigate strategies aimed at encouraging diverse execution plans but observe no additional performance gains over repeated sampling.
- Score: 15.26802977779826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM)-based multi-agent systems have demonstrated remarkable promise for tackling complex tasks by breaking them down into subtasks that are iteratively planned, executed, observed, and refined. Despite their effectiveness, these systems often incur high latency because real-world problems frequently demand multiple iterative cycles of reasoning steps. To address this challenge, we propose M1-Parallel, a framework that concurrently runs multiple multi-agent teams in parallel to uncover distinct solution paths. By leveraging an event-driven communication model with asynchronous messaging, M1-Parallel efficiently capitalizes on the inherent diversity of valid plans to either reduce end-to-end latency or boost task completion rates. Our experiments on complex tasks show that M1-Parallel with early termination achieves up to $2.2\times$ speedup while preserving accuracy, and that M1-Parallel with aggregation yields higher task completion rates. We further investigate strategies aimed at encouraging diverse execution plans but observe no additional performance gains over repeated sampling. Overall, these findings underscore the potential of parallel plan execution for optimizing multi-agent systems for real-world, high-complexity reasoning tasks.
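The two modes named in the abstract, early termination and aggregation, can be illustrated with a minimal asyncio sketch. This is not the authors' implementation: `run_team` is a hypothetical stand-in that simulates a team with a fixed answer and latency, and majority voting is an assumed, simplified substitute for whatever aggregation step the paper actually uses.

```python
import asyncio
from collections import Counter


async def run_team(answer: str, delay: float) -> str:
    """Hypothetical stand-in for one multi-agent team: waits, then answers."""
    await asyncio.sleep(delay)  # simulated end-to-end latency of the team
    return answer


async def early_termination(teams: list[tuple[str, float]]) -> str:
    """Run all teams concurrently and return the first answer that arrives."""
    tasks = [asyncio.create_task(run_team(a, d)) for a, d in teams]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # stop the slower teams once one has finished
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations settle
    return next(iter(done)).result()


async def aggregate(teams: list[tuple[str, float]]) -> str:
    """Run all teams to completion and return the most common answer.

    Majority voting is an assumption made for this sketch only.
    """
    answers = await asyncio.gather(*(run_team(a, d) for a, d in teams))
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Illustrative (answer, latency in seconds) pairs; not data from the paper.
    teams = [("slow", 0.10), ("fast", 0.01), ("slow", 0.15)]
    print(asyncio.run(early_termination(teams)))
    print(asyncio.run(aggregate(teams)))
```

In this toy setup, early termination returns as soon as the quickest team finishes, trading the other teams' work for lower latency, while aggregation waits for every team and so pays the latency of the slowest one in exchange for a combined answer.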
Related papers
- AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks [33.858780386822836]
Test-time scaling (TTS) enhances the performance of large language models (LLMs) by allocating additional compute resources during inference.
We study a novel problem: the test-time compute-optimal scaling in multi-stage complex tasks.
We propose AgentTTS, an LLM-agent-based framework that autonomously searches for compute-optimal allocations.
arXiv Detail & Related papers (2025-07-26T19:21:18Z)
- EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models [65.48902212293903]
We present the Extremely Complex Instruction Following Benchmark (EIFBENCH) for evaluating large language models (LLMs).
EIFBENCH includes multi-task scenarios that enable comprehensive assessment across diverse task types concurrently.
We also propose the Segment Policy Optimization (SegPO) algorithm to enhance the LLM's ability to accurately fulfill multi-task workflows.
arXiv Detail & Related papers (2025-06-10T02:39:55Z)
- Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation [23.318601470116498]
We aim to improve the offline end-to-end inference efficiency of multi-LLM applications in a single-node multi-GPU environment.
We propose a sampling-then-simulation method to estimate the model running time.
Experiments on 3 applications and a mixed application show that SamuLLM can achieve 1.0-2.4$\times$ end-to-end speedups.
arXiv Detail & Related papers (2025-03-21T06:56:35Z)
- MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration [63.31211701741323]
We extend multi-agent multi-model reasoning to generation, specifically to improving faithfulness through refinement.
We design intrinsic evaluations for each subtask, with our findings indicating that both multi-agent (multiple instances) and multi-model (diverse LLM types) approaches benefit error detection and critiquing.
We consolidate these insights into a final "recipe" called Multi-Agent Multi-Model Refinement (MAMM-Refine), where multi-agent and multi-model collaboration significantly boosts performance.
arXiv Detail & Related papers (2025-03-19T14:46:53Z)
- Haste Makes Waste: Evaluating Planning Abilities of LLMs for Efficient and Feasible Multitasking with Time Constraints Between Actions [56.88110850242265]
We present Recipe2Plan, a novel benchmark framework based on real-world cooking scenarios.
Unlike conventional benchmarks, Recipe2Plan challenges agents to optimize cooking time through parallel task execution.
arXiv Detail & Related papers (2025-03-04T03:27:02Z)
- Robotouille: An Asynchronous Planning Benchmark for LLM Agents [7.574421886354134]
Asynchronous planning is essential for agents that must account for time delays, reason over diverse long-horizon tasks, and collaborate with other agents.
We introduce Robotouille, a benchmark environment designed to test agents' ability to handle long-horizon asynchronous scenarios.
Our results show that ReAct (gpt4-o) achieves 47% on synchronous tasks but only 11% on asynchronous tasks, highlighting significant room for improvement.
arXiv Detail & Related papers (2025-02-06T05:50:37Z)
- Planning with Multi-Constraints via Collaborative Language Agents [13.550774629515843]
This paper introduces Planning with Multi-Constraints (PMC), a zero-shot methodology for collaborative multi-agent systems.
PMC simplifies complex task planning with constraints by decomposing it into a hierarchy of subordinate tasks.
PMC achieved an average 42.68% success rate on TravelPlanner, significantly higher than GPT-4 (2.92%), and outperforming GPT-4 with ReAct on API-Bank by 13.64%.
arXiv Detail & Related papers (2024-05-26T10:33:17Z)
- ADaPT: As-Needed Decomposition and Planning with Language Models [131.063805299796]
We introduce As-Needed Decomposition and Planning for complex Tasks (ADaPT).
ADaPT explicitly plans and decomposes complex sub-tasks as needed, when the large language model is unable to execute them.
Our results demonstrate that ADaPT substantially outperforms established strong baselines.
arXiv Detail & Related papers (2023-11-08T17:59:15Z)
- E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance [20.03014783858498]
We introduce Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance (E-MAPP).
E-MAPP is a novel framework that leverages parallel programs to guide multiple agents to efficiently accomplish goals that require planning over $10+$ stages.
Results show that E-MAPP outperforms strong baselines in terms of the completion rate, time efficiency, and zero-shot generalization ability by a large margin.
arXiv Detail & Related papers (2022-12-05T07:02:05Z)
- M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design [95.41238363769892]
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly.
Current MTL regimes have to activate nearly the entire model even to just execute a single task.
We present a model-accelerator co-design framework to enable efficient on-device MTL.
arXiv Detail & Related papers (2022-10-26T15:40:24Z)
- Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.