Multi-Agent Tool-Integrated Policy Optimization
- URL: http://arxiv.org/abs/2510.04678v1
- Date: Mon, 06 Oct 2025 10:44:04 GMT
- Title: Multi-Agent Tool-Integrated Policy Optimization
- Authors: Zhanfeng Mo, Xingxuan Li, Yuntao Chen, Lidong Bing
- Abstract summary: Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks. Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses. No existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
- Score: 67.12841355267678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks. Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses. A natural solution is to adopt a multi-agent framework with planner- and worker-agents to manage context. However, no existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks. To address this gap, we propose Multi-Agent Tool-Integrated Policy Optimization (MATPO), which enables distinct roles (planner and worker) to be trained within a single LLM instance using role-specific prompts via reinforcement learning. MATPO is derived from a principled credit assignment mechanism across planner and worker rollouts. This design eliminates the need to deploy multiple LLMs, which would be memory-intensive, while preserving the benefits of specialization. Experiments on GAIA-text, WebWalkerQA, and FRAMES show that MATPO consistently outperforms single-agent baselines by an average of 18.38% relative improvement in performance and exhibits greater robustness to noisy tool outputs. Our findings highlight the effectiveness of unifying multiple agent roles within a single LLM and provide practical insights for stable and efficient multi-agent RL training.
Related papers
- AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent [57.10083973844841]
AgentArk is a novel framework to distill multi-agent dynamics into the weights of a single model. We investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting strong reasoning and self-correction performance of multiple agents.
arXiv Detail & Related papers (2026-02-03T19:18:28Z)
- Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO [24.532870400949424]
Current training methods train a unified large language model for all agents in the system. This may limit performance because different agents follow different underlying distributions. We propose M-GRPO, a hierarchical extension of Group Relative Policy Optimization for vertical multi-agent systems.
arXiv Detail & Related papers (2025-11-17T12:06:30Z)
- AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework [76.96794548655292]
Large language models (LLMs) have sparked growing interest in building generalist agents that can learn through online interactions. Applying reinforcement learning (RL) to train LLM agents in multi-turn, multi-task settings remains challenging due to lack of scalable infrastructure and stable training algorithms. We present the AgentRL framework for scalable multi-turn, multi-task agentic RL training.
arXiv Detail & Related papers (2025-10-05T13:40:01Z)
- AgentFly: Extensible and Scalable Reinforcement Learning for LM Agents [25.735754822676277]
Language model (LM) agents have gained significant attention for their ability to autonomously complete tasks. Reinforcement learning (RL) has been explored to enhance LMs' capabilities, such as reasoning and factuality. We built AgentFly, a scalable and extensible Agent-RL framework designed to empower LM agents with a variety of RL algorithms.
arXiv Detail & Related papers (2025-07-20T10:22:36Z)
- How to Train a Leader: Hierarchical Reasoning in Multi-Agent LLMs [16.853362180877593]
We introduce a hierarchical multi-agent framework that trains only a single leader LLM to coordinate a team of untrained peer agents. Our results highlight the effectiveness and efficiency of training a single, flexible leader for collaborative reasoning in multi-agent LLM systems.
arXiv Detail & Related papers (2025-07-11T18:34:07Z)
- Multiple Weaks Win Single Strong: Large Language Models Ensemble Weak Reinforcement Learning Agents into a Supreme One [28.264011412168347]
Model ensemble is a useful approach in reinforcement learning (RL) for training effective agents. We propose LLM-Ens, a novel approach that enhances RL model ensemble with task-specific semantic understandings.
arXiv Detail & Related papers (2025-05-21T09:35:43Z)
- Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align large language models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z)
- Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations [53.76682562935373]
We introduce an efficient framework called InteRecAgent, which employs LLMs as the brain and recommender models as tools.
InteRecAgent achieves satisfying performance as a conversational recommender system, outperforming general-purpose LLMs.
arXiv Detail & Related papers (2023-08-31T07:36:44Z)
- A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning [34.85351481228439]
We propose a unified coordinating framework to efficiently utilize existing ML-based agents.
We show that it can effectively utilize different ML-based agents and find better configurations with 1.4~14.1X speedups on the workload execution time.
arXiv Detail & Related papers (2023-03-10T05:27:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.