How to Train a Leader: Hierarchical Reasoning in Multi-Agent LLMs
- URL: http://arxiv.org/abs/2507.08960v1
- Date: Fri, 11 Jul 2025 18:34:07 GMT
- Title: How to Train a Leader: Hierarchical Reasoning in Multi-Agent LLMs
- Authors: Andrew Estornell, Jean-Francois Ton, Muhammad Faaiz Taufiq, Hang Li,
- Abstract summary: We introduce a hierarchical multi-agent framework that trains only a single leader LLM to coordinate a team of untrained peer agents. Our results highlight the effectiveness and efficiency of training a single, flexible leader for collaborative reasoning in multi-agent LLM systems.
- Score: 16.853362180877593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have achieved strong performance on a wide range of complex reasoning tasks, yet further gains are often possible by leveraging the complementary strengths of multiple models. While multi-agent frameworks can improve solution quality by combining multiple LLMs, existing methods are often computationally expensive, both at training and inference time. In this work, we introduce a hierarchical multi-agent framework that addresses these challenges by training only a single leader LLM to coordinate a team of untrained peer agents. To this end, we propose Multi-agent guided Leader Policy Optimization (MLPO), a novel approach that trains the leader to evaluate and synthesize agent responses without auxiliary value networks or explicit agent feedback. Leaders trained with MLPO perform better not only when interacting with the agent team at inference time, but also when deployed in single-agent settings without the team. Empirical results on Big-Bench Hard (BBH), MATH, and MMLU demonstrate that our framework achieves substantial performance improvements over both single-agent and multi-agent baselines. Our results highlight the effectiveness and efficiency of training a single, flexible leader for collaborative reasoning in multi-agent LLM systems.
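The coordination pattern in the abstract (one trained leader evaluating and synthesizing responses from untrained peers) can be pictured with a short sketch. This is a minimal illustration, not the paper's implementation: `generate`, the prompt wording, the model names, and the round count are all assumptions.

```python
# Minimal sketch of the leader/peer-team loop described in the abstract.
# `generate(model, prompt)` is an assumed stand-in for any LLM call; prompts,
# model names, and the number of rounds are illustrative, not from the paper.

def generate(model: str, prompt: str) -> str:
    """Stand-in for an LLM call; wire in a real model or API here."""
    return f"[{model}] response to: {prompt[:40]}..."

def leader_team_answer(question: str, leader: str, peers: list[str],
                       rounds: int = 2) -> str:
    # The leader drafts an initial answer on its own.
    draft = generate(leader, f"Question: {question}\nGive an initial answer.")
    for _ in range(rounds):
        # Untrained peer agents respond to the question and the current draft.
        peer_views = [
            generate(peer, f"Question: {question}\nCurrent answer: {draft}\n"
                           "Give your own answer and note any errors.")
            for peer in peers
        ]
        # Only the leader is trained (with MLPO in the paper) to evaluate and
        # synthesize the team's responses into an improved answer.
        team = "\n---\n".join(peer_views)
        draft = generate(leader, f"Question: {question}\nTeam responses:\n"
                                 f"{team}\nSynthesize a single best answer.")
    return draft

print(leader_team_answer("What is 17 * 23?", leader="leader-llm",
                         peers=["peer-a", "peer-b"]))
```

A side effect the abstract highlights: because the leader itself is the only trained model, it can be deployed alone, without the peer team, and still benefits from the training.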
Related papers
- LLM Collaboration With Multi-Agent Reinforcement Learning [13.900227188164209]
  We develop a multi-agent, multi-turn algorithm, Multi-Agent Group Relative Policy Optimization (MAGRPO), to solve a cooperative Multi-Agent Reinforcement Learning (MARL) problem. Our experiments on LLM writing and coding collaboration demonstrate that fine-tuning multi-agent systems with MAGRPO enables agents to generate high-quality responses efficiently through effective cooperation. (A sketch of the group-relative advantage behind GRPO-style methods appears after this list.)
  arXiv Detail & Related papers (2025-08-06T17:18:25Z)
- An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring [8.779871128906787]
  We introduce a general and adversary-resistant multi-agent LLM framework based on credibility scoring. Our system associates a credibility score with each agent, which is used when aggregating the team outputs. (See the credibility-weighted aggregation sketch after this list.)
  arXiv Detail & Related papers (2025-05-30T05:57:37Z)
- Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration [63.90193684394165]
  We introduce multi-agent cross-task experiential learning (MAEL), a novel framework that endows LLM-driven agents with explicit cross-task learning and experience accumulation. During the experiential learning phase, we quantify the quality of each step in the task-solving workflow and store the resulting rewards. During inference, agents retrieve high-reward, task-relevant experiences as few-shot examples to enhance the effectiveness of each reasoning step.
  arXiv Detail & Related papers (2025-05-29T07:24:37Z)
- Multiple Weaks Win Single Strong: Large Language Models Ensemble Weak Reinforcement Learning Agents into a Supreme One [28.264011412168347]
  Model ensemble is a useful approach in reinforcement learning (RL) for training effective agents. We propose LLM-Ens, a novel approach that enhances RL model ensembles with task-specific semantic understanding.
  arXiv Detail & Related papers (2025-05-21T09:35:43Z)
- ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
  We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from Large Language Models (LLMs). ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed execution. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
  arXiv Detail & Related papers (2025-03-12T16:05:31Z)
- MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning [26.736078756799635]
  We introduce a new post-training paradigm, MAPoRL (Multi-Agent Post-co-training for collaborative LLMs with Reinforcement Learning). In MAPoRL, multiple LLMs first generate their own responses independently, then engage in a multi-turn discussion to collaboratively improve the final answer. A MAPoRL verifier evaluates both the answer and the discussion, assigning a score that verifies the correctness of the answer. This score serves as the co-training reward and is maximized through multi-agent RL.
  arXiv Detail & Related papers (2025-02-25T18:33:48Z)
- MALT: Improving Reasoning with Multi-Agent LLM Training [66.9481561915524]
  MALT (Multi-Agent LLM Training) is a novel post-training strategy that divides the reasoning process into generation, verification, and refinement steps. On MATH, GSM8K, and CSQA, MALT surpasses the same baseline LLM with relative improvements of 15.66%, 7.42%, and 9.40%, respectively.
  arXiv Detail & Related papers (2024-12-02T19:30:36Z)
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
  We introduce StepAgent, which utilizes step-wise rewards to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
  arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms [55.77492625524141]
  EvoAgent is a generic method to automatically extend specialized agents into multi-agent systems. We show that EvoAgent can significantly enhance the task-solving capability of LLM-based agents.
  arXiv Detail & Related papers (2024-06-20T11:49:23Z)
- Adaptive In-conversation Team Building for Language Model Agents [33.03550687362213]
  Leveraging multiple large language model (LLM) agents has been shown to be a promising approach for tackling complex tasks. Our new adaptive team-building paradigm offers a flexible solution, realized through a novel agent design named Captain Agent. A comprehensive evaluation across six real-world scenarios demonstrates that Captain Agent significantly outperforms existing multi-agent methods.
  arXiv Detail & Related papers (2024-05-29T18:08:37Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
  Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems. We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
  arXiv Detail & Related papers (2020-02-24T20:30:45Z)
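Two of the entries above are concrete enough to sketch in code. First, GRPO-style training (the family MAGRPO builds on) scores each sampled response relative to its group. The function below shows only that standard group-relative advantage; everything MAGRPO adds for the multi-agent, multi-turn setting is omitted, and the name is illustrative.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-6) -> list[float]:
    """Generic GRPO-style advantage: normalize each reward by the group's
    mean and standard deviation. A building block, not MAGRPO itself."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Responses scoring above the group mean get positive advantages.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.5]))
```

Second, the credibility-scoring entry suggests a weighted vote over team outputs. The rule below is one plausible reading of a credibility score "used when aggregating the team outputs"; how scores are earned or updated is not shown, and all names are assumptions.

```python
from collections import defaultdict

def aggregate_with_credibility(answers: list[str],
                               credibility: list[float]) -> str:
    """Pick the answer whose supporters carry the most total credibility,
    so a low-credibility (possibly adversarial) agent is outweighed."""
    weight = defaultdict(float)
    for answer, cred in zip(answers, credibility):
        weight[answer] += cred
    return max(weight, key=weight.get)

# The lone dissenting agent has low credibility and is outvoted.
print(aggregate_with_credibility(["42", "41", "42"], [0.9, 0.2, 0.7]))  # -> 42
```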
This list is automatically generated from the titles and abstracts of the papers on this site.