Related papers: MALT: Improving Reasoning with Multi-Agent LLM Training

MALT: Improving Reasoning with Multi-Agent LLM Training

URL: http://arxiv.org/abs/2412.01928v1
Date: Mon, 02 Dec 2024 19:30:36 GMT
Title: MALT: Improving Reasoning with Multi-Agent LLM Training
Authors: Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das, Markian Rybchuk, Philip H. S. Torr, Ivan Laptev, Fabio Pizzati, Ronald Clark, Christian Schroeder de Witt,
Abstract summary: We present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems.<n>Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles.<n>We evaluate our approach across MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40% respectively.
Score: 64.13803241218886
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Enabling effective collaboration among LLMs is a crucial step toward developing autonomous systems capable of solving complex problems. While LLMs are typically used as single-model generators, where humans critique and refine their outputs, the potential for jointly-trained collaborative models remains largely unexplored. Despite promising results in multi-agent communication and debate settings, little progress has been made in training models to work together on tasks. In this paper, we present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems. Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles: a generator, verifier, and refinement model iteratively solving problems. We propose a trajectory-expansion-based synthetic data generation process and a credit assignment strategy driven by joint outcome based rewards. This enables our post-training setup to utilize both positive and negative trajectories to autonomously improve each model's specialized capabilities as part of a joint sequential system. We evaluate our approach across MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40% respectively over the same baseline model. This demonstrates an early advance in multi-agent cooperative capabilities for performance on mathematical and common sense reasoning questions. More generally, our work provides a concrete direction for research around multi-agent LLM training approaches.

Related papers

Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems [20.971694319263353]
We propose Dr. MAS, a simple and stable RL training recipe for multi-agent LLM systems.<n>Dr. MAS uses agent-wise remedy: normalizing advantages per agent using each agent's own reward statistics.<n>Dr. MAS provides an end-to-end RL training framework for multi-agent LLM systems.
arXiv Detail & Related papers (2026-02-09T16:13:39Z)
AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent [57.10083973844841]
AgentArk is a novel framework to distill multi-agent dynamics into the weights of a single model.<n>We investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios.<n>By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting strong reasoning and self-correction performance of multiple agents.
arXiv Detail & Related papers (2026-02-03T19:18:28Z)
Multimodal Reinforcement Learning with Agentic Verifier for AI Agents [131.46008226323423]
Argos is a principled multimodal reward agent to train reasoning models for agentic tasks.<n>By leveraging our agentic verifier across both SFT data and RL training, our model achieves state-of-the-art results.
arXiv Detail & Related papers (2025-12-03T04:42:47Z)
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO [24.532870400949424]
Current training methods train a unified large language model for all agents in the system.<n>This may limit the performances due to different underlying distributions for different agents.<n>We propose M-GRPO, a hierarchical extension of Group Relative Policy Optimization for vertical Multi-agent systems.
arXiv Detail & Related papers (2025-11-17T12:06:30Z)
AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress [71.02263260394261]
Large language models (LLMs) still encounter challenges in multi-turn decision-making tasks.<n>We build process reward models (PRMs) to evaluate each decision and guide the agent's decision-making process.<n>AgentPRM captures both the interdependence between sequential decisions and their contribution to the final goal.
arXiv Detail & Related papers (2025-11-11T14:57:54Z)
AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework [76.96794548655292]
Large language models (LLMs) have sparked growing interest in building generalist agents that can learn through online interactions.<n>Applying reinforcement learning (RL) to train LLM agents in multi-turn, multi-task settings remains challenging due to lack of scalable infrastructure and stable training algorithms.<n>We present the AgentRL framework for scalable multi-turn, multi-task agentic RL training.
arXiv Detail & Related papers (2025-10-05T13:40:01Z)
How to Train a Leader: Hierarchical Reasoning in Multi-Agent LLMs [16.853362180877593]
We introduce a hierarchical multi-agent framework that trains only a single leader LLM to coordinate a team of untrained peer agents.<n>Our results highlight the effectiveness and efficiency of training a single, flexible leader for collaborative reasoning in multi-agent LLM systems.
arXiv Detail & Related papers (2025-07-11T18:34:07Z)
Multiple Weaks Win Single Strong: Large Language Models Ensemble Weak Reinforcement Learning Agents into a Supreme One [28.264011412168347]
Model ensemble is a useful approach in reinforcement learning (RL) for training effective agents.<n>We propose LLM-Ens, a novel approach that enhances RL model ensemble with task-specific semantic understandings.
arXiv Detail & Related papers (2025-05-21T09:35:43Z)
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration [63.31211701741323]
We extend multi-agent multi-model reasoning to generation, specifically to improving faithfulness through refinement. We design intrinsic evaluations for each subtask, with our findings indicating that both multi-agent (multiple instances) and multi-model (diverse LLM types) approaches benefit error detection and critiquing. We consolidate these insights into a final "recipe" called Multi-Agent Multi-Model Refinement (MAMM-Refine), where multi-agent and multi-model collaboration significantly boosts performance.
arXiv Detail & Related papers (2025-03-19T14:46:53Z)
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from Reasoning of Large Language Models (LLMs)<n>ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions.<n> Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration [81.45763823762682]
This work aims to bridge the gap by investigating the problem of data synthesis through multi-agent sampling. We introduce Tree Search-based Orchestrated Agents(TOA), where the workflow evolves iteratively during the sequential sampling process. Our experiments on alignment, machine translation, and mathematical reasoning demonstrate that multi-agent sampling significantly outperforms single-agent sampling as inference compute scales.
arXiv Detail & Related papers (2024-12-22T15:16:44Z)
Training Agents with Weakly Supervised Feedback from Large Language Models [19.216542820742607]
This paper introduces a novel training method for LLM-based agents using weakly supervised signals from a critic LLM. Our agents are trained in iterative manner, where they initially generate trajectories through environmental interaction. Tests on the API-bank dataset show consistent improvement in our agents' capabilities and comparable performance to GPT-4.
arXiv Detail & Related papers (2024-11-29T08:47:04Z)
Mars-PO: Multi-Agent Reasoning System Preference Optimization [16.145823558485393]
We propose Mars-PO, a novel framework to improve the mathematical reasoning capabilities of large language models (LLMs)<n>It combines high-quality outputs from multiple agents into a hybrid positive sample set and pairs them with agent-specific negative samples to construct robust preference pairs for training.<n>By aligning agents with shared positive samples while addressing individual weaknesses, Mars-PO achieves substantial performance improvements on mathematical reasoning benchmarks.
arXiv Detail & Related papers (2024-11-28T10:35:16Z)
Optimizing Collaboration of LLM based Agents for Finite Element Analysis [1.5039745292757671]
This paper investigates the interactions between multiple agents within Large Language Models (LLMs) in the context of programming and coding tasks. We utilize the AutoGen framework to facilitate communication among agents, evaluating different configurations based on the success rates from 40 random runs for each setup.
arXiv Detail & Related papers (2024-08-23T23:11:08Z)
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Over-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z)
LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions [8.55917897789612]
We focus on the cooperative tasks of multiple agents with a common goal and communication among them. We also consider human-in/on-the-loop scenarios enabled by the language component in the framework.
arXiv Detail & Related papers (2024-05-17T22:10:23Z)
Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning [14.635361844362794]
Smurfs' is a cutting-edge multi-agent framework designed to revolutionize the application of large language models. Smurfs can enhance the model's ability to solve complex tasks at no additional cost.
arXiv Detail & Related papers (2024-05-09T17:49:04Z)
Large Language Model-based Human-Agent Collaboration for Complex Task Solving [94.3914058341565]
We introduce the problem of Large Language Models (LLMs)-based human-agent collaboration for complex task-solving. We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC. This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
arXiv Detail & Related papers (2024-02-20T11:03:36Z)
AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging) It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets. One agent learned by offline MARL often inherits this random policy, jeopardizing the performance of the entire team. We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems. We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.