Related papers: MARFT: Multi-Agent Reinforcement Fine-Tuning

URL: http://arxiv.org/abs/2504.16129v2
Date: Thu, 24 Apr 2025 02:54:02 GMT
Title: MARFT: Multi-Agent Reinforcement Fine-Tuning
Authors: Junwei Liao, Muning Wen, Jun Wang, Weinan Zhang
Abstract summary: This article presents a comprehensive study of LLM-based MARL and proposes a novel paradigm termed Multi-Agent Reinforcement Fine-Tuning (MARFT). We introduce a universal algorithmic framework tailored for LLM-based Multi-Agent Systems (LaMAS), outlining the conceptual foundations, key distinctions, and practical implementation strategies.
Score: 26.527065316690123
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLM-based Multi-Agent Systems (LaMAS) have demonstrated remarkable capabilities in addressing complex, agentic tasks requiring multifaceted reasoning and collaboration, from generating high-quality presentation slides to conducting sophisticated scientific research. Meanwhile, Reinforcement Learning (RL) has been widely recognized for its effectiveness in enhancing agent intelligence, but limited research has investigated the fine-tuning of LaMAS using foundational RL techniques. Moreover, the direct application of Multi-Agent Reinforcement Learning (MARL) methodologies to LaMAS introduces significant challenges, stemming from the unique characteristics and mechanisms inherent to LaMAS. To address these challenges, this article presents a comprehensive study of LLM-based MARL and proposes a novel paradigm termed Multi-Agent Reinforcement Fine-Tuning (MARFT). We introduce a universal algorithmic framework tailored for LaMAS, outlining the conceptual foundations, key distinctions, and practical implementation strategies. We begin by reviewing the evolution from RL to Reinforcement Fine-Tuning (RFT), setting the stage for a parallel analysis in the multi-agent domain. In the context of LaMAS, we elucidate critical differences between MARL and MARFT. These differences motivate a transition toward a novel, LaMAS-oriented formulation of RFT. Central to this work is the presentation of a robust and scalable MARFT framework. We detail the core algorithm and provide a complete, open-source implementation to facilitate adoption and further research. The latter sections of the paper explore real-world application perspectives and open challenges in MARFT. By bridging theoretical underpinnings with practical methodologies, this work aims to serve as a roadmap for researchers seeking to advance MARFT toward resilient and adaptive solutions in agentic systems. Our implementation of the proposed framework is publicly available at: https://github.com/jwliao-ai/MARFT.

Related papers

Multi-Agent Reinforcement Learning for Resources Allocation Optimization: A Survey [9.798174763420896]
Multi-Agent Reinforcement Learning (MARL) has become a powerful framework for numerous real-world applications. This survey aims to support researchers and practitioners in leveraging MARL's potential to advance resource allocation solutions.
arXiv Detail & Related papers (2025-04-29T00:18:31Z)
A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives [6.277211882332452]
Multi-agent cooperative decision-making involves multiple agents working together to complete established tasks and achieve specific objectives. These techniques are widely applicable in real-world scenarios such as autonomous driving, drone navigation, disaster rescue, and simulated military confrontations.
arXiv Detail & Related papers (2025-03-17T17:45:46Z)
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [54.787341008881036]
We introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Experimental results demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
Progressive Multimodal Reasoning via Active Retrieval [64.74746997923967]
Multi-step multimodal reasoning tasks pose significant challenges for multimodal large language models (MLLMs). We propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs. We show that AR-MCTS can optimize sampling diversity and accuracy, yielding reliable multimodal reasoning.
arXiv Detail & Related papers (2024-12-19T13:25:39Z)
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems. The recent trend of deploying Large Language Models (LLMs) to tackle CMR tasks has marked a new mainstream of approaches for enhancing their effectiveness. This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z)
Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs). MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task. We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z)
LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges. Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also introducing a framework based on the roofline model. This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents. There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain. This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration [83.4031923134958]
Corex is a suite of novel general-purpose strategies that transform Large Language Models into autonomous agents. Inspired by human behaviors, Corex is constituted by diverse collaboration paradigms including Debate, Review, and Retrieve modes. We demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods.
arXiv Detail & Related papers (2023-09-30T07:11:39Z)
Learning Team-Based Navigation: A Review of Deep Reinforcement Learning Techniques for Multi-Agent Pathfinding [2.7898966850590625]
This review paper focuses on highlighting the integration of DRL-based approaches in MAPF. We aim to bridge the current gap in evaluating MAPF solutions by addressing the lack of unified evaluation metrics. Our paper discusses the potential of model-based DRL as a promising future direction and provides its required foundational understanding.
arXiv Detail & Related papers (2023-08-11T00:59:29Z)
Towards Global Optimality in Cooperative MARL with the Transformation And Distillation Framework [26.612749327414335]
Decentralized execution is one core demand in cooperative multi-agent reinforcement learning (MARL). In this paper, we theoretically analyze two common classes of algorithms with decentralized policies: multi-agent policy gradient methods and value-decomposition methods. We show that TAD-PPO can theoretically perform optimal policy learning in finite multi-agent MDPs and that it delivers significantly better performance on a large set of cooperative multi-agent tasks.
arXiv Detail & Related papers (2022-07-12T06:59:13Z)
Model-based Multi-agent Reinforcement Learning: Recent Progress and Prospects [23.347535672670688]
Multi-Agent Reinforcement Learning (MARL) tackles sequential decision-making problems involving multiple participants. MARL requires a tremendous number of samples for effective training. Model-based methods have been shown to achieve provable advantages of sample efficiency.
arXiv Detail & Related papers (2022-03-20T17:24:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.