A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning
Coordination Problem
- URL: http://arxiv.org/abs/2305.17198v2
- Date: Thu, 18 Jan 2024 16:25:38 GMT
- Title: A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning
Coordination Problem
- Authors: Paul Barde, Jakob Foerster, Derek Nowrouzezahrai, Amy Zhang
- Abstract summary: Existing Multi-Agent Reinforcement Learning (MARL) methods are online and thus impractical for real-world applications in which collecting new interactions is costly or dangerous.
We identify and formalize the strategy agreement (SA) and the strategy fine-tuning (SFT) coordination challenges.
Our resulting algorithm, Model-based Offline Multi-Agent Proximal Policy Optimization (MOMA-PPO), generates synthetic interaction data and enables agents to converge on a strategy while fine-tuning their policies accordingly.
- Score: 22.385585755496116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training multiple agents to coordinate is an essential problem with
applications in robotics, game theory, economics, and social sciences. However,
most existing Multi-Agent Reinforcement Learning (MARL) methods are online and
thus impractical for real-world applications in which collecting new
interactions is costly or dangerous. While these algorithms should leverage
offline data when available, doing so gives rise to what we call the offline
coordination problem. Specifically, we identify and formalize the strategy
agreement (SA) and the strategy fine-tuning (SFT) coordination challenges, two
issues at which current offline MARL algorithms fail. Concretely, we reveal
that the prevalent model-free methods are severely deficient and cannot handle
coordination-intensive offline multi-agent tasks in either toy or MuJoCo
domains. To address this setback, we emphasize the importance of inter-agent
interactions and propose the very first model-based offline MARL method. Our
resulting algorithm, Model-based Offline Multi-Agent Proximal Policy
Optimization (MOMA-PPO), generates synthetic interaction data and enables agents
to converge on a strategy while fine-tuning their policies accordingly. This
simple model-based solution solves the coordination-intensive offline tasks,
significantly outperforming the prevalent model-free methods even under severe
partial observability and with learned world models.
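As a rough illustration of the recipe described above (fit a world model on the fixed dataset, roll out synthetic joint trajectories, and let the agents fine-tune on them), the following is a minimal sketch, not the authors' implementation; all class names, dimensions, and the linear toy model are illustrative assumptions.

```python
# Minimal sketch of the model-based offline MARL recipe described in the abstract:
# fit a world model on the fixed dataset, generate synthetic joint rollouts, and
# hand that data to the agents for policy fine-tuning. Not the authors' code;
# every name and hyperparameter here is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, OBS_DIM, ACT_DIM = 2, 4, 2


class WorldModel:
    """Toy linear dynamics model fit on offline joint transitions."""

    def fit(self, states, joint_actions, next_states):
        X = np.hstack([states, joint_actions])           # (T, OBS_DIM + N_AGENTS*ACT_DIM)
        self.W, *_ = np.linalg.lstsq(X, next_states, rcond=None)

    def step(self, state, joint_action):
        return np.hstack([state, joint_action]) @ self.W


def synthetic_rollout(model, policies, start_state, horizon=5):
    """Generate an imagined joint trajectory by chaining model predictions."""
    traj, state = [], start_state
    for _ in range(horizon):
        joint_action = np.concatenate([pi(state) for pi in policies])
        next_state = model.step(state, joint_action)
        traj.append((state, joint_action, next_state))
        state = next_state
    return traj


# Illustrative usage on random placeholder data standing in for an offline dataset.
T = 256
states = rng.normal(size=(T, OBS_DIM))
joint_actions = rng.normal(size=(T, N_AGENTS * ACT_DIM))
next_states = states + 0.1 * rng.normal(size=(T, OBS_DIM))

model = WorldModel()
model.fit(states, joint_actions, next_states)

# The policies below are stand-ins; in MOMA-PPO they would be PPO-trained on the
# synthetic rollouts, which is what lets agents converge on a shared strategy
# and fine-tune their individual policies accordingly.
policies = [lambda s: np.tanh(s[:ACT_DIM]) for _ in range(N_AGENTS)]
rollouts = [synthetic_rollout(model, policies, s) for s in states[:8]]
print(f"generated {len(rollouts)} synthetic joint rollouts")
```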
Related papers
- Learning Emergence of Interaction Patterns across Independent RL Agents in Multi-Agent Environments [3.0284592792243794]
The Bottom Up Network (BUN) treats the collective of agents as a unified entity.
Our empirical evaluations across a variety of cooperative multi-agent scenarios, including tasks such as cooperative navigation and traffic control, consistently demonstrate BUN's superiority over baseline methods with substantially reduced computational costs.
arXiv Detail & Related papers (2024-10-03T14:25:02Z)
- ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization [11.620274237352026]
Offline reinforcement learning (RL) has garnered significant attention for its ability to learn effective policies from pre-collected datasets.
MARL presents additional challenges due to the large joint state-action space and the complexity of multi-agent behaviors.
We introduce a regularizer in the space of stationary distributions to better handle distributional shift.
arXiv Detail & Related papers (2024-10-02T18:56:10Z)
- Coordination Failure in Cooperative Offline MARL [3.623224034411137]
We focus on coordination failure and investigate the role of joint actions in multi-agent policy gradients with offline data.
Using two-player games as an analytical tool, we demonstrate a simple yet overlooked failure mode of Best Response Under Data (BRUD) algorithms.
We propose an approach to mitigate such failure by prioritising samples from the dataset based on joint-action similarity (sketched below).
arXiv Detail & Related papers (2024-07-01T14:51:29Z)
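A minimal sketch of the prioritised-sampling idea from the entry above, assuming a squared-distance similarity between logged and current joint actions; the function names and softmax weighting are illustrative, not the paper's API.

```python
# Prioritise offline transitions whose logged joint action is close to what the
# current policies would do, by turning negative squared distances into sampling
# weights. A sketch of the mechanism, not the paper's implementation.
import numpy as np

def similarity_weights(dataset_joint_actions: np.ndarray,
                       policy_joint_actions: np.ndarray,
                       temperature: float = 1.0) -> np.ndarray:
    """Softmax weights over dataset transitions based on joint-action similarity."""
    sq_dist = np.sum((dataset_joint_actions - policy_joint_actions) ** 2, axis=-1)
    logits = -sq_dist / temperature
    logits -= logits.max()                      # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Illustrative usage: 6 logged transitions, 2 agents with 2-dimensional actions each.
rng = np.random.default_rng(0)
logged = rng.normal(size=(6, 4))
current = rng.normal(size=(6, 4))               # joint actions the policies would take now
probs = similarity_weights(logged, current)
batch_idx = rng.choice(len(logged), size=4, p=probs)
print(probs.round(3), batch_idx)
```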
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization (value expansion is sketched below).
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
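Model-based value expansion, mentioned in the MOTO summary above, is commonly implemented as a k-step rollout of the learned model followed by a value bootstrap; the sketch below assumes that standard form and is not taken from the MOTO codebase.

```python
# k-step model-based value expansion: roll the learned model forward k steps from a
# real state, sum the predicted rewards, and bootstrap with a value function.
# Standard formulation sketched with toy stand-ins; names are illustrative.
import numpy as np

def value_expansion_target(model_step, policy, value_fn, state, k: int = 3, gamma: float = 0.99) -> float:
    """Return the sum of k model-predicted rewards plus a discounted bootstrap value."""
    target, discount = 0.0, 1.0
    for _ in range(k):
        action = policy(state)
        state, reward = model_step(state, action)   # learned dynamics + reward model
        target += discount * reward
        discount *= gamma
    return target + discount * value_fn(state)

# Illustrative usage with toy stand-ins for the learned model, policy, and value function.
toy_model = lambda s, a: (s + 0.1 * a, float(-np.sum(s ** 2)))
toy_policy = lambda s: -s
toy_value = lambda s: float(-np.sum(s ** 2))
print(value_expansion_target(toy_model, toy_policy, toy_value, np.ones(3)))
```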
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
Offline datasets often contain trajectories in which some agents behave close to randomly; an agent trained by offline MARL can inherit this random policy, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- Fully Decentralized Model-based Policy Optimization for Networked Systems [23.46407780093797]
This work aims to improve the data efficiency of multi-agent control through model-based learning.
We consider networked systems where agents are cooperative and communicate only locally with their neighbors.
In our method, each agent learns a dynamics model to predict future states and broadcasts its predictions via communication; the policies are then trained on model rollouts.
arXiv Detail & Related papers (2022-07-13T23:52:14Z)
- Hierarchical Reinforcement Learning with Opponent Modeling for Distributed Multi-agent Cooperation [13.670618752160594]
Deep reinforcement learning (DRL) provides a promising approach for multi-agent cooperation through the interaction of the agents and environments.
Traditional DRL solutions suffer from the high dimensionality of multi-agent systems with continuous action spaces during policy search.
We propose a hierarchical reinforcement learning approach with high-level decision-making and low-level individual control for efficient policy search.
arXiv Detail & Related papers (2022-06-25T19:09:29Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the learned dynamics (sketched below).
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
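A minimal sketch of the uncertainty-penalized reward described above, assuming an ensemble-disagreement uncertainty proxy; the names and the choice of proxy are illustrative rather than MOPO's released implementation.

```python
# Penalize the model-predicted reward by a dynamics-uncertainty estimate, here the
# disagreement of an ensemble of next-state predictions, scaled by a coefficient lam.
# A sketch of the idea with assumed names, not MOPO's released code.
import numpy as np

def penalized_reward(ensemble_next_states: np.ndarray, model_reward: float, lam: float = 1.0) -> float:
    """ensemble_next_states: (n_models, state_dim) predictions for one (s, a) pair."""
    # Uncertainty proxy: largest per-dimension standard deviation across the ensemble.
    uncertainty = ensemble_next_states.std(axis=0).max()
    return model_reward - lam * uncertainty

# Illustrative call with a 5-model ensemble predicting a 3-dimensional next state.
preds = np.random.default_rng(1).normal(size=(5, 3))
print(penalized_reward(preds, model_reward=1.0, lam=0.5))
```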