MALib: A Parallel Framework for Population-based Multi-agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2106.07551v1
- Date: Sat, 5 Jun 2021 03:27:08 GMT
- Title: MALib: A Parallel Framework for Population-based Multi-agent
Reinforcement Learning
- Authors: Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen,
Yaodong Yang, Weinan Zhang, Jun Wang
- Abstract summary: Population-based multi-agent reinforcement learning (PB-MARL) refers to a series of methods that nest reinforcement learning (RL) algorithms within coupled population dynamics.
We present MALib, a scalable and efficient computing framework for PB-MARL.
- Score: 61.28547338576706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Population-based multi-agent reinforcement learning (PB-MARL)
refers to a series of methods that nest reinforcement learning (RL)
algorithms and produce a self-generated sequence of tasks arising from the
coupled population dynamics. By leveraging auto-curricula to induce a
population of distinct emergent strategies, PB-MARL has achieved impressive
success in tackling multi-agent tasks. Despite the remarkable prior art in
distributed RL frameworks, PB-MARL poses new challenges for parallelizing
the training frameworks due to the additional complexity of multiple nested
workloads across sampling, training, and evaluation, coupled with
heterogeneous policy interactions. To solve these problems, we present
MALib, a scalable and efficient computing framework for PB-MARL. Our
framework comprises three key components: (1) a centralized task-dispatching
model, which supports self-generated tasks and scalable training with
heterogeneous policy combinations; (2) a programming architecture named
Actor-Evaluator-Learner, which achieves high parallelism for both training
and sampling and meets the evaluation requirements of auto-curriculum
learning; (3) a higher-level abstraction of MARL training paradigms, which
enables efficient code reuse and flexible deployment on different
distributed computing paradigms. Experiments on a series of complex tasks,
such as multi-agent Atari games, show that MALib achieves a throughput
higher than 40K FPS on a single machine with 32 CPU cores, a 5x speedup over
RLlib, and at least a 3x speedup over OpenSpiel on multi-agent training
tasks.
MALib is publicly available at https://github.com/sjtu-marl/malib.
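To make the three components above concrete, here is a minimal, runnable Python sketch of how an Actor-Evaluator-Learner loop with centralized task dispatching can fit together. It is a conceptual illustration under assumed names (Policy, Actor, Learner, Evaluator, and next_tasks are all hypothetical), not MALib's actual API; see the repository linked above for the real implementation.

```python
# A minimal, self-contained sketch of the Actor-Evaluator-Learner (AEL)
# pattern described in the abstract. All names here are hypothetical
# illustrations, not MALib's actual API.
from dataclasses import dataclass
import random


@dataclass
class Policy:
    """Stand-in for a trainable policy; `skill` is a toy scalar parameter."""
    name: str
    skill: float = 0.0


class Actor:
    """Samples experience for a given combination of policies."""

    def sample(self, policies):
        # A real actor would roll out environment episodes; here we fabricate
        # one scalar return per policy as placeholder experience.
        return {p.name: p.skill + random.gauss(0.0, 0.1) for p in policies}


class Learner:
    """Updates a policy from sampled experience."""

    def step(self, policy, batch):
        policy.skill += 0.1 * batch[policy.name]  # toy "gradient" update


class Evaluator:
    """Scores the population and emits the next self-generated tasks."""

    def next_tasks(self, population, k=2):
        # Auto-curriculum stub: schedule the currently strongest policies.
        return sorted(population, key=lambda p: p.skill, reverse=True)[:k]


# Centralized task dispatching: a driver loop couples the three roles.
# In MALib, actors and learners run as parallel workers; this loop is
# sequential purely for illustration.
population = [Policy(f"pi_{i}") for i in range(4)]
actor, learner, evaluator = Actor(), Learner(), Evaluator()
for generation in range(10):
    matchup = evaluator.next_tasks(population)  # self-generated task
    batch = actor.sample(matchup)               # sampling
    for policy in matchup:
        learner.step(policy, batch)             # training
```

The design point the sketch mirrors is that the evaluator, not the learner, decides which policy combinations to train next; this is what enables auto-curricula over a growing population of heterogeneous policies.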
Related papers
- O-MAPL: Offline Multi-agent Preference Learning [5.4482836906033585]
Inferring reward functions from demonstrations is a key challenge in reinforcement learning (RL).
We introduce a novel end-to-end preference-based learning framework for cooperative MARL.
Our algorithm outperforms existing methods across various tasks.
arXiv Detail & Related papers (2025-01-31T08:08:20Z)
- Multi-task Representation Learning for Mixed Integer Linear Programming [13.106799330951842]
This paper introduces the first multi-task learning framework for ML-guided MILP solving.
We demonstrate that our multi-task learning model performs similarly to specialized models within the same distribution.
It significantly outperforms them in generalization across problem sizes and tasks.
arXiv Detail & Related papers (2024-12-18T23:33:32Z)
- MALT: Improving Reasoning with Multi-Agent LLM Training [64.13803241218886]
We present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems.
Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles.
We evaluate our approach across MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40% respectively.
arXiv Detail & Related papers (2024-12-02T19:30:36Z)
- MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
MADiff is a diffusion-based multi-agent learning framework.
It works as both a decentralized policy and a centralized controller.
Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
- MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library [82.77446613763809]
We present MARLlib, a library designed to offer fast development for multi-agent tasks and algorithm combinations.
MARLlib can effectively disentangle the intertwined nature of the multi-agent task and the learning process of the algorithm.
The library's source code is publicly accessible on GitHub.
arXiv Detail & Related papers (2022-10-11T03:11:12Z)
- Containerized Distributed Value-Based Multi-Agent Reinforcement Learning [18.79371121484969]
We propose a containerized multi-agent reinforcement learning framework.
To our knowledge, our method is the first to solve the challenging Google Research Football full game 5_v_5.
On the StarCraft II micromanagement benchmark, our method obtains 4-18x better results compared to state-of-the-art non-distributed MARL algorithms.
arXiv Detail & Related papers (2021-10-15T15:54:06Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the decision process of multi-agent tasks more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)