Containerized Distributed Value-Based Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2110.08169v1
- Date: Fri, 15 Oct 2021 15:54:06 GMT
- Title: Containerized Distributed Value-Based Multi-Agent Reinforcement Learning
- Authors: Siyang Wu, Tonghan Wang, Chenghao Li, Chongjie Zhang
- Abstract summary: We propose a containerized multi-agent reinforcement learning framework.
To our knowledge, our method is the first to solve the challenging Google Research Football full game $5\_v\_5$.
On the StarCraft II micromanagement benchmark, our method achieves $4$-$18\times$ better results than state-of-the-art non-distributed MARL algorithms.
- Score: 18.79371121484969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent reinforcement learning tasks put a high demand on the volume of
training samples. Different from its single-agent counterpart, distributed
value-based multi-agent reinforcement learning faces the unique challenges of
demanding data transfer, inter-process communication management, and a high requirement for exploration.
We propose a containerized learning framework to solve these problems. We pack several environment instances, a local learner and buffer, and a carefully designed multi-queue manager that avoids blocking into each container. The local policies of each container are encouraged to be as diverse as possible, and only the trajectories with the highest priority are sent to a global learner. In this way, we achieve a scalable, time-efficient, and diverse distributed MARL framework with high system throughput. To our knowledge, our method is the first to solve the challenging Google Research Football full game $5\_v\_5$. On the StarCraft II micromanagement benchmark, our method achieves $4$-$18\times$ better results than state-of-the-art non-distributed MARL algorithms.
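The abstract names the system's moving parts: environment instances, a local learner and buffer, a non-blocking multi-queue manager, and priority-filtered trajectory transfer to a global learner. Below is a minimal sketch of that data path using Python's multiprocessing; it is not the authors' released code, and `priority()`, `KEEP_TOP_K`, and the queue capacity are illustrative assumptions.

```python
# Toy sketch of one container's actor loop and the non-blocking transfer
# to the global learner. Assumptions: priority = trajectory return;
# KEEP_TOP_K and QUEUE_CAPACITY are made-up hyperparameters.
import multiprocessing as mp
import queue
import random

KEEP_TOP_K = 2          # assumed: trajectories forwarded per local batch
QUEUE_CAPACITY = 8      # assumed: container-to-learner queue size

def priority(traj):
    return sum(reward for _, reward in traj)   # assumed priority measure

def rollout(seed, horizon=16):
    # Stand-in for a real environment rollout: (action, reward) pairs.
    rng = random.Random(seed)
    return [(rng.randrange(4), rng.random()) for _ in range(horizon)]

def container_worker(container_id, traj_queue, n_envs=4, n_batches=5):
    """One container: several env instances feed a local buffer; only the
    top-priority trajectories are sent, and sends never block."""
    for batch in range(n_batches):
        local_buffer = [rollout(container_id * 10_000 + batch * 100 + e)
                        for e in range(n_envs)]
        local_buffer.sort(key=priority, reverse=True)
        for traj in local_buffer[:KEEP_TOP_K]:
            try:
                traj_queue.put_nowait((container_id, priority(traj), traj))
            except queue.Full:
                pass  # drop rather than block the actor loop

if __name__ == "__main__":
    q = mp.Queue(maxsize=QUEUE_CAPACITY)
    workers = [mp.Process(target=container_worker, args=(i, q))
               for i in range(3)]
    for w in workers:
        w.start()
    # Global-learner side: consume whatever the containers forwarded.
    try:
        while True:
            cid, prio, traj = q.get(timeout=2.0)
            print(f"container {cid}: trajectory with priority {prio:.2f}")
            # ...a gradient step on `traj` would go here...
    except queue.Empty:
        pass
    for w in workers:
        w.join()
```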
Related papers
- Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training [15.462969044840868]
We introduce LW-FedMML, a layer-wise federated multimodal learning approach which decomposes the training process into multiple stages.
We conduct extensive experiments across various FL and multimodal learning settings to validate the effectiveness of our proposed method.
Specifically, LW-FedMML reduces memory usage by up to $2.7\times$, computational operations (FLOPs) by $2.4\times$, and total communication cost by $2.3\times$.
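As a rough illustration of the layer-wise idea (not the paper's code), the sketch below trains a small model stage by stage, unfreezing one block per stage; in a federated run, only the parameters trained in the current stage would be exchanged. The layer sizes, stage schedule, and data are placeholder assumptions.

```python
# Layer-wise progressive training sketch: each stage trains (and, in FL,
# would communicate) only one block of the model. Sizes are assumed.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 4))
stages = [[0], [2], [4]]  # indices of the Linear blocks trained per stage

for stage_idx, trainable in enumerate(stages):
    for p in model.parameters():          # freeze everything...
        p.requires_grad_(False)
    for i in trainable:                   # ...except this stage's block
        for p in model[i].parameters():
            p.requires_grad_(True)
    opt = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=1e-3)
    for _ in range(50):                   # local steps for this stage
        x, y = torch.randn(16, 32), torch.randint(0, 4, (16,))
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    # In a federated round, only this stage's parameters are uploaded,
    # which is where the memory/FLOPs/communication savings come from.
```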
arXiv Detail & Related papers (2024-07-22T07:06:17Z)
- Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation Learning [13.060023718506917]
Imitation learning (IL) is the problem of learning to mimic expert behaviors from demonstrations in cooperative multi-agent systems.
We introduce a novel multi-agent IL algorithm designed to address these challenges.
Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q-functions.
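One standard way to aggregate decentralized Q-functions with a mixing network is QMIX-style monotonic mixing; the sketch below shows that pattern as a plausible reading of the summary, with illustrative sizes, and need not match the paper's exact mixer.

```python
# Monotonic (QMIX-style) mixing of per-agent Q-values into a joint value.
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    def __init__(self, n_agents, state_dim, hidden=32):
        super().__init__()
        # Hypernetworks produce state-conditioned mixing weights.
        self.w1 = nn.Linear(state_dim, n_agents * hidden)
        self.b1 = nn.Linear(state_dim, hidden)
        self.w2 = nn.Linear(state_dim, hidden)
        self.b2 = nn.Linear(state_dim, 1)
        self.n_agents, self.hidden = n_agents, hidden

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        b = agent_qs.size(0)
        # abs() keeps d(Q_tot)/d(Q_i) >= 0, i.e. the mixing is monotonic.
        w1 = self.w1(state).abs().view(b, self.n_agents, self.hidden)
        h = torch.relu(agent_qs.unsqueeze(1) @ w1
                       + self.b1(state).unsqueeze(1))
        w2 = self.w2(state).abs().view(b, self.hidden, 1)
        return (h @ w2).squeeze(-1) + self.b2(state)

mixer = MonotonicMixer(n_agents=3, state_dim=8)
q_tot = mixer(torch.randn(4, 3), torch.randn(4, 8))  # (4, 1) joint value
```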
arXiv Detail & Related papers (2023-10-10T17:11:20Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
An agent trained this way can inherit random behavior from low-quality trajectories in the dataset, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
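The summary's core idea, decomposing joint trajectories into per-agent trajectories and sharing only the good ones, can be sketched as below. The return-based quality score and threshold are assumptions for illustration; the paper's actual selection criterion is more sophisticated.

```python
# Toy sketch of splitting joint trajectories into individual ones and
# sharing only high-quality individual trajectories.
from dataclasses import dataclass
from typing import List

@dataclass
class AgentTrajectory:
    agent_id: int
    observations: List[list]
    actions: List[int]
    rewards: List[float]   # per-agent reward signal (assumed available)

def split_joint(joint_obs, joint_acts, joint_rews, n_agents):
    """Decompose a joint trajectory into individual agent trajectories."""
    return [AgentTrajectory(
                i,
                [o[i] for o in joint_obs],
                [a[i] for a in joint_acts],
                [r[i] for r in joint_rews])
            for i in range(n_agents)]

def share_good(trajectories, threshold):
    """Keep only individual trajectories whose return clears a threshold.

    Assumption: return is a usable proxy for trajectory quality; the
    actual work uses a learned credit measure.
    """
    return [t for t in trajectories if sum(t.rewards) >= threshold]
```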
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation [6.09506921406322]
We propose a novel architecture for multi-agent reinforcement learning (MARL) which uses local information intelligently to compute paths for all the agents in a decentralized manner.
InforMARL aggregates information about the local neighborhood of agents for both the actor and the critic using a graph neural network and can be used in conjunction with any standard MARL algorithm.
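A minimal form of the neighborhood aggregation described here is one round of masked message passing over the agent graph; this generic sketch (not InforMARL's exact GNN) shows how each agent gets a fixed-size aggregated vector for its actor and critic.

```python
# One round of masked mean-aggregation over each agent's neighborhood.
import torch
import torch.nn as nn

class NeighborhoodAggregator(nn.Module):
    def __init__(self, feat_dim, embed_dim):
        super().__init__()
        self.msg = nn.Linear(feat_dim, embed_dim)

    def forward(self, feats, adj):
        # feats: (n_agents, feat_dim); adj: (n_agents, n_agents) 0/1 mask
        # with adj[i, j] = 1 iff agent j is in agent i's neighborhood.
        messages = torch.relu(self.msg(feats))            # (n, d)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)   # avoid div by 0
        return adj @ messages / deg    # mean over neighbors, per agent

agg = NeighborhoodAggregator(feat_dim=6, embed_dim=16)
feats = torch.randn(5, 6)
adj = (torch.rand(5, 5) < 0.4).float()
out = agg(feats, adj)  # (5, 16), fed to both actor and critic
```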
arXiv Detail & Related papers (2022-11-03T20:02:45Z)
- Bidirectional Contrastive Split Learning for Visual Question Answering [6.135215040323833]
Visual Question Answering (VQA) based on multi-modal data facilitates real-life applications such as home robots and medical diagnoses.
One challenge is to devise a robust decentralized learning framework for various client models.
We propose Bidirectional Contrastive Split Learning (BiCSL) to train a global multi-modal model on the entire data distribution of decentralized clients.
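Since the method's name centers on a bidirectional contrastive objective, here is the standard symmetric InfoNCE loss as a generic reference point; BiCSL's actual formulation across split client/server modules may differ.

```python
# Symmetric ("bidirectional") InfoNCE: matched rows of z_a and z_b are
# positives, all other rows in the batch are negatives.
import torch
import torch.nn.functional as F

def bidirectional_contrastive_loss(z_a, z_b, temperature=0.1):
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature       # (batch, batch) similarities
    targets = torch.arange(z_a.size(0))
    # a -> b and b -> a directions, averaged.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = bidirectional_contrastive_loss(torch.randn(8, 32), torch.randn(8, 32))
```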
arXiv Detail & Related papers (2022-08-24T11:01:47Z)
- Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution (CTDE) paradigm.
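A toy sketch of the locality idea in this summary: per-agent Q-functions are trained against local rewards, and the joint value is their sum. The shapes, TD target, and per-agent reward availability below are illustrative assumptions, not the algorithm's exact losses.

```python
# Per-agent Q-functions trained on local rewards; joint value = sum.
import torch
import torch.nn as nn

n_agents, obs_dim, n_actions, gamma = 4, 10, 5, 0.99
q_nets = nn.ModuleList(nn.Linear(obs_dim, n_actions) for _ in range(n_agents))

def local_td_losses(obs, acts, local_rews, next_obs):
    losses = []
    for i, qnet in enumerate(q_nets):
        q = qnet(obs[:, i]).gather(1, acts[:, i:i+1]).squeeze(1)
        with torch.no_grad():
            target = (local_rews[:, i]
                      + gamma * qnet(next_obs[:, i]).max(1).values)
        losses.append(nn.functional.mse_loss(q, target))
    # Each Q_i only sees its own local reward; summing the per-agent
    # losses keeps the update local and scalable in the agent count.
    return torch.stack(losses).sum()

obs = torch.randn(8, n_agents, obs_dim)
acts = torch.randint(0, n_actions, (8, n_agents))
rews = torch.randn(8, n_agents)
loss = local_td_losses(obs, acts, rews, torch.randn(8, n_agents, obs_dim))
```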
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
- MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning [61.28547338576706]
Population-based multi-agent reinforcement learning (PB-MARL) refers to the family of methods that nest reinforcement learning (RL) algorithms inside population-based training.
We present MALib, a scalable and efficient computing framework for PB-MARL.
arXiv Detail & Related papers (2021-06-05T03:27:08Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
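The "linear decomposition of universal successor features" means $Q(s, a; w) = \psi(s, a) \cdot w$, so one feature map serves a family of tasks indexed by $w$, and generalized policy improvement (GPI) acts greedily over the best value across policies. A minimal sketch, with assumed dimensions and untrained networks:

```python
# Universal successor features: each policy i has features psi_i, and
# Q_i(s, a; w) = psi_i(s, a) . w for a task weight vector w.
import torch
import torch.nn as nn

obs_dim, n_actions, feat_dim, n_policies = 12, 4, 6, 3
psis = nn.ModuleList(nn.Linear(obs_dim, n_actions * feat_dim)
                     for _ in range(n_policies))

def gpi_action(obs, w):
    # GPI: evaluate every policy's successor features on the CURRENT task
    # weights w, then act greedily on the pointwise maximum.
    qs = torch.stack([p(obs).view(-1, n_actions, feat_dim) @ w
                      for p in psis])          # (n_policies, batch, actions)
    return qs.max(dim=0).values.argmax(dim=1)  # (batch,)

obs = torch.randn(5, obs_dim)
w = torch.randn(feat_dim)          # current task's reward weights
actions = gpi_action(obs, w)
```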
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allows the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all known solutions to the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
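A toy version of a goal-oriented replay buffer in the spirit of this summary: demonstration transitions are grouped by the sub-goal they achieve, with a "forgetful" pruning step. The sub-goal labels and forgetting rule are placeholder assumptions, not the paper's mechanism.

```python
# Goal-oriented buffer: sample per-sub-goal batches, forget old data.
import random
from collections import defaultdict

class GoalOrientedBuffer:
    def __init__(self):
        self.by_subgoal = defaultdict(list)

    def add(self, transition, achieved_subgoal):
        self.by_subgoal[achieved_subgoal].append(transition)

    def sample(self, subgoal, k):
        pool = self.by_subgoal[subgoal]
        return random.sample(pool, min(k, len(pool)))

    def forget(self, fraction=0.5):
        # "Forgetful" flavour (assumption): periodically drop a share of
        # old demonstration data so fresher agent experience dominates.
        for g, pool in self.by_subgoal.items():
            keep = int(len(pool) * (1 - fraction))
            self.by_subgoal[g] = pool[-keep:] if keep else []

buf = GoalOrientedBuffer()
# Sub-goal names are illustrative, e.g. a MineRL-style task chain.
buf.add(("obs", "act", 1.0, "next_obs"), achieved_subgoal="get_wood")
batch = buf.sample("get_wood", k=4)
buf.forget(fraction=0.25)
```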
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)