PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2403.02635v1
- Date: Tue, 5 Mar 2024 03:59:01 GMT
- Title: PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of Multi-Agent Reinforcement Learning
- Authors: Ke Zhang, DanDan Zhu, Qiuhan Xu, Hao Zhou and Ce Zheng
- Abstract summary: Training for multi-agent reinforcement learning (MARL) is a time-consuming process.
One drawback is that each agent's strategy in MARL is trained independently even though the agents actually act in cooperation.
We propose three simple approaches called Average Periodically Parameter Sharing (A-PPS), Reward-Scalability Periodically Parameter Sharing (RS-PPS) and Partial Personalized Periodically Parameter Sharing (PP-PPS).
- Score: 20.746383793882984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training in multi-agent reinforcement learning (MARL) is a time-consuming process,
caused in part by the distribution shift of each agent. One drawback is that each agent's
strategy in MARL is trained independently even though the agents actually act in cooperation.
Thus, a crucial issue in multi-agent reinforcement learning is how to efficiently accelerate
the training process. To address this problem, current research leverages a centralized
function (CF) across multiple agents to learn each agent's contribution to the team reward.
However, CF-based methods introduce joint error from the other agents into the estimation of
the value network. Inspired by federated learning, we therefore propose three simple novel
approaches, called Average Periodically Parameter Sharing (A-PPS), Reward-Scalability
Periodically Parameter Sharing (RS-PPS) and Partial Personalized Periodically Parameter
Sharing (PP-PPS), to accelerate the training of MARL. Agents share their Q-value networks
periodically during the training process. Agents that have the same identity weight the
shared parameters by their collected rewards, or update only part of the neural network in
each period so that different parameters are shared. We apply our approaches to the classical
MARL method QMIX and evaluate them on various tasks in the StarCraft Multi-Agent
Challenge (SMAC) environment. Numerical experiments show substantial improvements, with an
average gain of 10%-30%, and make it possible to win tasks that QMIX cannot. Our code can be
downloaded from https://github.com/ColaZhang22/PPS-QMIX
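The abstract describes the mechanism only at a high level, so the following is a minimal sketch of how the three PPS variants could look, assuming each agent owns an identically shaped Q-network. All names (make_q_net, sharing_interval, shared_fraction) and hyperparameters are illustrative assumptions, not the authors' implementation; see their repository above for that.

```python
# Hypothetical sketch of the three PPS variants described in the abstract.
# Assumes each agent owns an identical-architecture Q-network (here a tiny MLP);
# names and hyperparameters are illustrative, not taken from the paper's code.
import torch
import torch.nn as nn


def make_q_net(obs_dim: int = 8, n_actions: int = 4) -> nn.Module:
    return nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))


@torch.no_grad()
def a_pps(nets: list[nn.Module]) -> None:
    """A-PPS: replace every agent's parameters with the plain average."""
    for params in zip(*(net.parameters() for net in nets)):
        mean = torch.stack([p.data for p in params]).mean(dim=0)
        for p in params:
            p.data.copy_(mean)


@torch.no_grad()
def rs_pps(nets: list[nn.Module], rewards: list[float]) -> None:
    """RS-PPS: weight each agent's parameters by its collected reward."""
    w = torch.tensor(rewards, dtype=torch.float32)
    w = w / w.sum()  # normalise reward weights (assumes positive rewards)
    for params in zip(*(net.parameters() for net in nets)):
        mix = sum(wi * p.data for wi, p in zip(w, params))
        for p in params:
            p.data.copy_(mix)


@torch.no_grad()
def pp_pps(nets: list[nn.Module], shared_fraction: float = 0.5) -> None:
    """PP-PPS: share only part of the network, keep the rest personalised.

    Which part is shared is an arbitrary choice here (the first
    `shared_fraction` of parameter tensors); the paper does not specify it.
    """
    tensors = list(zip(*(net.parameters() for net in nets)))
    n_shared = int(len(tensors) * shared_fraction)
    for params in tensors[:n_shared]:
        mean = torch.stack([p.data for p in params]).mean(dim=0)
        for p in params:
            p.data.copy_(mean)


# Usage: inside a QMIX-style training loop, call one variant every
# `sharing_interval` steps rather than after every gradient update.
nets = [make_q_net() for _ in range(3)]
sharing_interval = 200
for step in range(1, 1001):
    # ... per-agent TD updates on the local Q-networks would happen here ...
    if step % sharing_interval == 0:
        a_pps(nets)  # or rs_pps(nets, episode_rewards) / pp_pps(nets)
```

Sharing only every sharing_interval steps, rather than hard-tying all parameters, is what gives the method its federated-learning flavour: agents train locally between synchronisation rounds.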
Related papers
- MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
Diffusion models (DMs) have recently achieved huge success in various scenarios, including offline reinforcement learning.
We propose MADiff, a novel generative multi-agent learning framework to tackle this problem.
Our experiments show the superior performance of MADiff compared to baseline algorithms in a wide range of multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
An agent trained by offline MARL can inherit such a random policy from the dataset, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning [90.43925357575543]
We propose ranked policy memory (RPM) to collect diverse multi-agent trajectories for training MARL policies with good generalizability.
RPM enables MARL agents to interact with unseen agents in multi-agent generalization evaluation scenarios and complete given tasks, and it significantly boosts performance by up to 402% on average.
arXiv Detail & Related papers (2022-10-18T07:32:43Z)
- MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization [17.825845543579195]
We propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO).
We use a recurrent layer in the critic's network architecture and propose a new framework that uses a meta-trajectory to train the recurrent layer.
We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces.
arXiv Detail & Related papers (2021-09-02T12:43:35Z)
- MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
arXiv Detail & Related papers (2021-06-22T10:21:00Z)
- Distributed Heuristic Multi-Agent Path Finding with Communication [7.854890646114447]
Multi-Agent Path Finding (MAPF) is essential to large-scale robotic systems.
Recent methods have applied reinforcement learning (RL) to learn decentralized policies in partially observable environments.
This paper combines communication with deep Q-learning to provide a novel learning-based method for MAPF.
arXiv Detail & Related papers (2021-06-21T18:50:58Z)
- MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning [61.28547338576706]
Population-based multi-agent reinforcement learning (PB-MARL) refers to a family of methods that nest reinforcement learning (RL) algorithms within population-based training.
We present MALib, a scalable and efficient computing framework for PB-MARL.
arXiv Detail & Related papers (2021-06-05T03:27:08Z)
- Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge? [100.48692829396778]
Independent PPO (IPPO) is a form of independent learning in which each agent simply estimates its local value function.
IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
arXiv Detail & Related papers (2020-11-18T20:29:59Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion (see the sketch after this list).
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
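Since PPS-QMIX builds on QMIX, a minimal sketch of QMIX's monotonic mixing idea may help: a hypernetwork conditioned on the global state produces non-negative mixing weights, so the joint value Q_tot is monotone in every per-agent Q-value. This is an illustrative reconstruction from the summary above, not the reference implementation; all class and variable names are assumptions.

```python
# Illustrative QMIX-style mixer: non-negative weights => monotonic Q_tot.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MonotonicMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, hidden: int = 32):
        super().__init__()
        self.n_agents, self.hidden = n_agents, hidden
        # Hypernetworks map the global state to mixing weights and biases.
        self.w1 = nn.Linear(state_dim, n_agents * hidden)
        self.b1 = nn.Linear(state_dim, hidden)
        self.w2 = nn.Linear(state_dim, hidden)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        batch = agent_qs.size(0)
        # abs() keeps the mixing weights non-negative, so dQ_tot/dQ_i >= 0.
        w1 = self.w1(state).abs().view(batch, self.n_agents, self.hidden)
        b1 = self.b1(state).view(batch, 1, self.hidden)
        h = F.elu(agent_qs.unsqueeze(1) @ w1 + b1)   # (batch, 1, hidden)
        w2 = self.w2(state).abs().view(batch, self.hidden, 1)
        b2 = self.b2(state).view(batch, 1, 1)
        return (h @ w2 + b2).view(batch)             # joint Q_tot, (batch,)


# Usage: mix three per-agent Q-values into one joint value for a batch of 4.
mixer = MonotonicMixer(n_agents=3, state_dim=16)
q_tot = mixer(torch.randn(4, 3), torch.randn(4, 16))
```

The monotonicity constraint is what lets QMIX train the mixer centrally while each agent still acts greedily on its own Q-values at execution time; PPS-QMIX keeps this structure and adds the periodic parameter sharing sketched earlier.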