Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games
- URL: http://arxiv.org/abs/2202.09422v1
- Date: Fri, 18 Feb 2022 20:35:00 GMT
- Title: Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games
- Authors: Dingyang Chen, Yile Li, Qi Zhang
- Abstract summary: Policy sharing is crucial to efficient learning in certain tasks yet lacks theoretical justification.
We develop the first consensus-based decentralized actor-critic method.
We also develop practical algorithms based on our decentralized actor-critic method to reduce the communication cost during training.
- Score: 6.589813623221242
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent success in cooperative multi-agent reinforcement learning (MARL)
relies on centralized training and policy sharing. Centralized training
eliminates the issue of non-stationarity in MARL yet induces large communication
costs, and policy sharing is empirically crucial to efficient learning in
certain tasks yet lacks theoretical justification. In this paper, we formally
characterize a subclass of cooperative Markov games where agents exhibit a
certain form of homogeneity such that policy sharing provably incurs no
suboptimality. This enables us to develop the first consensus-based
decentralized actor-critic method where the consensus update is applied to both
the actors and the critics while ensuring convergence. We also develop
practical algorithms based on our decentralized actor-critic method to reduce
the communication cost during training, while still yielding policies
comparable with centralized training.
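The abstract describes the mechanism only at a high level: agents run local actor-critic updates and periodically apply a consensus (neighbour-averaging) step to both actor and critic parameters, so communication is needed only at mixing rounds. Below is a minimal toy sketch of that pattern; the ring topology, mixing weights, `CONSENSUS_PERIOD`, and the placeholder `local_actor_critic_gradients` function are illustrative assumptions, not the authors' actual algorithm.

```python
# Toy sketch: local actor-critic steps interleaved with periodic consensus
# (neighbour averaging) on BOTH actor and critic parameters.
# All names and constants here are hypothetical illustrations.
import numpy as np

N_AGENTS = 4
ACTOR_DIM, CRITIC_DIM = 8, 6
CONSENSUS_PERIOD = 10   # communicate only every K local steps (hypothetical choice)
STEP_SIZE = 1e-2

# Doubly stochastic mixing matrix over a ring communication graph:
# each agent averages with itself and its two neighbours.
W = np.zeros((N_AGENTS, N_AGENTS))
for i in range(N_AGENTS):
    W[i, i] = 0.5
    W[i, (i - 1) % N_AGENTS] = 0.25
    W[i, (i + 1) % N_AGENTS] = 0.25

rng = np.random.default_rng(0)
actors = rng.normal(size=(N_AGENTS, ACTOR_DIM))    # per-agent policy parameters
critics = rng.normal(size=(N_AGENTS, CRITIC_DIM))  # per-agent value-function parameters


def local_actor_critic_gradients(i, actors, critics):
    """Placeholder for agent i's local policy-gradient and TD updates.

    In the paper's setting each agent would compute these from its own
    observations and rewards; here we draw noisy gradients of a simple
    quadratic so the combination of local steps and consensus converges.
    """
    g_actor = actors[i] + 0.1 * rng.normal(size=ACTOR_DIM)
    g_critic = critics[i] + 0.1 * rng.normal(size=CRITIC_DIM)
    return g_actor, g_critic


for step in range(1, 501):
    # 1) Local, communication-free actor-critic steps.
    for i in range(N_AGENTS):
        g_a, g_c = local_actor_critic_gradients(i, actors, critics)
        actors[i] -= STEP_SIZE * g_a
        critics[i] -= STEP_SIZE * g_c

    # 2) Periodic consensus: mix both actor and critic parameters with
    #    neighbours; communicating only every CONSENSUS_PERIOD steps is the
    #    communication-saving knob.
    if step % CONSENSUS_PERIOD == 0:
        actors = W @ actors
        critics = W @ critics

disagreement = np.max(np.abs(actors - actors.mean(axis=0)))
print(f"max actor-parameter disagreement after training: {disagreement:.4f}")
```

Increasing `CONSENSUS_PERIOD` trades consensus accuracy for fewer communication rounds, which is the kind of communication/performance trade-off the abstract refers to.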
Related papers
- ROMA-iQSS: An Objective Alignment Approach via State-Based Value Learning and ROund-Robin Multi-Agent Scheduling [44.276285521929424]
We introduce a decentralized state-based value learning algorithm that enables agents to independently discover optimal states.
Our theoretical analysis shows that our approach leads decentralized agents to an optimal collective policy.
Empirical experiments further demonstrate that our method outperforms existing decentralized state-based and action-based value learning strategies.
arXiv Detail & Related papers (2024-04-05T09:39:47Z)
- Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning [7.784991832712813]
We introduce a Bayesian network to capture correlations between agents' action selections in their joint policy.
We develop practical algorithms to learn the context-aware Bayesian network policies.
Empirical results on a range of MARL benchmarks show the benefits of our approach.
arXiv Detail & Related papers (2023-06-02T21:22:27Z)
- Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL? [27.037348104661497]
Centralized Training with Decentralized Execution is a popular framework for cooperative Multi-Agent Reinforcement Learning.
We introduce a novel Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent reinforcement learning.
arXiv Detail & Related papers (2023-05-27T03:15:24Z)
- More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization [21.10461189367695]
In cooperative multi-agent reinforcement learning (MARL), combining value decomposition with actor-critic enables agents to learn policies.
Agents are commonly assumed to be independent of each other, even in centralized training.
We propose multi-agent conditional policy factorization (MACPF) which takes more centralized training but still enables decentralized execution.
arXiv Detail & Related papers (2022-09-26T13:29:22Z)
- Finite-Time Consensus Learning for Decentralized Optimization with Nonlinear Gossiping [77.53019031244908]
We present a novel decentralized learning framework based on nonlinear gossiping (NGO) that enjoys an appealing finite-time consensus property to achieve better synchronization.
Our analysis on how communication delay and randomized chats affect learning further enables the derivation of practical variants.
arXiv Detail & Related papers (2021-11-04T15:36:25Z)
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
- Cooperative Multi-Agent Actor-Critic for Privacy-Preserving Load Scheduling in a Residential Microgrid [71.17179010567123]
We propose a privacy-preserving multi-agent actor-critic framework where the decentralized actors are trained with distributed critics.
The proposed framework can preserve the privacy of the households while simultaneously learning the multi-agent credit assignment mechanism implicitly.
arXiv Detail & Related papers (2021-10-06T14:05:26Z)
- Periodic Stochastic Gradient Descent with Momentum for Decentralized Training [114.36410688552579]
We propose a novel periodic decentralized momentum SGD method, which employs the momentum schema and periodic communication for decentralized training.
We conduct extensive experiments to verify the performance of our two proposed methods, both of which show superior performance over existing methods.
arXiv Detail & Related papers (2020-08-24T13:38:22Z)
- Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning [31.147638213056872]
We present a multi-agent actor-critic method that aims to implicitly address the credit assignment problem under fully cooperative settings.
Our key motivation is that credit assignment among agents may not require an explicit formulation as long as the policy gradients from a centralized critic carry sufficient information for the decentralized agents to maximize their joint action value.
Our algorithm, referred to as LICA, is evaluated on several benchmarks including the multi-agent particle environments and a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-07-06T05:25:02Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.