On Centralized Critics in Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2408.14597v1
- Date: Mon, 26 Aug 2024 19:27:06 GMT
- Title: On Centralized Critics in Multi-Agent Reinforcement Learning
- Authors: Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Brett Daley, Christopher Amato
- Abstract summary: Centralized Training for Decentralized Execution has become a popular approach in Multi-Agent Reinforcement Learning.
We analyze the effect of using state-based critics in partially observable environments.
- Score: 16.361249170514828
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Centralized Training for Decentralized Execution, where agents are trained offline in a centralized fashion and execute online in a decentralized manner, has become a popular approach in Multi-Agent Reinforcement Learning (MARL). In particular, it has become popular to develop actor-critic methods that train decentralized actors with a centralized critic, where the centralized critic is allowed access to global information about the entire system, including the true system state. Such centralized critics are possible given offline information and are not used for online execution. While these methods perform well in a number of domains and have become a de facto standard in MARL, using a centralized critic in this context has yet to be sufficiently analyzed theoretically or empirically. In this paper, we therefore formally analyze centralized and decentralized critic approaches, and analyze the effect of using state-based critics in partially observable environments. We derive theories contrary to the common intuition: critic centralization is not strictly beneficial, and using state values can be harmful. We further prove that, in particular, state-based critics can introduce unexpected bias and variance compared to history-based critics. Finally, we demonstrate how the theory applies in practice by comparing different forms of critics on a wide range of common multi-agent benchmarks. The experiments show practical issues such as the difficulty of representation learning with partial observability, which highlights why the theoretical problems are often overlooked in the literature.
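To make the distinction in the abstract concrete, below is a minimal, illustrative sketch in PyTorch (not the paper's implementation) of the two critic types being compared: a centralized, state-based critic that conditions on the true global state, and a history-based critic that conditions on an agent's local observation history, each usable alongside a decentralized actor. The class names, network sizes, and update function are assumptions made for illustration only.

```python
# Illustrative only: contrasts a state-based (centralized) critic with a
# history-based critic for a decentralized actor. All names and sizes are
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class StateCritic(nn.Module):
    """Centralized, state-based critic V(s): conditions on the true global state."""

    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, state):  # state: (batch, state_dim)
        return self.net(state).squeeze(-1)


class HistoryCritic(nn.Module):
    """History-based critic V(h): conditions on a local observation history."""

    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, obs_seq):  # obs_seq: (batch, T, obs_dim)
        out, _ = self.rnn(obs_seq)
        return self.head(out[:, -1]).squeeze(-1)  # value of the latest history


class Actor(nn.Module):
    """Decentralized actor pi(a | h): acts from local observations only."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq):
        out, _ = self.rnn(obs_seq)
        return torch.distributions.Categorical(logits=self.head(out[:, -1]))


def policy_gradient_loss(actor, obs_seq, action, ret, critic_value):
    """Same actor update for either critic; only the critic's input differs."""
    dist = actor(obs_seq)
    advantage = (ret - critic_value).detach()  # advantage from whichever critic is used
    return -(dist.log_prob(action) * advantage).mean()
```

The actor update has the same form for either critic; the question the paper analyzes is whether conditioning the critic on the state rather than the history is actually beneficial, and its answer is that under partial observability the state-based choice can introduce the bias and variance issues described above.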
Related papers
- Generalization Error Matters in Decentralized Learning Under Byzantine Attacks [22.589653582068117]
Decentralized learning has emerged as a popular peer-to-peer signal and information processing paradigm.
We provide the first analysis of the generalization errors for a class of popular Byzantine-resilient decentralized stochastic gradient descent (DSGD) algorithms.
arXiv Detail & Related papers (2024-07-11T16:12:53Z)
- Towards Understanding Generalization and Stability Gaps between Centralized and Decentralized Federated Learning [57.35402286842029]
We show that centralized federated learning always generalizes better than decentralized federated learning (DFL).
We also conduct experiments on several common setups in FL to validate that our theoretical analysis is consistent with experimental phenomena and contextually valid in several general and practical scenarios.
arXiv Detail & Related papers (2023-10-05T11:09:42Z)
- BLEURT Has Universal Translations: An Analysis of Automatic Metrics by Minimum Risk Training [64.37683359609308]
In this study, we analyze various mainstream and cutting-edge automatic metrics from the perspective of their guidance for training machine translation systems.
We find that certain metrics exhibit robustness defects, such as the presence of universal adversarial translations in BLEURT and BARTScore.
In-depth analysis suggests two main causes of these robustness deficits: distribution biases in the training datasets, and the tendency of the metric paradigm.
arXiv Detail & Related papers (2023-07-06T16:59:30Z)
- Networked Communication for Decentralised Agents in Mean-Field Games [59.01527054553122]
We introduce networked communication to the mean-field game framework.
We prove that our architecture has sample guarantees bounded between those of the centralised- and independent-learning cases.
arXiv Detail & Related papers (2023-06-05T10:45:39Z)
- Evaluate Confidence Instead of Perplexity for Zero-shot Commonsense Reasoning [85.1541170468617]
This paper reconsiders the nature of commonsense reasoning and proposes a novel commonsense reasoning metric, Non-Replacement Confidence (NRC).
Our proposed method boosts zero-shot performance on two commonsense reasoning benchmark datasets and on a further seven commonsense question-answering datasets.
arXiv Detail & Related papers (2022-08-23T14:42:14Z)
- Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games [6.589813623221242]
Policy sharing is crucial to efficient learning in certain tasks yet lacks theoretical justification.
We develop the first consensus-based decentralized actor-critic method.
We also develop practical algorithms based on our decentralized actor-critic method to reduce the communication cost during training.
arXiv Detail & Related papers (2022-02-18T20:35:00Z)
- A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning [17.36759906285316]
We show that state-based critics can introduce bias in the policy estimates, potentially undermining the guarantees of the algorithm.
We also show that, even if the state-based critics do not introduce any bias, they can still result in a larger gradient variance, contrary to the common intuition.
arXiv Detail & Related papers (2022-01-03T14:51:30Z)
- Unbiased Asymmetric Actor-Critic for Partially Observable Reinforcement Learning [17.48572546628464]
Asymmetric actor-critic methods exploit state information that is available during offline training by training a history-based policy via a state-based critic.
We examine the theory of asymmetric actor-critic methods which use state-based critics, and expose fundamental issues which undermine the validity of a common variant.
We propose an unbiased asymmetric actor-critic variant which is able to exploit state information while remaining theoretically sound.
arXiv Detail & Related papers (2021-05-25T05:18:44Z)
- Consensus Control for Decentralized Deep Learning [72.50487751271069]
Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.
We show in theory that when the training consensus distance is lower than a critical quantity, decentralized training converges as fast as the centralized counterpart.
Our empirical insights allow the principled design of better decentralized training schemes that mitigate the performance drop.
arXiv Detail & Related papers (2021-02-09T13:58:33Z)
- Contrasting Centralized and Decentralized Critics in Multi-Agent Reinforcement Learning [19.66161324837036]
Centralized Training for Decentralized Execution, where agents are trained offline using centralized information but execute in a decentralized manner online, has gained popularity in the multi-agent reinforcement learning community.
In particular, actor-critic methods with a centralized critic and decentralized actors are a common instance of this idea.
We analyze centralized and decentralized critic approaches, providing a deeper understanding of the implications of critic choice.
arXiv Detail & Related papers (2021-02-08T18:08:11Z)
- Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics [65.51757376525798]
Overestimation bias is one of the major impediments to accurate off-policy learning.
This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting.
Our method, Truncated Quantile Critics (TQC), blends three ideas: distributional representation of a critic, truncation of the critics' predictions, and ensembling of multiple critics; a minimal sketch of this pooling-and-truncation idea is given below.
arXiv Detail & Related papers (2020-05-08T19:52:26Z)
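As a rough illustration of the truncation idea summarized in the last entry above, the sketch below pools quantile estimates from an ensemble of distributional critics and drops the largest pooled atoms before averaging. The ensemble size, number of quantiles, and amount of truncation are assumptions chosen for illustration, not the settings used in the TQC paper.

```python
# Rough illustration of truncated quantile pooling across an ensemble of
# distributional critics. Sizes and the truncation amount are assumptions,
# not the TQC paper's settings.
import torch


def truncated_value_estimate(quantiles, drop_top):
    """quantiles: (n_critics, n_quantiles) atoms for one (state, action) pair.

    Pools all atoms from every critic, discards the `drop_top` largest
    (the atoms most prone to overestimation), and averages the rest.
    """
    pooled, _ = torch.sort(quantiles.reshape(-1))   # pool and sort all atoms
    kept = pooled[: pooled.numel() - drop_top]      # truncate the upper tail
    return kept.mean()


# Example: 5 critics, each predicting 25 quantile atoms; drop the 10 largest.
atoms = torch.randn(5, 25)
print(truncated_value_estimate(atoms, drop_top=10))
```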
This list is automatically generated from the titles and abstracts of the papers in this site.