On Centralized Critics in Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2408.14597v1
- Date: Mon, 26 Aug 2024 19:27:06 GMT
- Title: On Centralized Critics in Multi-Agent Reinforcement Learning
- Authors: Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Brett Daley, Christopher Amato
- Abstract summary: Centralized Training for Decentralized Execution has become a popular approach in Multi-Agent Reinforcement Learning.
We analyze the effect of using state-based critics in partially observable environments.
- Score: 16.361249170514828
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Centralized Training for Decentralized Execution, where agents are trained offline in a centralized fashion and execute online in a decentralized manner, has become a popular approach in Multi-Agent Reinforcement Learning (MARL). In particular, it has become popular to develop actor-critic methods that train decentralized actors with a centralized critic, where the centralized critic is allowed access to global information about the entire system, including the true system state. Such centralized critics are possible given offline information and are not used for online execution. While these methods perform well in a number of domains and have become a de facto standard in MARL, using a centralized critic in this context has yet to be sufficiently analyzed theoretically or empirically. In this paper, we therefore formally analyze centralized and decentralized critic approaches, and analyze the effect of using state-based critics in partially observable environments. We derive theories contrary to the common intuition: critic centralization is not strictly beneficial, and using state values can be harmful. We further prove that, in particular, state-based critics can introduce unexpected bias and variance compared to history-based critics. Finally, we demonstrate how the theory applies in practice by comparing different forms of critics on a wide range of common multi-agent benchmarks. The experiments show practical issues such as the difficulty of representation learning with partial observability, which highlights why the theoretical problems are often overlooked in the literature.
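To make the distinction in the abstract concrete, below is a minimal, illustrative sketch in PyTorch (not the paper's implementation) of the two critic types being compared: a centralized, state-based critic that conditions on the true global state, and a history-based critic that conditions on an agent's local observation history, each usable alongside a decentralized actor. The class names, network sizes, and update function are assumptions made for illustration only.

```python
# Illustrative only: contrasts a state-based (centralized) critic with a
# history-based critic for a decentralized actor. All names and sizes are
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class StateCritic(nn.Module):
    """Centralized, state-based critic V(s): conditions on the true global state."""

    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, state):  # state: (batch, state_dim)
        return self.net(state).squeeze(-1)


class HistoryCritic(nn.Module):
    """History-based critic V(h): conditions on a local observation history."""

    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, obs_seq):  # obs_seq: (batch, T, obs_dim)
        out, _ = self.rnn(obs_seq)
        return self.head(out[:, -1]).squeeze(-1)  # value of the latest history


class Actor(nn.Module):
    """Decentralized actor pi(a | h): acts from local observations only."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq):
        out, _ = self.rnn(obs_seq)
        return torch.distributions.Categorical(logits=self.head(out[:, -1]))


def policy_gradient_loss(actor, obs_seq, action, ret, critic_value):
    """Same actor update for either critic; only the critic's input differs."""
    dist = actor(obs_seq)
    advantage = (ret - critic_value).detach()  # advantage from whichever critic is used
    return -(dist.log_prob(action) * advantage).mean()
```

The actor update has the same form for either critic; the question the paper analyzes is whether conditioning the critic on the state rather than the history is actually beneficial, and its answer is that under partial observability the state-based choice can introduce the bias and variance issues described above.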
Related papers
- Generalization Error Matters in Decentralized Learning Under Byzantine Attacks [22.589653582068117]
Decentralized learning has emerged as a popular peer-to-peer signal and information processing paradigm.
We provide the first analysis of the generalization errors for a class of popular Byzantine-resilient decentralized stochastic gradient descent (DSGD) algorithms.
arXiv Detail & Related papers (2024-07-11T16:12:53Z)
- Towards Understanding Generalization and Stability Gaps between Centralized and Decentralized Federated Learning [57.35402286842029]
We show that centralized federated learning always generalizes better than decentralized federated learning (DFL).
We also conduct experiments on several common setups in FL to validate that our theoretical analysis is consistent with experimental phenomena and contextually valid in several general and practical scenarios.
arXiv Detail & Related papers (2023-10-05T11:09:42Z)
- BLEURT Has Universal Translations: An Analysis of Automatic Metrics by Minimum Risk Training [64.37683359609308]
In this study, we analyze various mainstream and cutting-edge automatic metrics from the perspective of their guidance for training machine translation systems.
We find that certain metrics exhibit robustness defects, such as the presence of universal adversarial translations in BLEURT and BARTScore.
In-depth analysis suggests two main causes of these robustness deficits: distribution biases in the training datasets, and the tendency of the metric paradigm.
arXiv Detail & Related papers (2023-07-06T16:59:30Z)
- Networked Communication for Decentralised Agents in Mean-Field Games [59.01527054553122]
We introduce networked communication to the mean-field game framework.
We prove that our architecture has sample guarantees bounded between those of the centralised- and independent-learning cases.
arXiv Detail & Related papers (2023-06-05T10:45:39Z)
- Evaluate Confidence Instead of Perplexity for Zero-shot Commonsense Reasoning [85.1541170468617]
This paper reconsiders the nature of commonsense reasoning and proposes a novel commonsense reasoning metric, Non-Replacement Confidence (NRC).
Our proposed method boosts zero-shot performance on two commonsense reasoning benchmark datasets and on a further seven commonsense question-answering datasets.
arXiv Detail & Related papers (2022-08-23T14:42:14Z)
- Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games [6.589813623221242]
Policy sharing is crucial to efficient learning in certain tasks yet lacks theoretical justification.
We develop the first consensus-based decentralized actor-critic method.
We also develop practical algorithms based on our decentralized actor-critic method to reduce the communication cost during training.
arXiv Detail & Related papers (2022-02-18T20:35:00Z)
- A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning [17.36759906285316]
We show that state-based critics can introduce bias in the policy estimates, potentially undermining the guarantees of the algorithm.
We also show that, even if the state-based critics do not introduce any bias, they can still result in a larger gradient variance, contrary to the common intuition.
arXiv Detail & Related papers (2022-01-03T14:51:30Z)
- Unbiased Asymmetric Actor-Critic for Partially Observable Reinforcement Learning [17.48572546628464]
Asymmetric actor-critic methods exploit state information that is available during offline training by training a history-based policy via a state-based critic.
We examine the theory of asymmetric actor-critic methods which use state-based critics, and expose fundamental issues which undermine the validity of a common variant.
We propose an unbiased asymmetric actor-critic variant which is able to exploit state information while remaining theoretically sound.
arXiv Detail & Related papers (2021-05-25T05:18:44Z)
- Consensus Control for Decentralized Deep Learning [72.50487751271069]
Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.
We show in theory that when the training consensus distance is lower than a critical quantity, decentralized training converges as fast as the centralized counterpart.
Our empirical insights allow the principled design of better decentralized training schemes that mitigate the performance drop.
arXiv Detail & Related papers (2021-02-09T13:58:33Z)
- Contrasting Centralized and Decentralized Critics in Multi-Agent Reinforcement Learning [19.66161324837036]
Centralized Training for Decentralized Execution, where agents are trained offline using centralized information but execute in a decentralized manner online, has gained popularity in the multi-agent reinforcement learning community.
In particular, actor-critic methods with a centralized critic and decentralized actors are a common instance of this idea.
We analyze centralized and decentralized critic approaches, providing a deeper understanding of the implications of critic choice.
arXiv Detail & Related papers (2021-02-08T18:08:11Z)
- Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics [65.51757376525798]
Overestimation bias is one of the major impediments to accurate off-policy learning.
This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting.
Our method, Truncated Quantile Critics (TQC), blends three ideas: distributional representation of a critic, truncation of the critics' predictions, and ensembling of multiple critics; a minimal sketch of this pooling-and-truncation idea is given below.
arXiv Detail & Related papers (2020-05-08T19:52:26Z)
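As a rough illustration of the truncation idea summarized in the last entry above, the sketch below pools quantile estimates from an ensemble of distributional critics and drops the largest pooled atoms before averaging. The ensemble size, number of quantiles, and amount of truncation are assumptions chosen for illustration, not the settings used in the TQC paper.

```python
# Rough illustration of truncated quantile pooling across an ensemble of
# distributional critics. Sizes and the truncation amount are assumptions,
# not the TQC paper's settings.
import torch


def truncated_value_estimate(quantiles, drop_top):
    """quantiles: (n_critics, n_quantiles) atoms for one (state, action) pair.

    Pools all atoms from every critic, discards the `drop_top` largest
    (the atoms most prone to overestimation), and averages the rest.
    """
    pooled, _ = torch.sort(quantiles.reshape(-1))   # pool and sort all atoms
    kept = pooled[: pooled.numel() - drop_top]      # truncate the upper tail
    return kept.mean()


# Example: 5 critics, each predicting 25 quantile atoms; drop the 10 largest.
atoms = torch.randn(5, 25)
print(truncated_value_estimate(atoms, drop_top=10))
```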
This list is automatically generated from the titles and abstracts of the papers in this site.