A Deeper Understanding of State-Based Critics in Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2201.01221v1
- Date: Mon, 3 Jan 2022 14:51:30 GMT
- Title: A Deeper Understanding of State-Based Critics in Multi-Agent
Reinforcement Learning
- Authors: Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Christopher Amato
- Abstract summary: We show that state-based critics can introduce bias in the policy gradient estimates, potentially undermining the asymptotic guarantees of the algorithm.
We also show that, even if the state-based critics do not introduce any bias, they can still result in a larger gradient variance, contrary to the common intuition.
- Score: 17.36759906285316
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Centralized Training for Decentralized Execution, where training is done in a
centralized offline fashion, has become a popular solution paradigm in
Multi-Agent Reinforcement Learning. Many such methods take the form of
actor-critic with state-based critics, since centralized training allows access
to the true system state, which can be useful during training despite not being
available at execution time. State-based critics have become a common empirical
choice, albeit one which has had limited theoretical justification or analysis.
In this paper, we show that state-based critics can introduce bias in the
policy gradient estimates, potentially undermining the asymptotic guarantees of
the algorithm. We also show that, even if the state-based critics do not
introduce any bias, they can still result in a larger gradient variance,
contrary to the common intuition. Finally, we show the effects of the theories
in practice by comparing different forms of centralized critics on a wide range
of common benchmarks, and detail how various environmental properties are
related to the effectiveness of different types of critics.
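For concreteness, here is a rough sketch of the two gradient estimators being contrasted, written in standard Dec-POMDP actor-critic notation; the symbols below are illustrative and not taken verbatim from the paper. With decentralized actors \pi_i conditioned on local histories h_i and a centralized critic, the history-based estimator is

  \nabla_\theta J(\theta) = \mathbb{E}\Big[ \sum_i \nabla_\theta \log \pi_i(a_i \mid h_i)\, Q^\pi(\vec{h}, \vec{a}) \Big],

while the state-based variant replaces the history-based value with a critic conditioned on the true system state,

  \nabla_\theta J(\theta) \approx \mathbb{E}\Big[ \sum_i \nabla_\theta \log \pi_i(a_i \mid h_i)\, Q(s, \vec{a}) \Big].

The substitution is exact only when \mathbb{E}_{s \mid \vec{h}, \vec{a}}[ Q(s, \vec{a}) ] = Q^\pi(\vec{h}, \vec{a}); the bias and variance claims in the abstract concern how far learned state-based critics can be from satisfying this condition.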
Related papers
- On Centralized Critics in Multi-Agent Reinforcement Learning [16.361249170514828]
Centralized Training for Decentralized Execution has become a popular approach in Multi-Agent Reinforcement Learning.
We analyze the effect of using state-based critics in partially observable environments.
arXiv Detail & Related papers (2024-08-26T19:27:06Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The vulnerability of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z) - Off-policy Reinforcement Learning with Optimistic Exploration and
Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z) - Unbiased Asymmetric Actor-Critic for Partially Observable Reinforcement
Learning [17.48572546628464]
Asymmetric actor-critic methods exploit state information available during training by learning a history-based policy via a state-based critic.
We examine the theory of asymmetric actor-critic methods which use state-based critics, and expose fundamental issues which undermine the validity of a common variant.
We propose an unbiased asymmetric actor-critic variant which is able to exploit state information while remaining theoretically sound.
arXiv Detail & Related papers (2021-05-25T05:18:44Z) - Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in the deep learning-based medical image analysis system.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale, publicly available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z) - Contrasting Centralized and Decentralized Critics in Multi-Agent
Reinforcement Learning [19.66161324837036]
Centralized Training for Decentralized Execution, where agents are trained offline using centralized information but execute online in a decentralized manner, has gained popularity in the multi-agent reinforcement learning community.
In particular, actor-critic methods with a centralized critic and decentralized actors are a common instance of this idea.
We analyze centralized and decentralized critic approaches, providing a deeper understanding of the implications of critic choice.
arXiv Detail & Related papers (2021-02-08T18:08:11Z) - Learning Value Functions in Deep Policy Gradients using Residual
Variance [22.414430270991005]
Policy gradient algorithms have proven to be successful in diverse decision making and control tasks.
However, the critics in traditional actor-critic algorithms often fail to fit the true value function.
We provide a new state-value (resp. state-action-value) function approximation that learns the value of the states relative to their mean value.
arXiv Detail & Related papers (2020-10-09T08:57:06Z) - Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with
Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z) - Controlling Overestimation Bias with Truncated Mixture of Continuous
Distributional Quantile Critics [65.51757376525798]
Overestimation bias is one of the major impediments to accurate off-policy learning.
This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting.
Our method, Truncated Quantile Critics (TQC), blends three ideas: a distributional representation of the critic, truncation of the critics' predictions, and ensembling of multiple critics (see the sketch after this list).
arXiv Detail & Related papers (2020-05-08T19:52:26Z)
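As a rough illustration of the truncation-and-ensembling idea in the TQC entry above, here is a minimal numpy sketch; the function name and the drop_per_critic parameter are hypothetical, and this is not the authors' implementation.

import numpy as np

def truncated_quantile_value(quantiles, drop_per_critic=2):
    # quantiles: (n_critics, n_atoms) array of predicted return quantiles
    # for one (observation, action) pair, one row per critic in the ensemble.
    n_critics, n_atoms = quantiles.shape
    pooled = np.sort(quantiles.reshape(-1))          # pool and sort all atoms
    keep = n_critics * (n_atoms - drop_per_critic)   # discard the largest atoms
    return pooled[:keep].mean()                      # average the remainder

# Example: 3 critics, 5 atoms each; the optimistic outliers are truncated away.
q = np.array([[1.0, 2.0, 3.0, 4.0, 9.0],
              [1.5, 2.5, 3.5, 4.5, 8.0],
              [0.5, 1.0, 2.0, 3.0, 7.0]])
print(truncated_quantile_value(q, drop_per_critic=1))

Dropping the top atoms from the pooled distribution yields a deliberately conservative value estimate, which is how the truncation counteracts overestimation bias.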