A Deeper Understanding of State-Based Critics in Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2201.01221v1
- Date: Mon, 3 Jan 2022 14:51:30 GMT
- Title: A Deeper Understanding of State-Based Critics in Multi-Agent
Reinforcement Learning
- Authors: Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Christopher Amato
- Abstract summary: We show that state-based critics can introduce bias in the policy gradient estimates, potentially undermining the asymptotic guarantees of the algorithm.
We also show that, even if the state-based critics do not introduce any bias, they can still result in a larger gradient variance, contrary to the common intuition.
- Score: 17.36759906285316
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Centralized Training for Decentralized Execution, where training is done in a
centralized offline fashion, has become a popular solution paradigm in
Multi-Agent Reinforcement Learning. Many such methods take the form of
actor-critic with state-based critics, since centralized training allows access
to the true system state, which can be useful during training despite not being
available at execution time. State-based critics have become a common empirical
choice, albeit one which has had limited theoretical justification or analysis.
In this paper, we show that state-based critics can introduce bias in the
policy gradient estimates, potentially undermining the asymptotic guarantees of
the algorithm. We also show that, even if the state-based critics do not
introduce any bias, they can still result in a larger gradient variance,
contrary to the common intuition. Finally, we show the effects of the theories
in practice by comparing different forms of centralized critics on a wide range
of common benchmarks, and detail how various environmental properties are
related to the effectiveness of different types of critics.
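For concreteness, here is a rough sketch of the two gradient estimators being contrasted, written in standard Dec-POMDP actor-critic notation; the symbols below are illustrative and not taken verbatim from the paper. With decentralized actors \pi_i conditioned on local histories h_i and a centralized critic, the history-based estimator is

  \nabla_\theta J(\theta) = \mathbb{E}\Big[ \sum_i \nabla_\theta \log \pi_i(a_i \mid h_i)\, Q^\pi(\vec{h}, \vec{a}) \Big],

while the state-based variant replaces the history-based value with a critic conditioned on the true system state,

  \nabla_\theta J(\theta) \approx \mathbb{E}\Big[ \sum_i \nabla_\theta \log \pi_i(a_i \mid h_i)\, Q(s, \vec{a}) \Big].

The substitution is exact only when \mathbb{E}_{s \mid \vec{h}, \vec{a}}[ Q(s, \vec{a}) ] = Q^\pi(\vec{h}, \vec{a}); the bias and variance claims in the abstract concern how far learned state-based critics can be from satisfying this condition.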
Related papers
- On Centralized Critics in Multi-Agent Reinforcement Learning [16.361249170514828]
Centralized Training for Decentralized Execution has become a popular approach in Multi-Agent Reinforcement Learning.
We analyze the effect of using state-based critics in partially observable environments.
arXiv Detail & Related papers (2024-08-26T19:27:06Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The vulnerability of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z) - Off-policy Reinforcement Learning with Optimistic Exploration and
Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z) - Unbiased Asymmetric Actor-Critic for Partially Observable Reinforcement
Learning [17.48572546628464]
Asymmetric actor-critic methods exploit state information available during training by learning a history-based policy via a state-based critic.
We examine the theory of asymmetric actor-critic methods which use state-based critics, and expose fundamental issues which undermine the validity of a common variant.
We propose an unbiased asymmetric actor-critic variant which is able to exploit state information while remaining theoretically sound.
arXiv Detail & Related papers (2021-05-25T05:18:44Z) - Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in the deep learning-based medical image analysis system.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale, publicly available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z) - Contrasting Centralized and Decentralized Critics in Multi-Agent
Reinforcement Learning [19.66161324837036]
Centralized Training for Decentralized Execution, where agents are trained offline using centralized information but execute online in a decentralized manner, has gained popularity in the multi-agent reinforcement learning community.
In particular, actor-critic methods with a centralized critic and decentralized actors are a common instance of this idea.
We analyze centralized and decentralized critic approaches, providing a deeper understanding of the implications of critic choice.
arXiv Detail & Related papers (2021-02-08T18:08:11Z) - Learning Value Functions in Deep Policy Gradients using Residual
Variance [22.414430270991005]
Policy gradient algorithms have proven to be successful in diverse decision making and control tasks.
However, the critics in traditional actor-critic algorithms often fail to fit the true value function.
We provide a new state-value (resp. state-action-value) function approximation that learns the value of the states relative to their mean value.
arXiv Detail & Related papers (2020-10-09T08:57:06Z) - Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with
Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z) - Controlling Overestimation Bias with Truncated Mixture of Continuous
Distributional Quantile Critics [65.51757376525798]
Overestimation bias is one of the major impediments to accurate off-policy learning.
This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting.
Our method, Truncated Quantile Critics (TQC), blends three ideas: a distributional representation of the critic, truncation of the critics' predictions, and ensembling of multiple critics (see the sketch after this list).
arXiv Detail & Related papers (2020-05-08T19:52:26Z)
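As a rough illustration of the truncation-and-ensembling idea in the TQC entry above, here is a minimal numpy sketch; the function name and the drop_per_critic parameter are hypothetical, and this is not the authors' implementation.

import numpy as np

def truncated_quantile_value(quantiles, drop_per_critic=2):
    # quantiles: (n_critics, n_atoms) array of predicted return quantiles
    # for one (observation, action) pair, one row per critic in the ensemble.
    n_critics, n_atoms = quantiles.shape
    pooled = np.sort(quantiles.reshape(-1))          # pool and sort all atoms
    keep = n_critics * (n_atoms - drop_per_critic)   # discard the largest atoms
    return pooled[:keep].mean()                      # average the remainder

# Example: 3 critics, 5 atoms each; the optimistic outliers are truncated away.
q = np.array([[1.0, 2.0, 3.0, 4.0, 9.0],
              [1.5, 2.5, 3.5, 4.5, 8.0],
              [0.5, 1.0, 2.0, 3.0, 7.0]])
print(truncated_quantile_value(q, drop_per_critic=1))

Dropping the top atoms from the pooled distribution yields a deliberately conservative value estimate, which is how the truncation counteracts overestimation bias.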