Controlling Overestimation Bias with Truncated Mixture of Continuous
Distributional Quantile Critics
- URL: http://arxiv.org/abs/2005.04269v1
- Date: Fri, 8 May 2020 19:52:26 GMT
- Title: Controlling Overestimation Bias with Truncated Mixture of Continuous
Distributional Quantile Critics
- Authors: Arsenii Kuznetsov, Pavel Shvechikov, Alexander Grishin, Dmitry Vetrov
- Abstract summary: Overestimation bias is one of the major impediments to accurate off-policy learning.
This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting.
Our method, Truncated Quantile Critics (TQC), blends three ideas: distributional representation of a critic, truncation of critics' predictions, and ensembling of multiple critics.
- Score: 65.51757376525798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The overestimation bias is one of the major impediments to accurate
off-policy learning. This paper investigates a novel way to alleviate the
overestimation bias in a continuous control setting. Our method, Truncated
Quantile Critics (TQC), blends three ideas: distributional representation of a
critic, truncation of critics' predictions, and ensembling of multiple critics.
Distributional representation and truncation allow for arbitrarily granular
overestimation control, while ensembling provides additional score
improvements. TQC outperforms the current state of the art on all environments
from the continuous control benchmark suite, demonstrating a 25% improvement on
the most challenging Humanoid environment.
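To make the truncation idea concrete, here is a minimal sketch (not the authors' reference implementation) of pooling the quantile atoms from an ensemble of distributional critics, sorting them, and discarding the largest few before averaging into a value target. The array shapes, the `atoms_to_drop_per_critic` knob, and the helper name `truncated_value_target` are illustrative assumptions.

```python
import numpy as np

def truncated_value_target(critic_atoms, atoms_to_drop_per_critic=2):
    """Pool the quantile atoms predicted by all critics, sort them, and
    drop the largest ones before averaging into a single value estimate.

    critic_atoms: array of shape (n_critics, n_quantiles), the quantile
        locations each critic predicts for one (state, action) pair.
    atoms_to_drop_per_critic: hypothetical knob mirroring TQC's per-critic
        truncation parameter; dropping more atoms gives a more
        pessimistic target.
    """
    critic_atoms = np.asarray(critic_atoms, dtype=float)
    n_critics, n_quantiles = critic_atoms.shape

    # Mix the critics' distributions by pooling every atom into one set.
    pooled = np.sort(critic_atoms.reshape(-1))

    # Truncate: discard the largest atoms, where overestimation
    # concentrates, keeping only the smallest ones.
    keep = n_critics * (n_quantiles - atoms_to_drop_per_critic)
    truncated = pooled[:keep]

    # The truncated value estimate is the mean of the remaining atoms.
    return float(truncated.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake predictions: 5 critics, 25 quantile atoms each.
    atoms = rng.normal(loc=10.0, scale=2.0, size=(5, 25))
    print("plain mean:    ", atoms.mean())
    print("truncated mean:", truncated_value_target(atoms, 2))
```

Varying the number of dropped atoms is what gives the granular control over overestimation mentioned in the abstract: dropping more of the top atoms makes the target more pessimistic, dropping fewer makes it more optimistic.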
Related papers
- On Centralized Critics in Multi-Agent Reinforcement Learning [16.361249170514828]
Centralized Training for Decentralized Execution has become a popular approach in Multi-Agent Reinforcement Learning.
We analyze the effect of using state-based critics in partially observable environments.
arXiv Detail & Related papers (2024-08-26T19:27:06Z)
- A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning [17.36759906285316]
We show that state-based critics can introduce bias in the policy estimates, potentially undermining the guarantees of the algorithm.
We also show that, even if the state-based critics do not introduce any bias, they can still result in a larger gradient variance, contrary to the common intuition.
arXiv Detail & Related papers (2022-01-03T14:51:30Z)
- Assessment of Treatment Effect Estimators for Heavy-Tailed Data [70.72363097550483]
A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized control trials (RCTs) is the lack of ground truth (or validation set) to test their performance.
We provide a novel cross-validation-like methodology to address this challenge.
We evaluate our methodology across 709 RCTs implemented in the Amazon supply chain.
arXiv Detail & Related papers (2021-12-14T17:53:01Z)
- Automating Control of Overestimation Bias for Continuous Reinforcement Learning [65.63607016094305]
We present a data-driven approach for guiding bias correction.
We demonstrate its effectiveness on Truncated Quantile Critics, a state-of-the-art continuous control algorithm.
arXiv Detail & Related papers (2021-10-26T09:27:12Z)
- Parameter-Free Deterministic Reduction of the Estimation Bias in Continuous Control [0.0]
We introduce a parameter-free, novel deep Q-learning variant to reduce the underestimation bias in continuous control.
We test the performance of our improvement on a set of MuJoCo and Box2D continuous control tasks.
arXiv Detail & Related papers (2021-09-24T07:41:07Z)
- Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods [0.0]
In value-based deep reinforcement learning methods, approximation of value functions induces overestimation bias and leads to suboptimal policies.
We show that in deep actor-critic methods that aim to overcome the overestimation bias, if the reinforcement signals received by the agent have a high variance, a significant underestimation bias arises.
To minimize the underestimation, we introduce a parameter-free, novel deep Q-learning variant.
arXiv Detail & Related papers (2021-09-22T13:49:35Z)
- Unsupervised Learning of Debiased Representations with Pseudo-Attributes [85.5691102676175]
We propose a simple but effective debiasing technique in an unsupervised manner.
We perform clustering on the feature embedding space and identify pseudo-attributes by taking advantage of the clustering results.
We then employ a novel cluster-based reweighting scheme for learning debiased representation.
arXiv Detail & Related papers (2021-08-06T05:20:46Z)
- Efficient Continuous Control with Double Actors and Regularized Critics [7.072664211491016]
We explore the potential of double actors, which has long been neglected, for better value function estimation in the continuous setting.
We build double actors upon a single critic and upon double critics to handle the overestimation bias in DDPG and the underestimation bias in TD3, respectively.
To mitigate the uncertainty of value estimates from double critics, we propose to regularize the critic networks under the double-actors architecture.
arXiv Detail & Related papers (2021-06-06T07:04:48Z)
- Re-Assessing the "Classify and Count" Quantification Method [88.60021378715636]
"Classify and Count" (CC) is often a biased estimator.
Previous works have failed to use properly optimised versions of CC.
We argue that properly optimised versions of CC, while still inferior to some cutting-edge methods, deliver near-state-of-the-art accuracy.
arXiv Detail & Related papers (2020-11-04T21:47:39Z)
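For context on the "Classify and Count" entry above, the sketch below shows plain CC and the standard Adjusted Classify and Count (ACC) correction for a binary task. It assumes a classifier exposed as a `predict` callable and a labelled validation set; it is not one of the optimised variants the paper evaluates.

```python
import numpy as np

def classify_and_count(predict, X):
    """CC: estimate the positive-class prevalence as the fraction of
    items the classifier labels positive."""
    return float(np.mean(np.asarray(predict(X)) == 1))

def adjusted_classify_and_count(predict, X, X_val, y_val):
    """ACC: correct CC using the classifier's true/false positive rates
    measured on held-out labelled data (X_val, y_val)."""
    y_val = np.asarray(y_val)
    y_pred_val = np.asarray(predict(X_val))
    tpr = np.mean(y_pred_val[y_val == 1] == 1)  # true positive rate
    fpr = np.mean(y_pred_val[y_val == 0] == 1)  # false positive rate
    cc = classify_and_count(predict, X)
    # CC's expectation is prevalence * tpr + (1 - prevalence) * fpr,
    # so invert that relation to undo the classifier's bias.
    prevalence = (cc - fpr) / max(tpr - fpr, 1e-12)
    return float(np.clip(prevalence, 0.0, 1.0))
```

ACC removes CC's bias when the classifier's error rates are stable between the validation and deployment distributions, which is the usual assumption behind this family of corrections.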
- Prediction with Corrupted Expert Advice [67.67399390910381]
We prove that a variant of the classical Multiplicative Weights algorithm with decreasing step sizes achieves constant regret in a benign environment.
Our results reveal a surprising disparity between the often comparable Follow the Regularized Leader (FTRL) and Online Mirror Descent (OMD) frameworks.
arXiv Detail & Related papers (2020-02-24T14:39:55Z)
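As a rough illustration of the "Prediction with Corrupted Expert Advice" entry above, here is a minimal exponential-weights (Multiplicative Weights) learner with a decreasing step size eta_t = sqrt(ln(N)/t), written as an OMD-style incremental update. The exact step-size schedule and corruption model analysed in the paper are not reproduced; the function name and interface are assumptions.

```python
import numpy as np

def multiplicative_weights(loss_rounds, n_experts):
    """Exponential-weights forecaster with a decreasing step size.

    loss_rounds: iterable of per-round loss vectors in [0, 1], one entry
        per expert. Returns (learner's cumulative expected loss,
        best single expert's cumulative loss).
    """
    log_w = np.zeros(n_experts)            # log-weights, for numerical stability
    cum_expert_loss = np.zeros(n_experts)
    learner_loss = 0.0

    for t, losses in enumerate(loss_rounds, start=1):
        losses = np.asarray(losses, dtype=float)
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                       # prediction distribution this round
        learner_loss += float(p @ losses)  # learner's expected loss
        cum_expert_loss += losses

        eta_t = np.sqrt(np.log(n_experts) / t)  # decreasing step size ~ 1/sqrt(t)
        log_w -= eta_t * losses                 # multiplicative-weights update

    return learner_loss, float(cum_expert_loss.min())
```

An FTRL-style variant would instead recompute the weights from the cumulative losses with the current step size each round; roughly, that implementation choice is the kind of difference the paper's FTRL-versus-OMD comparison examines.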
This list is automatically generated from the titles and abstracts of the papers on this site.