Reducing Variance in Temporal-Difference Value Estimation via Ensemble
of Deep Networks
- URL: http://arxiv.org/abs/2209.07670v1
- Date: Fri, 16 Sep 2022 01:47:36 GMT
- Authors: Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander
Ihler, Pieter Abbeel, Roy Fox
- Abstract summary: MeanQ is a simple ensemble method that estimates target values as ensemble means.
We show that MeanQ shows remarkable sample efficiency in experiments on the Atari Learning Environment benchmark.
- Score: 109.59988683444986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In temporal-difference reinforcement learning algorithms, variance in value
estimation can cause instability and overestimation of the maximal target
value. Many algorithms have been proposed to reduce overestimation, including
several recent ensemble methods; however, none has achieved sample-efficient
learning by addressing estimation variance as the root
cause of overestimation. In this paper, we propose MeanQ, a simple ensemble
method that estimates target values as ensemble means. Despite its simplicity,
MeanQ shows remarkable sample efficiency in experiments on the Atari Learning
Environment benchmark. Importantly, we find that an ensemble of size 5
sufficiently reduces estimation variance to obviate the lagging target network,
eliminating it as a source of bias and further gaining sample efficiency. We
justify intuitively and empirically the design choices in MeanQ, including the
necessity of independent experience sampling. On a set of 26 benchmark Atari
environments, MeanQ outperforms all tested baselines, including the best
available baseline, SUNRISE, at 100K interaction steps in 16/26 environments,
and by 68% on average. MeanQ also outperforms Rainbow DQN at 500K steps in
21/26 environments, and by 49% on average, and achieves average human-level
performance using 200K ($\pm$100K) interaction steps. Our implementation is
available at https://github.com/indylab/MeanQ.
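To make the abstract's central idea concrete, here is a minimal NumPy sketch of computing TD targets as ensemble means. All names and shapes are illustrative assumptions; the actual MeanQ implementation at the linked repository differs in detail (deep networks, replay buffers, independent experience sampling):

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_mean_target(next_q_values, rewards, dones, gamma=0.99):
    """TD targets from the mean of an ensemble of next-state Q estimates.

    next_q_values: (K, batch, num_actions) next-state Q-values from
                   K ensemble members.
    rewards, dones: (batch,) transition rewards and terminal flags.
    """
    mean_q = next_q_values.mean(axis=0)   # average over the K members
    best = mean_q.max(axis=-1)            # greedy value under the mean estimate
    return rewards + gamma * (1.0 - dones) * best

# Toy example: 5 ensemble members, batch of 2 transitions, 3 actions.
q = rng.normal(size=(5, 2, 3))
targets = ensemble_mean_target(q,
                               rewards=np.array([1.0, 0.0]),
                               dones=np.array([0.0, 1.0]))
```

Averaging the K estimates before taking the max is what reduces the variance of the bootstrapped target; with enough members (the abstract reports 5 suffice), the lagging target network becomes unnecessary.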
Related papers
- It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives [40.197673152937256]
Training statistical performance models often requires vast amounts of data, which demands a significant time investment and can be difficult when hardware availability is limited.
We propose a novel performance modeling methodology that significantly reduces the number of training samples while maintaining good accuracy.
We achieve a Mean Absolute Percentage Error (MAPE) of as low as 0.02% for single-layer estimations and 0.68% for whole estimations with less than 10000 training samples.
arXiv Detail & Related papers (2024-06-12T15:34:28Z)
- Smart Sampling: Self-Attention and Bootstrapping for Improved Ensembled Q-Learning [0.6963971634605796]
We present a novel method aimed at enhancing the sample efficiency of ensemble Q learning.
Our proposed approach integrates multi-head self-attention into the ensembled Q networks while bootstrapping the state-action pairs ingested by the ensemble.
arXiv Detail & Related papers (2024-05-14T00:57:02Z)
- Theoretical Analysis of Explicit Averaging and Novel Sign Averaging in Comparison-Based Search [6.883986852278248]
In black-box optimization, noise in the objective function is inevitable.
Explicit averaging is widely used as a simple and versatile noise-handling technique.
Alternatively, sign averaging is proposed as a simple but robust noise-handling technique.
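The two noise-handling schemes contrasted above can be sketched as comparison operators for a noisy objective. The toy function `noisy_f`, its noise level, and the sample count are illustrative assumptions, not the paper's setup:

```python
import random

random.seed(0)

def noisy_f(x, sigma=2.0):
    # Hypothetical noisy objective: true value x**2 plus Gaussian noise.
    return x * x + random.gauss(0.0, sigma)

def explicit_average_compare(x, y, k=100):
    """Judge x better than y by comparing k-sample averages of noisy values."""
    fx = sum(noisy_f(x) for _ in range(k)) / k
    fy = sum(noisy_f(y) for _ in range(k)) / k
    return fx < fy

def sign_average_compare(x, y, k=100):
    """Judge x better than y by averaging the signs of k paired differences."""
    s = 0
    for _ in range(k):
        d = noisy_f(x) - noisy_f(y)
        s += (d > 0) - (d < 0)
    return s < 0  # negative average sign: x tends to evaluate lower

better = explicit_average_compare(1.0, 3.0)  # true values: 1 vs 9
robust = sign_average_compare(1.0, 3.0)
```

Explicit averaging reduces the noise on each value estimate, while sign averaging discards magnitudes entirely, which is what makes it robust to heavy-tailed noise in comparison-based search.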
arXiv Detail & Related papers (2024-01-25T08:35:50Z)
- Labeling-Free Comparison Testing of Deep Learning Models [28.47632100019289]
We propose a labeling-free comparison testing approach to overcome the limitations of labeling effort and sampling randomness.
Our approach outperforms the baseline methods by up to 0.74 and 0.53 on Spearman's correlation and Kendall's $\tau$, regardless of the dataset and distribution shift.
arXiv Detail & Related papers (2022-04-08T10:55:45Z)
- Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation [90.78178803486746]
We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete action, pixel-based environments.
We empirically evaluate NAIT on both the 26 and 57 game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with greater than 100x speedup in wall-time.
arXiv Detail & Related papers (2022-03-07T00:31:31Z)
- Deep Reinforcement Learning at the Edge of the Statistical Precipice [31.178451465925555]
We argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without risking slowing down progress in the field.
We advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results.
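The interval estimates advocated above can be illustrated with a plain percentile bootstrap over per-run scores. This is a simplified stand-in; the paper's recommended tooling (stratified bootstrap over runs and tasks, performance profiles) is more elaborate, and the scores below are made up:

```python
import random

def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the mean score over independent runs."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        sum(rng.choice(scores) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]        # 2.5th percentile
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]  # 97.5th percentile
    return lo, hi

# Hypothetical normalized scores from six training runs of one algorithm.
runs = [0.8, 1.1, 0.4, 0.9, 1.3, 0.7]
low, high = bootstrap_ci(runs)
```

Reporting `(low, high)` instead of a single mean makes the run-to-run variability visible, which is the point of the paper's argument.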
arXiv Detail & Related papers (2021-08-30T14:23:48Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
- Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
- EqCo: Equivalent Rules for Self-supervised Contrastive Learning [81.45848885547754]
We propose a method to make self-supervised learning insensitive to the number of negative samples in InfoNCE-based contrastive learning frameworks.
Inspired by the InfoMax principle, we point out that the margin term in the contrastive loss needs to be adaptively scaled according to the number of negative pairs.
arXiv Detail & Related papers (2020-10-05T11:39:04Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
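The confidence-weighted prototype update described in the last entry can be sketched in NumPy. Shapes and values are hypothetical, and the confidences are fixed constants here, whereas the paper meta-learns them:

```python
import numpy as np

def soft_prototype(support, queries, confidences):
    """Class prototype from support embeddings plus confidence-weighted queries.

    support: (n_s, d) labeled support embeddings for one class.
    queries: (n_q, d) unlabeled query embeddings.
    confidences: (n_q,) per-query weights in [0, 1].
    """
    weighted = support.sum(axis=0) + (confidences[:, None] * queries).sum(axis=0)
    return weighted / (len(support) + confidences.sum())

support = np.array([[1.0, 0.0], [0.0, 1.0]])
queries = np.array([[2.0, 2.0], [4.0, 0.0]])
# Trust the first query fully, ignore the second.
proto = soft_prototype(support, queries, np.array([1.0, 0.0]))
```

With all confidences at zero this reduces to the plain support mean, so the weights interpolate between ignoring and fully absorbing the unlabeled queries.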
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.