Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error
Feedback
- URL: http://arxiv.org/abs/2306.11918v1
- Date: Tue, 20 Jun 2023 22:06:14 GMT
- Title: Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error
Feedback
- Authors: Hang Wang, Sen Lin, Junshan Zhang
- Abstract summary: The ensemble method is a promising way to mitigate the overestimation issue in Q-learning.
It is known that the estimation bias hinges heavily on the ensemble size.
We devise an ensemble method with two key steps: (a) approximation error characterization which serves as the feedback for flexibly controlling the ensemble size, and (b) ensemble size adaptation tailored towards minimizing the estimation bias.
- Score: 31.115084475673793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ensemble method is a promising way to mitigate the overestimation issue
in Q-learning, where multiple function approximators are used to estimate the
action values. It is known that the estimation bias hinges heavily on the
ensemble size (i.e., the number of Q-function approximators used in the
target), and that determining the 'right' ensemble size is highly nontrivial,
because of the time-varying nature of the function approximation errors during
the learning process. To tackle this challenge, we first derive an upper bound
and a lower bound on the estimation bias, based on which the ensemble size is
adapted to drive the bias to be nearly zero, thereby coping with the impact of
the time-varying approximation errors accordingly. Motivated by the theoretical
findings, we advocate that the ensemble method can be combined with Model
Identification Adaptive Control (MIAC) for effective ensemble size adaptation.
Specifically, we devise Adaptive Ensemble Q-learning (AdaEQ), a generalized
ensemble method with two key steps: (a) approximation error characterization
which serves as the feedback for flexibly controlling the ensemble size, and
(b) ensemble size adaptation tailored towards minimizing the estimation bias.
Extensive experiments are carried out to show that AdaEQ improves the
learning performance over existing methods on the MuJoCo benchmark.
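A brief illustration of why the in-target ensemble size controls the bias: taking a minimum over K noisy value estimates pushes the target downward, and the effect grows with K (for instance, with i.i.d. Uniform[-c, c] errors the expected minimum is -c(K-1)/(K+1)), so a larger K counteracts overestimation while a smaller K counteracts underestimation. The Python sketch below is only a schematic of the two steps named in the abstract, not the authors' implementation: the min-over-random-subset target, the sign-based size update, and all names (ensemble_target, adapt_ensemble_size, bias_estimate, tol) are illustrative assumptions, and the paper characterizes the approximation error via its derived bias bounds rather than the simple proxy suggested here.

```python
# Schematic of the two AdaEQ steps described in the abstract (illustrative
# names and bias proxy; not the authors' implementation).
import numpy as np


def ensemble_target(q_next, rewards, dones, gamma, k, rng):
    """Min-over-K target: r + gamma * max_a min_{i in S} Q_i(s', a), |S| = k.

    q_next: (N, batch, num_actions) next-state values from the N ensemble members.
    k:      in-target ensemble size, the quantity being adapted.
    """
    subset = rng.choice(q_next.shape[0], size=k, replace=False)  # random subset S
    min_q = q_next[subset].min(axis=0)                           # pessimistic combination
    return rewards + gamma * (1.0 - dones) * min_q.max(axis=1)


def adapt_ensemble_size(k, bias_estimate, k_min, k_max, tol=1e-2):
    """Error-feedback step: grow k under estimated overestimation, shrink it
    under underestimation, keep it when the estimated bias is near zero."""
    if bias_estimate > tol:        # targets look too optimistic -> more pessimism
        return min(k + 1, k_max)
    if bias_estimate < -tol:       # targets look too pessimistic -> relax
        return max(k - 1, k_min)
    return k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_members, batch, n_actions = 10, 32, 4
    q_next = rng.normal(size=(n_members, batch, n_actions))      # stand-in Q estimates
    rewards, dones = rng.normal(size=batch), np.zeros(batch)
    targets = ensemble_target(q_next, rewards, dones, gamma=0.99, k=3, rng=rng)
    # Suppose a held-out bias estimate (e.g., target minus observed return) is +0.2:
    k_next = adapt_ensemble_size(k=3, bias_estimate=0.2, k_min=2, k_max=n_members)
    print(targets.shape, k_next)   # (32,) 4
```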
Related papers
- Off-policy estimation with adaptively collected data: the power of online learning [20.023469636707635]
We consider estimation of a linear functional of the treatment effect using adaptively collected data.
We propose a general reduction scheme that allows one to produce a sequence of estimates for the treatment effect via online learning.
arXiv Detail & Related papers (2024-11-19T10:18:27Z)
- C-Learner: Constrained Learning for Causal Inference and Semiparametric Statistics [5.395560682099634]
We propose a novel debiased estimator that achieves stable plug-in estimates with desirable properties.
Our constrained learning framework solves for the best plug-in estimator under the constraint that the first-order error with respect to the plugged-in quantity is zero.
Our estimator outperforms one-step estimation and targeting in challenging settings with limited overlap between treatment and control, and performs comparably otherwise.
arXiv Detail & Related papers (2024-05-15T16:38:28Z)
- Mind the Gap: Measuring Generalization Performance Across Multiple Objectives [29.889018459046316]
We present a novel evaluation protocol that allows measuring the generalization performance of MHPO methods.
We also study its capabilities for comparing two optimization experiments.
arXiv Detail & Related papers (2022-12-08T10:53:56Z)
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
- Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach [84.29777236590674]
We study the estimation of causal parameters when not all confounders are observed and instead negative controls are available.
Recent work has shown how these can enable identification and efficient estimation via two so-called bridge functions.
arXiv Detail & Related papers (2021-03-25T17:59:19Z)
- Calibrated Adaptive Probabilistic ODE Solvers [31.442275669185626]
We introduce, discuss, and assess several probabilistically motivated ways to calibrate the uncertainty estimate.
We demonstrate the efficiency of the methodology by benchmarking against the classic, widely used Dormand-Prince 4/5 Runge-Kutta method.
arXiv Detail & Related papers (2020-12-15T10:48:55Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
- Maxmin Q-learning: Controlling the Estimation Bias of Q-learning [31.742397178618624]
Overestimation bias affects Q-learning because it approximates the maximum action value using the maximum estimated action value.
We propose a generalization of Q-learning, called Maxmin Q-learning, which provides a parameter to flexibly control bias.
We empirically verify that our algorithm better controls estimation bias in toy environments, and that it achieves superior performance on several benchmark problems.
arXiv Detail & Related papers (2020-02-16T02:02:23Z)
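Maxmin Q-learning (the last entry above) is the closest precursor here: its ensemble size N is exactly the bias-control parameter that AdaEQ adapts online. Below is a minimal tabular sketch of the Maxmin target and update under stated assumptions: q_tables is a list of N (num_states x num_actions) arrays, and a single randomly chosen member is updated per step (a simplification of the paper's random-subset update); all names are illustrative, not the authors' code.

```python
# Minimal tabular sketch of the Maxmin Q-learning target and update
# (illustrative names; single-member update is a simplification).
import numpy as np


def maxmin_target(q_tables, reward, next_state, gamma, done):
    """y = r + gamma * max_a min_{i=1..N} Q_i(s', a)."""
    min_q = np.min([q[next_state] for q in q_tables], axis=0)  # elementwise min over the ensemble
    return reward + gamma * (1.0 - done) * np.max(min_q)


def maxmin_step(q_tables, s, a, r, s_next, done, gamma=0.99, lr=0.1, rng=None):
    """Update one randomly chosen ensemble member toward the shared maxmin target."""
    rng = rng if rng is not None else np.random.default_rng()
    y = maxmin_target(q_tables, r, s_next, gamma, done)
    i = rng.integers(len(q_tables))                            # member selected for this step
    q_tables[i][s, a] += lr * (y - q_tables[i][s, a])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_members, n_states, n_actions = 4, 5, 3
    q_tables = [np.zeros((n_states, n_actions)) for _ in range(n_members)]
    maxmin_step(q_tables, s=0, a=1, r=1.0, s_next=2, done=0.0, rng=rng)
    print(max(q[0, 1] for q in q_tables))   # 0.1 for the updated member
```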