Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error
Feedback
- URL: http://arxiv.org/abs/2306.11918v1
- Date: Tue, 20 Jun 2023 22:06:14 GMT
- Title: Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error
Feedback
- Authors: Hang Wang, Sen Lin, Junshan Zhang
- Abstract summary: The ensemble method is a promising way to mitigate the overestimation issue in Q-learning.
It is known that the estimation bias hinges heavily on the ensemble size.
We devise an ensemble method with two key steps: (a) approximation error characterization which serves as the feedback for flexibly controlling the ensemble size, and (b) ensemble size adaptation tailored towards minimizing the estimation bias.
- Score: 31.115084475673793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ensemble method is a promising way to mitigate the overestimation issue
in Q-learning, where multiple function approximators are used to estimate the
action values. It is known that the estimation bias hinges heavily on the
ensemble size (i.e., the number of Q-function approximators used in the
target), and that determining the 'right' ensemble size is highly nontrivial,
because of the time-varying nature of the function approximation errors during
the learning process. To tackle this challenge, we first derive an upper bound
and a lower bound on the estimation bias, based on which the ensemble size is
adapted to drive the bias to be nearly zero, thereby coping with the impact of
the time-varying approximation errors accordingly. Motivated by the theoretic
findings, we advocate that the ensemble method can be combined with Model
Identification Adaptive Control (MIAC) for effective ensemble size adaptation.
Specifically, we devise Adaptive Ensemble Q-learning (AdaEQ), a generalized
ensemble method with two key steps: (a) approximation error characterization
which serves as the feedback for flexibly controlling the ensemble size, and
(b) ensemble size adaptation tailored towards minimizing the estimation bias.
Extensive experiments are carried out to show that AdaEQ improves learning
performance over existing methods on the MuJoCo benchmark.
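The feedback loop in the abstract can be sketched in a minimal tabular form: a min-based ensemble target whose subset size `k` is nudged up or down by an estimated bias signal. This is an illustrative sketch only; the threshold `tol`, the random-subset minimization, and the `adapt_k` rule are simplifying assumptions, not the paper's exact error characterization.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_estimators = 5, 3, 6
Q = rng.normal(size=(n_estimators, n_states, n_actions))  # ensemble of Q-tables

def ensemble_target(reward, next_state, k, gamma=0.99):
    """Bootstrapped target taking the elementwise min over a random subset
    of k estimators; k controls the under/over-estimation trade-off."""
    idx = rng.choice(n_estimators, size=k, replace=False)
    q_min = Q[idx, next_state, :].min(axis=0)  # min over the chosen subset
    return reward + gamma * q_min.max()        # greedy backup on the min

def adapt_k(k, bias_estimate, tol=0.05):
    """Hypothetical feedback step: grow the subset when the target
    overestimates (bias > tol), shrink it when it underestimates."""
    if bias_estimate > tol:
        return min(k + 1, n_estimators)  # more mins -> less optimistic target
    if bias_estimate < -tol:
        return max(k - 1, 1)             # fewer mins -> higher target
    return k

k = adapt_k(k=2, bias_estimate=0.2)  # overestimation detected -> k grows to 3
```

Taking the min over a larger subset pushes the target value down, which is why growing `k` counteracts detected overestimation.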
Related papers
- MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
Model merging is an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model.
Existing model-merging methods focus on enhancing average task accuracy.
We introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP)
arXiv Detail & Related papers (2024-06-11T17:55:25Z) - C-Learner: Constrained Learning for Causal Inference and Semiparametric Statistics [5.395560682099634]
We present a novel correction method that solves for the best plug-in estimator under the constraint that the first-order error of the estimator with respect to the nuisance parameter estimate is zero.
Our semiparametric inference approach, which we call the "C-Learner", can be implemented with modern machine learning methods such as neural networks and tree ensembles.
arXiv Detail & Related papers (2024-05-15T16:38:28Z) - Mind the Gap: Measuring Generalization Performance Across Multiple
Objectives [29.889018459046316]
We present a novel evaluation protocol that allows measuring the generalization performance of MHPO methods.
We also study its capabilities for comparing two optimization experiments.
arXiv Detail & Related papers (2022-12-08T10:53:56Z) - Asymptotically Unbiased Instance-wise Regularized Partial AUC
Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z) - Distributed Nonparametric Function Estimation: Optimal Rate of
Convergence and Cost of Adaptation [1.332560004325655]
Distributed minimax estimation and distributed adaptive estimation under communication constraints are studied.
We quantify the exact communication cost for adaptation and construct an optimally adaptive procedure for distributed estimation.
arXiv Detail & Related papers (2021-07-01T02:16:16Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - Causal Inference Under Unmeasured Confounding With Negative Controls: A
Minimax Learning Approach [84.29777236590674]
We study the estimation of causal parameters when not all confounders are observed and instead negative controls are available.
Recent work has shown how these can enable identification and efficient estimation via two so-called bridge functions.
arXiv Detail & Related papers (2021-03-25T17:59:19Z) - Calibrated Adaptive Probabilistic ODE Solvers [31.442275669185626]
We introduce, discuss, and assess several probabilistically motivated ways to calibrate the uncertainty estimate.
We demonstrate the efficiency of the methodology by benchmarking against the classic, widely used Dormand-Prince 4/5 Runge-Kutta method.
arXiv Detail & Related papers (2020-12-15T10:48:55Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - Machine learning for causal inference: on the use of cross-fit
estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z) - Maxmin Q-learning: Controlling the Estimation Bias of Q-learning [31.742397178618624]
Overestimation bias affects Q-learning because it approximates the maximum action value using the maximum estimated action value.
We propose a generalization of Q-learning, called Maxmin Q-learning, which provides a parameter to flexibly control bias.
We empirically verify that our algorithm better controls estimation bias in toy environments, and that it achieves superior performance on several benchmark problems.
arXiv Detail & Related papers (2020-02-16T02:02:23Z)
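The Maxmin Q-learning entry above describes a target built from the elementwise minimum over an ensemble of estimates, with the ensemble size as the bias-control parameter. A minimal tabular sketch of one update step, assuming a uniform random choice of which table to update (the sampling and exploration details here are illustrative, not the paper's exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)

n_states, n_actions, n = 4, 2, 3   # n = number of Q-tables (the bias knob)
Q = np.zeros((n, n_states, n_actions))

def maxmin_update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One Maxmin Q-learning step: form the target from the elementwise
    minimum over all n estimates, then update one randomly chosen table."""
    q_min = Q.min(axis=0)                              # min over the ensemble
    target = reward + gamma * q_min[next_state].max()  # max action of the min
    i = rng.integers(n)                                # table to update
    Q[i, state, action] += alpha * (target - Q[i, state, action])
    return target

t = maxmin_update(state=0, action=1, reward=1.0, next_state=2)
```

With all tables initialized to zero, the min equals zero everywhere, so the first target is just the reward; as the tables diverge, increasing `n` drives the target lower, trading overestimation for underestimation.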
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.