Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error
Feedback
- URL: http://arxiv.org/abs/2306.11918v1
- Date: Tue, 20 Jun 2023 22:06:14 GMT
- Title: Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error
Feedback
- Authors: Hang Wang, Sen Lin, Junshan Zhang
- Abstract summary: The ensemble method is a promising way to mitigate the overestimation issue in Q-learning.
It is known that the estimation bias hinges heavily on the ensemble size.
We devise an ensemble method with two key steps: (a) approximation error characterization which serves as the feedback for flexibly controlling the ensemble size, and (b) ensemble size adaptation tailored towards minimizing the estimation bias.
- Score: 31.115084475673793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ensemble method is a promising way to mitigate the overestimation issue
in Q-learning, where multiple function approximators are used to estimate the
action values. It is known that the estimation bias hinges heavily on the
ensemble size (i.e., the number of Q-function approximators used in the
target), and that determining the 'right' ensemble size is highly nontrivial,
because of the time-varying nature of the function approximation errors during
the learning process. To tackle this challenge, we first derive an upper bound
and a lower bound on the estimation bias, based on which the ensemble size is
adapted to drive the bias to be nearly zero, thereby coping with the impact of
the time-varying approximation errors accordingly. Motivated by the theoretical
findings, we advocate that the ensemble method can be combined with Model
Identification Adaptive Control (MIAC) for effective ensemble size adaptation.
Specifically, we devise Adaptive Ensemble Q-learning (AdaEQ), a generalized
ensemble method with two key steps: (a) approximation error characterization
which serves as the feedback for flexibly controlling the ensemble size, and
(b) ensemble size adaptation tailored towards minimizing the estimation bias.
Extensive experiments are carried out to show that AdaEQ improves the
learning performance over existing methods on the MuJoCo benchmark.
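A brief illustration of why the in-target ensemble size controls the bias: taking a minimum over K noisy value estimates pushes the target downward, and the effect grows with K (for instance, with i.i.d. Uniform[-c, c] errors the expected minimum is -c(K-1)/(K+1)), so a larger K counteracts overestimation while a smaller K counteracts underestimation. The Python sketch below is only a schematic of the two steps named in the abstract, not the authors' implementation: the min-over-random-subset target, the sign-based size update, and all names (ensemble_target, adapt_ensemble_size, bias_estimate, tol) are illustrative assumptions, and the paper characterizes the approximation error via its derived bias bounds rather than the simple proxy suggested here.

```python
# Schematic of the two AdaEQ steps described in the abstract (illustrative
# names and bias proxy; not the authors' implementation).
import numpy as np


def ensemble_target(q_next, rewards, dones, gamma, k, rng):
    """Min-over-K target: r + gamma * max_a min_{i in S} Q_i(s', a), |S| = k.

    q_next: (N, batch, num_actions) next-state values from the N ensemble members.
    k:      in-target ensemble size, the quantity being adapted.
    """
    subset = rng.choice(q_next.shape[0], size=k, replace=False)  # random subset S
    min_q = q_next[subset].min(axis=0)                           # pessimistic combination
    return rewards + gamma * (1.0 - dones) * min_q.max(axis=1)


def adapt_ensemble_size(k, bias_estimate, k_min, k_max, tol=1e-2):
    """Error-feedback step: grow k under estimated overestimation, shrink it
    under underestimation, keep it when the estimated bias is near zero."""
    if bias_estimate > tol:        # targets look too optimistic -> more pessimism
        return min(k + 1, k_max)
    if bias_estimate < -tol:       # targets look too pessimistic -> relax
        return max(k - 1, k_min)
    return k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_members, batch, n_actions = 10, 32, 4
    q_next = rng.normal(size=(n_members, batch, n_actions))      # stand-in Q estimates
    rewards, dones = rng.normal(size=batch), np.zeros(batch)
    targets = ensemble_target(q_next, rewards, dones, gamma=0.99, k=3, rng=rng)
    # Suppose a held-out bias estimate (e.g., target minus observed return) is +0.2:
    k_next = adapt_ensemble_size(k=3, bias_estimate=0.2, k_min=2, k_max=n_members)
    print(targets.shape, k_next)   # (32,) 4
```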
Related papers
- Off-policy estimation with adaptively collected data: the power of online learning [20.023469636707635]
We consider estimation of a linear functional of the treatment effect using adaptively collected data.
We propose a general reduction scheme that allows one to produce a sequence of estimates for the treatment effect via online learning.
arXiv Detail & Related papers (2024-11-19T10:18:27Z)
- C-Learner: Constrained Learning for Causal Inference and Semiparametric Statistics [5.395560682099634]
We propose a novel debiased estimator that achieves stable plug-in estimates with desirable properties.
Our constrained learning framework solves for the best plug-in estimator under the constraint that the first-order error with respect to the plugged-in quantity is zero.
Our estimator outperforms one-step estimation and targeting in challenging settings with limited overlap between treatment and control, and performs comparably otherwise.
arXiv Detail & Related papers (2024-05-15T16:38:28Z)
- Mind the Gap: Measuring Generalization Performance Across Multiple Objectives [29.889018459046316]
We present a novel evaluation protocol that allows measuring the generalization performance of MHPO methods.
We also study its capabilities for comparing two optimization experiments.
arXiv Detail & Related papers (2022-12-08T10:53:56Z)
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
- Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach [84.29777236590674]
We study the estimation of causal parameters when not all confounders are observed and instead negative controls are available.
Recent work has shown how these can enable identification and efficient estimation via two so-called bridge functions.
arXiv Detail & Related papers (2021-03-25T17:59:19Z)
- Calibrated Adaptive Probabilistic ODE Solvers [31.442275669185626]
We introduce, discuss, and assess several probabilistically motivated ways to calibrate the uncertainty estimate.
We demonstrate the efficiency of the methodology by benchmarking against the classic, widely used Dormand-Prince 4/5 Runge-Kutta method.
arXiv Detail & Related papers (2020-12-15T10:48:55Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
- Maxmin Q-learning: Controlling the Estimation Bias of Q-learning [31.742397178618624]
Overestimation bias affects Q-learning because it approximates the maximum action value using the maximum estimated action value.
We propose a generalization of Q-learning, called Maxmin Q-learning, which provides a parameter to flexibly control bias.
We empirically verify that our algorithm better controls estimation bias in toy environments, and that it achieves superior performance on several benchmark problems.
arXiv Detail & Related papers (2020-02-16T02:02:23Z)
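Maxmin Q-learning (the last entry above) is the closest precursor here: its ensemble size N is exactly the bias-control parameter that AdaEQ adapts online. Below is a minimal tabular sketch of the Maxmin target and update under stated assumptions: q_tables is a list of N (num_states x num_actions) arrays, and a single randomly chosen member is updated per step (a simplification of the paper's random-subset update); all names are illustrative, not the authors' code.

```python
# Minimal tabular sketch of the Maxmin Q-learning target and update
# (illustrative names; single-member update is a simplification).
import numpy as np


def maxmin_target(q_tables, reward, next_state, gamma, done):
    """y = r + gamma * max_a min_{i=1..N} Q_i(s', a)."""
    min_q = np.min([q[next_state] for q in q_tables], axis=0)  # elementwise min over the ensemble
    return reward + gamma * (1.0 - done) * np.max(min_q)


def maxmin_step(q_tables, s, a, r, s_next, done, gamma=0.99, lr=0.1, rng=None):
    """Update one randomly chosen ensemble member toward the shared maxmin target."""
    rng = rng if rng is not None else np.random.default_rng()
    y = maxmin_target(q_tables, r, s_next, gamma, done)
    i = rng.integers(len(q_tables))                            # member selected for this step
    q_tables[i][s, a] += lr * (y - q_tables[i][s, a])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_members, n_states, n_actions = 4, 5, 3
    q_tables = [np.zeros((n_states, n_actions)) for _ in range(n_members)]
    maxmin_step(q_tables, s=0, a=1, r=1.0, s_next=2, done=0.0, rng=rng)
    print(max(q[0, 1] for q in q_tables))   # 0.1 for the updated member
```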