Operator Augmentation for Model-based Policy Evaluation
- URL: http://arxiv.org/abs/2110.12658v1
- Date: Mon, 25 Oct 2021 05:58:49 GMT
- Title: Operator Augmentation for Model-based Policy Evaluation
- Authors: Xun Tang, Lexing Ying, Yuhua Zhu
- Abstract summary: In model-based reinforcement learning, the transition matrix and reward vector are often estimated from random samples subject to noise.
We introduce an operator augmentation method for reducing the error introduced by the estimated model.
- Score: 1.503974529275767
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In model-based reinforcement learning, the transition matrix and reward
vector are often estimated from random samples subject to noise. Even if the
estimated model is an unbiased estimate of the true underlying model, the value
function computed from the estimated model is biased. We introduce an operator
augmentation method for reducing the error introduced by the estimated model.
When the error is in the residual norm, we prove that the augmentation factor
is always positive and upper bounded by $1 + O(1/n)$, where $n$ is the number of
samples used in learning each row of the transition matrix. We also propose a
practical numerical algorithm for implementing the operator augmentation.
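A minimal sketch of the setting, assuming a small tabular MDP with a known reward vector. Because the value function is a nonlinear function of the transition matrix, the plug-in estimate is biased even when the estimated model is unbiased. The scalar factor `beta` below is a hypothetical stand-in for the augmentation factor, not the authors' algorithm, which derives the proper factor.

```python
import numpy as np

def policy_value(P, r, gamma):
    """Solve the Bellman equation (I - gamma * P) V = r."""
    return np.linalg.solve(np.eye(len(r)) - gamma * P, r)

rng = np.random.default_rng(0)
S, gamma, n = 5, 0.9, 50
P = rng.dirichlet(np.ones(S), size=S)        # true transition matrix
r = rng.uniform(size=S)                      # true reward vector (known)
V_true = policy_value(P, r, gamma)

# Estimate each row of P from n i.i.d. next-state samples (unbiased).
P_hat = np.zeros_like(P)
for s in range(S):
    draws = rng.choice(S, size=n, p=P[s])
    P_hat[s] = np.bincount(draws, minlength=S) / n

# Plug-in value function: biased, since V is a nonlinear function of P.
V_plug = policy_value(P_hat, r, gamma)

# Hypothetical scalar operator augmentation: inflate the estimated operator
# by a factor in (1, 1 + O(1/n)); the paper derives the proper choice.
beta = 1.0 + 1.0 / n
V_aug = np.linalg.solve(beta * (np.eye(S) - gamma * P_hat), r)
print(np.linalg.norm(V_plug - V_true), np.linalg.norm(V_aug - V_true))
```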
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Multiply Robust Estimator Circumvents Hyperparameter Tuning of Neural Network Models in Causal Inference [0.0]
The Multiply Robust (MR) estimator allows us to leverage all the first-step models in a single estimator.
We show that MR is the solution to a broad class of estimating equations, and is also consistent if one of the treatment models is $\sqrt{n}$-consistent.
arXiv Detail & Related papers (2023-07-20T02:31:12Z)
- Efficient Truncated Linear Regression with Unknown Noise Variance [26.870279729431328]
We provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown.
Our estimator is based on an efficient implementation of Projected Gradient Descent on the negative-likelihood of the truncated sample.
arXiv Detail & Related papers (2022-08-25T12:17:37Z)
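A hedged sketch of the maximum-likelihood approach the truncated-regression entry above describes, assuming one-sided truncation $y > c$ and a generic quasi-Newton optimizer in place of the paper's projected gradient scheme; all variable names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def truncated_nll(params, X, y, c):
    """Negative log-likelihood of y_i = x_i @ w + N(0, sigma^2),
    observed only when y_i > c (one-sided truncation)."""
    w, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)            # optimize log(sigma): unknown variance
    mu = X @ w
    # Conditional density of a truncated normal: log phi - log P(y > c).
    return -(norm.logpdf(y, loc=mu, scale=sigma)
             - norm.logsf(c, loc=mu, scale=sigma)).sum()

rng = np.random.default_rng(1)
d, n_raw, c = 3, 5000, 0.5
w_true, sigma_true = rng.normal(size=d), 1.0
X_raw = rng.normal(size=(n_raw, d))
y_raw = X_raw @ w_true + sigma_true * rng.normal(size=n_raw)
keep = y_raw > c                         # only the truncated sample is observed
X, y = X_raw[keep], y_raw[keep]

ols = np.linalg.lstsq(X, y, rcond=None)[0]   # biased under truncation
res = minimize(truncated_nll, x0=np.r_[ols, 0.0], args=(X, y, c))
w_mle = res.x[:-1]
print("OLS error:", np.linalg.norm(ols - w_true))
print("MLE error:", np.linalg.norm(w_mle - w_true))
```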
- Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling [58.14878401145309]
We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model.
We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
arXiv Detail & Related papers (2022-05-12T11:15:47Z)
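For context on the Plackett-Luce entry above: a minimal Monte Carlo baseline using the Gumbel-argsort view of PL sampling. The paper's quasi-Monte Carlo construction, which reduces the variance of such estimators, is more involved and is not implemented here.

```python
import numpy as np

def sample_pl_rankings(log_w, n_samples, rng):
    """Sample rankings from a Plackett-Luce model with log-weights log_w.

    Adding i.i.d. Gumbel noise to the log-weights and sorting in
    descending order yields an exact PL sample (Gumbel-max trick)."""
    g = rng.gumbel(size=(n_samples, log_w.size))
    return np.argsort(-(log_w + g), axis=1)

rng = np.random.default_rng(2)
log_w = np.log(np.array([4.0, 2.0, 1.0, 0.5]))

rankings = sample_pl_rankings(log_w, 100_000, rng)
# Monte Carlo estimate of P(item 0 is ranked first); under PL this is
# w_0 / sum(w) = 4 / 7.5 ≈ 0.533.
print((rankings[:, 0] == 0).mean())
```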
- Performance of Bayesian linear regression in a model with mismatch [8.60118148262922]
We analyze the performance of an estimator given by the mean of a log-concave Bayesian posterior distribution with a Gaussian prior.
This inference model can be rephrased as a version of the Gardner model in spin glasses.
arXiv Detail & Related papers (2021-07-14T18:50:13Z)
- Model-based metrics: Sample-efficient estimates of predictive model subpopulation performance [11.994417027132807]
Machine learning models, now commonly developed to screen, diagnose, or predict health conditions, are evaluated with a variety of performance metrics.
Subpopulation performance metrics are typically computed using only data from that subgroup, resulting in higher variance estimates for smaller groups.
We propose using an evaluation model, a model that describes the conditional distribution of the predictive model score, to form model-based metric (MBM) estimates.
arXiv Detail & Related papers (2021-04-25T19:06:34Z)
- Positive-Congruent Training: Towards Regression-Free Model Updates [87.25247195148187]
In image classification, sample-wise inconsistencies appear as "negative flips": a new model incorrectly predicts the output for a test sample that was correctly classified by the old (reference) model.
We propose a simple approach for positive-congruent (PC) training, Focal Distillation, which enforces congruence with the reference model.
arXiv Detail & Related papers (2020-11-18T09:00:44Z)
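A minimal sketch of the negative-flip rate described in the entry above; the prediction and label arrays are illustrative.

```python
import numpy as np

def negative_flip_rate(old_pred, new_pred, labels):
    """Fraction of test samples the old model got right
    but the new model gets wrong (a "negative flip")."""
    old_correct = old_pred == labels
    new_wrong = new_pred != labels
    return np.mean(old_correct & new_wrong)

labels   = np.array([0, 1, 1, 2, 0, 2])
old_pred = np.array([0, 1, 0, 2, 0, 1])   # reference model
new_pred = np.array([0, 1, 1, 1, 0, 1])   # updated model
print(negative_flip_rate(old_pred, new_pred, labels))  # 1/6: sample 3 flipped
```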
- Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
arXiv Detail & Related papers (2020-10-09T22:54:38Z)
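A minimal sketch of the straight-through Gumbel-Softmax estimator that the entry above Rao-Blackwellizes. The conditional averaging over Gumbel noise that performs the Rao-Blackwellization is omitted, so this shows only the baseline estimator whose variance the paper reduces.

```python
import torch

def st_gumbel_softmax(logits, tau=1.0):
    """Straight-through Gumbel-Softmax: one-hot sample in the forward
    pass, gradients of the softmax relaxation in the backward pass."""
    g = -torch.log(-torch.log(torch.rand_like(logits)))  # Gumbel(0, 1) noise
    y_soft = torch.softmax((logits + g) / tau, dim=-1)
    index = y_soft.argmax(dim=-1, keepdim=True)
    y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
    return y_hard + (y_soft - y_soft.detach())  # straight-through trick

torch.manual_seed(0)
logits = torch.zeros(4, requires_grad=True)
target = torch.tensor([0.1, 0.2, 0.3, 0.4])

# Spread of single-sample gradient estimates of d E[f(sample)] / d logits;
# Rao-Blackwellization would shrink this spread at equal sample cost.
grads = []
for _ in range(1000):
    y = st_gumbel_softmax(logits)
    loss = ((y - target) ** 2).sum()
    grads.append(torch.autograd.grad(loss, logits)[0])
print(torch.stack(grads).std(dim=0))
```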
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- Low-Rank Matrix Estimation From Rank-One Projections by Unlifted Convex Optimization [9.492903649862761]
We study an estimator with a convex formulation for the recovery of low-rank matrices from rank-one projections.
We show that under both models the estimator succeeds, with high probability, if the number of measurements exceeds $r^2(d_1+d_2)$, up to logarithmic factors.
arXiv Detail & Related papers (2020-04-06T14:57:54Z)
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
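A minimal sketch of the randomized-truncation idea behind the SUMO entry above, in the form of a generic Russian-roulette estimator for an infinite series; this is the textbook construction, not the paper's exact estimator for the log marginal likelihood.

```python
import numpy as np

def russian_roulette(term, p, rng):
    """Unbiased estimate of sum_{k>=0} term(k) via randomized truncation.

    Truncate at a random K ~ Geometric(p) and reweight term k by
    1 / P(K > k), so each term's expected contribution is unchanged."""
    K = rng.geometric(p)                  # K in {1, 2, ...}
    total = 0.0
    for k in range(K):                    # terms 0 .. K-1 are kept
        total += term(k) / (1 - p) ** k   # P(K > k) = (1 - p)^k
    return total

# Example series: sum_{k>=0} 0.5^k = 2 (p chosen so the variance is finite).
rng = np.random.default_rng(3)
estimates = [russian_roulette(lambda k: 0.5 ** k, 0.5, rng)
             for _ in range(100_000)]
print(np.mean(estimates))                 # ≈ 2.0, unbiased
```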