Universal Off-Policy Evaluation
- URL: http://arxiv.org/abs/2104.12820v1
- Date: Mon, 26 Apr 2021 18:54:31 GMT
- Title: Universal Off-Policy Evaluation
- Authors: Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik
Learned-Miller, Emma Brunskill, Philip S. Thomas
- Abstract summary: We take the first steps towards a universal off-policy estimator (UnO).
We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns.
- Score: 64.02853483874334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When faced with sequential decision-making problems, it is often useful to be
able to predict what would happen if decisions were made using a new policy.
Those predictions must often be based on data collected under some previously
used decision-making rule. Many previous methods enable such off-policy (or
counterfactual) estimation of the expected value of a performance measure
called the return. In this paper, we take the first steps towards a universal
off-policy estimator (UnO) -- one that provides off-policy estimates and
high-confidence bounds for any parameter of the return distribution. We use UnO
for estimating and simultaneously bounding the mean, variance,
quantiles/median, inter-quantile range, CVaR, and the entire cumulative
distribution of returns. Finally, we also discuss UnO's applicability in
various settings, including fully observable, partially observable (i.e., with
unobserved confounders), Markovian, non-Markovian, stationary, smoothly
non-stationary, and discrete distribution shifts.
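At its core, UnO estimates the entire CDF of returns under the target policy by importance-weighting an indicator of the return, and then reads any distributional parameter (mean, quantiles, CVaR, ...) off that CDF. Below is a minimal Python sketch of that idea; the weighting follows the abstract, but the function names, the grid-based quantile lookup, and the synthetic data are illustrative assumptions, and the paper's actual estimator additionally provides high-confidence bounds.

```python
import numpy as np

def uno_cdf(returns, rhos, nu_grid):
    # Importance-weighted empirical CDF of the return under the target
    # policy: F(nu) ~= mean of rho_i * 1{G_i <= nu} over behavior-policy
    # trajectories, where rho_i is the product of per-step ratios
    # pi_e(a_t|s_t) / pi_b(a_t|s_t).
    returns, rhos = np.asarray(returns), np.asarray(rhos)
    F = np.array([np.mean(rhos * (returns <= nu)) for nu in nu_grid])
    return np.clip(np.maximum.accumulate(F), 0.0, 1.0)  # force a valid CDF

def quantile_from_cdf(nu_grid, F, alpha):
    # Any distributional parameter is a functional of the estimated CDF;
    # here, the alpha-quantile is read off the grid.
    idx = min(np.searchsorted(F, alpha), len(nu_grid) - 1)
    return nu_grid[idx]

# Synthetic demo: returns and stand-in importance ratios.
rng = np.random.default_rng(0)
G = rng.normal(1.0, 2.0, size=5000)
rho = np.exp(rng.normal(0.0, 0.1, size=5000))
nu = np.linspace(-8.0, 10.0, 512)
F = uno_cdf(G, rho, nu)
print("off-policy median estimate:", quantile_from_cdf(nu, F, 0.5))
```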
Related papers
- Quantile Regression using Random Forest Proximities [0.9423257767158634]
Quantile regression forests (QRF) estimate the entire conditional distribution of the target variable with a single model.
We show that quantile regression using Random Forest proximities outperforms the original version of QRF in approximating conditional target distributions and prediction intervals.
arXiv Detail & Related papers (2024-08-05T10:02:33Z) - Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
- Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such prediction intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile-regression-based interval construction that removes the arbitrary constraint of pinning the interval bounds to fixed quantile levels.
We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
arXiv Detail & Related papers (2024-06-05T13:36:38Z) - Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution (PC) family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
arXiv Detail & Related papers (2023-10-24T01:00:01Z) - Quantile Off-Policy Evaluation via Deep Conditional Generative Learning [21.448553360543478]
- Quantile Off-Policy Evaluation via Deep Conditional Generative Learning [21.448553360543478]
Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy.
We propose a doubly-robust inference procedure for quantile OPE in sequential decision making.
We demonstrate the advantages of this proposed estimator through both simulations and a real-world dataset from a short-video platform.
arXiv Detail & Related papers (2022-12-29T22:01:43Z) - Variance Penalized On-Policy and Off-Policy Actor-Critic [60.06593931848165]
- Variance Penalized On-Policy and Off-Policy Actor-Critic [60.06593931848165]
We propose on-policy and off-policy actor-critic algorithms that optimize a performance criterion involving both mean and variance in the return.
Our approach not only performs on par with actor-critic and prior variance-penalization baselines in terms of expected return, but also generates trajectories which have lower variance in the return.
arXiv Detail & Related papers (2021-02-03T10:06:16Z) - Regression with reject option and application to kNN [0.0]
- Regression with reject option and application to kNN [0.0]
We refer to this framework as regression with reject option, an extension of classification with reject option.
We provide a semi-supervised estimation procedure of the optimal rule involving two datasets.
The resulting predictor with reject option is shown to be almost as good as the optimal predictor with reject option both in terms of risk and rejection rate.
arXiv Detail & Related papers (2020-06-30T08:20:57Z) - Batch Stationary Distribution Estimation [98.18201132095066]
- Batch Stationary Distribution Estimation [98.18201132095066]
We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions.
We propose a consistent estimator that is based on recovering a correction ratio function over the given data.
arXiv Detail & Related papers (2020-03-02T09:10:01Z) - Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
We consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.