Off-Policy Evaluation via Adaptive Weighting with Data from Contextual
Bandits
- URL: http://arxiv.org/abs/2106.02029v1
- Date: Thu, 3 Jun 2021 17:54:44 GMT
- Title: Off-Policy Evaluation via Adaptive Weighting with Data from Contextual
Bandits
- Authors: Ruohan Zhan, Vitor Hadad, David A. Hirshberg, and Susan Athey
- Abstract summary: We improve the doubly robust (DR) estimator by adaptively weighting observations to control its variance.
We provide empirical evidence for our estimator's improved accuracy and inferential properties relative to existing alternatives.
- Score: 5.144809478361604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has become increasingly common for data to be collected adaptively, for
example using contextual bandits. Historical data of this type can be used to
evaluate other treatment assignment policies to guide future innovation or
experiments. However, policy evaluation is challenging if the target policy
differs from the one used to collect data, and popular estimators, including
doubly robust (DR) estimators, can be plagued by bias, excessive variance, or
both. In particular, when the pattern of treatment assignment in the collected
data looks little like the pattern generated by the policy to be evaluated, the
importance weights used in DR estimators explode, leading to excessive
variance.
In this paper, we improve the DR estimator by adaptively weighting
observations to control its variance. We show that a t-statistic based on our
improved estimator is asymptotically normal under certain conditions, allowing
us to form confidence intervals and test hypotheses. Using synthetic data and
public benchmarks, we provide empirical evidence for our estimator's improved
accuracy and inferential properties relative to existing alternatives.
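The mechanics described in the abstract can be illustrated with a small numerical sketch. Everything below is an assumption for illustration: the synthetic data, the stand-in outcome model `mu_hat`, and the square-root-propensity weighting `h` are not the authors' exact adaptive-weighting scheme, only one simple way to damp the influence of small-propensity observations on the doubly robust (DR) scores.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5000, 3  # logged samples, arms

# Synthetic logged contextual-bandit data (illustrative only).
x = rng.normal(size=n)                                # contexts
behavior = rng.dirichlet(np.ones(k), size=n)          # logging probabilities e(a|x)
a = np.array([rng.choice(k, p=p) for p in behavior])  # logged actions
true_mu = np.stack([x, 0.5 * x, -x], axis=1)          # E[r | x, a]
r = true_mu[np.arange(n), a] + rng.normal(size=n)     # logged rewards

target = np.zeros((n, k))
target[:, 0] = 1.0                                    # target policy: always arm 0
mu_hat = 0.9 * true_mu                                # stand-in outcome model

e = behavior[np.arange(n), a]                         # propensity of the logged action
w = target[np.arange(n), a] / e                       # importance weights (can explode)

# Doubly robust score for each observation.
gamma = (target * mu_hat).sum(axis=1) + w * (r - mu_hat[np.arange(n), a])
dr = gamma.mean()                                     # plain DR estimate

# Adaptively reweighted DR: shrink the influence of tiny-propensity points.
h = np.sqrt(e)                                        # one simple weighting choice
adr = (h * gamma).sum() / h.sum()
```

For this target policy the true value is E[x] = 0; the weighted average `adr` trades a small bias for lower variance whenever the logged propensities `e` get close to zero.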
Related papers
- Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z)
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution (PC) family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
- Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning.
Experiment results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
arXiv Detail & Related papers (2023-03-11T11:42:26Z)
- Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments [31.492146288630515]
We introduce a variant of the doubly robust (DR) estimator, called the regression-assisted DR estimator, that can incorporate the past data without introducing a large bias.
We empirically show that the new estimator improves estimation for the current and future policy values, and provides a tight and valid interval estimation in several nonstationary recommendation environments.
arXiv Detail & Related papers (2023-02-23T01:17:21Z)
- Quantile Off-Policy Evaluation via Deep Conditional Generative Learning [21.448553360543478]
Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy.
We propose a doubly-robust inference procedure for quantile OPE in sequential decision making.
We demonstrate the advantages of this proposed estimator through both simulations and a real-world dataset from a short-video platform.
arXiv Detail & Related papers (2022-12-29T22:01:43Z)
- Control Variates for Slate Off-Policy Evaluation [112.35528337130118]
We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions.
We obtain new estimators with risk improvement guarantees over both the PI and self-normalized PI estimators.
arXiv Detail & Related papers (2021-06-15T06:59:53Z)
- Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z)
- Evaluating Model Robustness and Stability to Dataset Shift [7.369475193451259]
We propose a framework for analyzing stability of machine learning models.
We use the original evaluation data to determine distributions under which the algorithm performs poorly.
We estimate the algorithm's performance on the "worst-case" distribution.
arXiv Detail & Related papers (2020-10-28T17:35:39Z)
- The Adaptive Doubly Robust Estimator for Policy Evaluation in Adaptive Experiments and a Paradox Concerning Logging Policy [13.772109618082382]
We propose a doubly robust (DR) estimator for dependent samples obtained from adaptive experiments.
We report an empirical paradox: our proposed DR estimator tends to perform better than other estimators.
arXiv Detail & Related papers (2020-10-08T06:42:48Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
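Several entries above contrast inverse-propensity-style estimators (IPS, PI) with their self-normalized variants. The difference can be sketched in a few lines; the uniform propensities and binary rewards below are illustrative assumptions, not data from any of the papers listed:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Illustrative logged data: propensity of the logged action under the
# behavior policy, target-policy probability of that action, binary reward.
e = rng.uniform(0.05, 1.0, size=n)
pi = rng.uniform(0.0, 1.0, size=n)
r = rng.binomial(1, 0.5, size=n).astype(float)

w = pi / e                        # importance weights

ips = (w * r).mean()              # plain IPS: unbiased but high variance
snips = (w * r).sum() / w.sum()   # self-normalized IPS: small bias, bounded
```

Because `snips` is a convex combination of the logged rewards, it always stays within the reward range here, whereas the plain IPS estimate can leave it when the weights are large.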
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.