Post-Contextual-Bandit Inference
- URL: http://arxiv.org/abs/2106.00418v1
- Date: Tue, 1 Jun 2021 12:01:51 GMT
- Title: Post-Contextual-Bandit Inference
- Authors: Aur\'elien Bibaut and Antoine Chambaz and Maria Dimakopoulou and
Nathan Kallus and Mark van der Laan
- Abstract summary: Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
- Score: 57.88785630755165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual bandit algorithms are increasingly replacing non-adaptive A/B
tests in e-commerce, healthcare, and policymaking because they can both improve
outcomes for study participants and increase the chance of identifying good or
even best policies. To support credible inference on novel interventions at the
end of the study, nonetheless, we still want to construct valid confidence
intervals on average treatment effects, subgroup effects, or value of new
policies. The adaptive nature of the data collected by contextual bandit
algorithms, however, makes this difficult: standard estimators are no longer
asymptotically normally distributed and classic confidence intervals fail to
provide correct coverage. While this has been addressed in non-contextual
settings by using stabilized estimators, the contextual setting poses unique
challenges that we tackle for the first time in this paper. We propose the
Contextual Adaptive Doubly Robust (CADR) estimator, the first estimator for
policy value that is asymptotically normal under contextual adaptive data
collection. The main technical challenge in constructing CADR is designing
adaptive and consistent conditional standard deviation estimators for
stabilization. Extensive numerical experiments using 57 OpenML datasets
demonstrate that confidence intervals based on CADR uniquely provide correct
coverage.
Related papers
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
Policy Convolution family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
arXiv Detail & Related papers (2023-10-24T01:00:01Z) - Adaptive Conformal Prediction by Reweighting Nonconformity Score [0.0]
We use a Quantile Regression Forest (QRF) to learn the distribution of nonconformity scores and utilize the QRF's weights to assign more importance to samples with residuals similar to the test point.
Our approach enjoys an assumption-free finite sample marginal and training-conditional coverage, and under suitable assumptions, it also ensures conditional coverage.
arXiv Detail & Related papers (2023-03-22T16:42:19Z) - Post Reinforcement Learning Inference [22.117487428829488]
We consider estimation and inference using data collected from reinforcement learning algorithms.
We propose a weighted Z-estimation approach with carefully designed adaptive weights to stabilize the time-varying variance.
Primary applications include dynamic treatment effect estimation and dynamic off-policy evaluation.
arXiv Detail & Related papers (2023-02-17T12:53:15Z) - Improved Policy Evaluation for Randomized Trials of Algorithmic Resource
Allocation [54.72195809248172]
We present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z) - Conformal Off-Policy Prediction in Contextual Bandits [54.67508891852636]
Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
arXiv Detail & Related papers (2022-06-09T10:39:33Z) - Off-Policy Evaluation via Adaptive Weighting with Data from Contextual
Bandits [5.144809478361604]
We improve the doubly robust (DR) estimator by adaptively weighting observations to control its variance.
We provide empirical evidence for our estimator's improved accuracy and inferential properties relative to existing alternatives.
arXiv Detail & Related papers (2021-06-03T17:54:44Z) - Risk Minimization from Adaptively Collected Data: Guarantees for
Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - Accountable Off-Policy Evaluation With Kernel Bellman Statistics [29.14119984573459]
We consider off-policy evaluation (OPE), which evaluates the performance of a new policy from observed data collected from previous experiments.
Due to the limited information from off-policy data, it is highly desirable to construct rigorous confidence intervals, not just point estimation.
We propose a new variational framework which reduces the problem of calculating tight confidence bounds in OPE.
arXiv Detail & Related papers (2020-08-15T07:24:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.