Conformal Off-Policy Prediction in Contextual Bandits
- URL: http://arxiv.org/abs/2206.04405v1
- Date: Thu, 9 Jun 2022 10:39:33 GMT
- Title: Conformal Off-Policy Prediction in Contextual Bandits
- Authors: Muhammad Faaiz Taufiq, Jean-Francois Ton, Rob Cornish, Yee Whye Teh,
Arnaud Doucet
- Abstract summary: Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
- Score: 54.67508891852636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most off-policy evaluation methods for contextual bandits have focused on the
expected outcome of a policy, which is estimated via methods that at best
provide only asymptotic guarantees. However, in many applications, the
expectation may not be the best measure of performance as it does not capture
the variability of the outcome. In addition, particularly in safety-critical
settings, stronger guarantees than asymptotic correctness may be required. To
address these limitations, we consider a novel application of conformal
prediction to contextual bandits. Given data collected under a behavioral
policy, we propose \emph{conformal off-policy prediction} (COPP), which can
output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional
assumptions beyond the standard contextual bandit setup, and empirically
demonstrate the utility of COPP compared with existing methods on synthetic and
real-world data.
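To make the mechanism concrete, here is a minimal sketch of the weighted split-conformal construction that an approach like COPP builds on: nonconformity scores from data logged under the behavioral policy are reweighted by the target-to-behavioral probability ratio before taking the quantile, as in weighted conformal prediction for covariate shift (Tibshirani et al., 2019). The outcome model `mu` and the policy densities `pi_target`, `pi_behav` are illustrative placeholders (assumed known and evaluable pointwise), not the authors' code.
```python
import numpy as np

def weighted_quantile(scores, weights, q):
    """Smallest score whose normalized cumulative weight reaches q."""
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)
    idx = np.searchsorted(cdf, q, side="left")
    return s[min(idx, len(s) - 1)]

def copp_interval(x_new, a_new, X_cal, A_cal, y_cal,
                  mu, pi_target, pi_behav, alpha=0.1):
    """Predictive interval for the outcome at context x_new, action a_new,
    where a_new is drawn from the target policy and the calibration data
    (X_cal, A_cal, y_cal) was logged under the behavioral policy."""
    scores = np.abs(y_cal - mu(X_cal, A_cal))              # nonconformity scores
    w = pi_target(A_cal, X_cal) / pi_behav(A_cal, X_cal)   # importance ratios
    w_new = pi_target(a_new, x_new) / pi_behav(a_new, x_new)
    # The test point contributes weight w_new at score +infinity; an infinite
    # quantile yields the trivial interval (the price of extreme weight imbalance).
    qhat = weighted_quantile(np.append(scores, np.inf),
                             np.append(w, w_new), 1 - alpha)
    center = mu(x_new, a_new)
    return center - qhat, center + qhat
```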
Related papers
- Adjusting Regression Models for Conditional Uncertainty Calibration [46.69079637538012]
We propose a novel algorithm to train a regression function to improve the conditional coverage after applying the split conformal prediction procedure.
We establish an upper bound on the gap between conditional coverage and the nominal coverage rate, and propose an end-to-end algorithm to control this upper bound.
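For reference, the split conformal baseline being adjusted guarantees only marginal coverage. A minimal sketch with absolute-residual scores, where `model` stands for any regressor fitted on a separate training split:
```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, x_new, alpha=0.1):
    """Marginal (1 - alpha) predictive interval via split conformal prediction."""
    residuals = np.abs(y_cal - model.predict(X_cal))      # nonconformity scores
    n = len(residuals)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
    qhat = np.quantile(residuals, level, method="higher")
    pred = model.predict(x_new.reshape(1, -1))[0]
    return pred - qhat, pred + qhat
```
Coverage here holds on average over contexts; training the regressor to shrink that conditional-marginal gap is what the entry above targets.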
arXiv Detail & Related papers (2024-09-26T01:55:45Z)
- Probabilistic Conformal Prediction with Approximate Conditional Validity [81.30551968980143]
We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution.
Our method consistently outperforms existing approaches in terms of conditional coverage.
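The paper's construction differs in detail, but conformalized quantile regression (CQR) illustrates the general recipe of feeding a conditional-distribution estimate into a conformal pipeline; `q_lo` and `q_hi` are assumed pre-fitted conditional quantile estimators:
```python
import numpy as np

def cqr_interval(q_lo, q_hi, X_cal, y_cal, x_new, alpha=0.1):
    """CQR-style band: conformally calibrate estimated conditional quantiles."""
    scores = np.maximum(q_lo(X_cal) - y_cal, y_cal - q_hi(X_cal))
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    qhat = np.quantile(scores, level, method="higher")  # widen (or shrink) the band
    x = x_new.reshape(1, -1)
    return q_lo(x)[0] - qhat, q_hi(x)[0] + qhat
```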
arXiv Detail & Related papers (2024-07-01T20:44:48Z)
- Hallucinated Adversarial Control for Conservative Offline Policy Evaluation [64.94009515033984]
We study conservative off-policy evaluation (COPE): given an offline dataset of environment interactions, we seek a (tight) lower bound on a policy's performance.
We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics.
We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
arXiv Detail & Related papers (2023-03-02T08:57:35Z)
- Kernel Conditional Moment Constraints for Confounding Robust Inference [22.816690686310714]
We study policy evaluation of offline contextual bandits subject to unobserved confounders.
We propose a general estimator that provides a sharp lower bound of the policy value.
arXiv Detail & Related papers (2023-02-26T16:44:13Z)
- Split Localized Conformal Prediction [20.44976410408424]
We propose a modified non-conformity score by leveraging local approximation of the conditional distribution.
The modified score inherits the spirit of split conformal methods, which are simple and efficient compared with full conformal methods.
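As a rough illustration of localization (a generic kernel-weighted variant, not the paper's modified score), one can weight each calibration residual by its similarity to the test point before taking the quantile. The Gaussian kernel and `bandwidth` below are illustrative, and naive localization of this kind can forfeit exact finite-sample validity, which is precisely what a carefully constructed score must restore:
```python
import numpy as np

def localized_conformal_interval(model, X_cal, y_cal, x_new,
                                 alpha=0.1, bandwidth=1.0):
    """Interval from kernel-weighted calibration residuals around x_new."""
    residuals = np.abs(y_cal - model.predict(X_cal))
    d2 = np.sum((X_cal - x_new) ** 2, axis=1)   # squared distances to x_new
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))    # Gaussian kernel weights
    order = np.argsort(residuals)
    cdf = np.cumsum(w[order]) / np.sum(w)
    idx = np.searchsorted(cdf, 1 - alpha, side="left")
    qhat = residuals[order][min(idx, len(residuals) - 1)]
    pred = model.predict(x_new.reshape(1, -1))[0]
    return pred - qhat, pred + qhat
```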
arXiv Detail & Related papers (2022-06-27T07:53:38Z)
- Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even optimal policies.
However, adaptive data collection complicates statistical inference; to support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, and the value of new policies.
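For contrast, the textbook inverse-propensity-weighted value estimate with a normal-approximation interval is sketched below; its validity presumes the logging propensities were fixed in advance, which is exactly what adaptive bandit data collection violates. The callables `pi_target` and `pi_logging` (assumed action-probability functions) are placeholders:
```python
import numpy as np

def ipw_value_ci(a, y, x, pi_target, pi_logging):
    """Naive IPW estimate of a new policy's value with a 95% normal CI.
    Assumes i.i.d., non-adaptive logging; adaptively collected data
    breaks this, which is the problem the paper above addresses."""
    w = pi_target(a, x) / pi_logging(a, x)   # importance weights
    z = w * y                                # per-sample IPW terms
    est = z.mean()
    se = z.std(ddof=1) / np.sqrt(len(z))
    return est - 1.96 * se, est + 1.96 * se
```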
arXiv Detail & Related papers (2021-06-01T12:01:51Z)
- Offline Policy Selection under Uncertainty [113.57441913299868]
We consider offline policy selection as learning preferences over a set of policy prospects given a fixed experience dataset.
Access to a full distribution over one's beliefs about the policy value enables more flexible selection algorithms under a wider range of downstream evaluation metrics.
We show how the proposed BayesDICE estimator may be used to rank policies with respect to an arbitrary downstream policy selection metric.
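A hedged sketch of that selection step, assuming posterior samples of each policy's value are already in hand (e.g. produced by a method like BayesDICE); different downstream metrics are simply different functionals of the same belief distribution:
```python
import numpy as np

def rank_policies(value_samples, metric="mean", q=0.1):
    """Rank policies (best first) from posterior value samples.
    value_samples: array of shape (n_policies, n_samples)."""
    if metric == "mean":
        score = value_samples.mean(axis=1)
    elif metric == "quantile":              # pessimistic: a lower quantile
        score = np.quantile(value_samples, q, axis=1)
    elif metric == "prob_best":             # probability of being the best
        winners = value_samples.argmax(axis=0)
        score = np.bincount(winners, minlength=len(value_samples)) / value_samples.shape[1]
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(-score)
```
For instance, `rank_policies(samples, metric="quantile", q=0.1)` selects pessimistically, while `metric="prob_best"` targets the chance of picking the top policy.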
arXiv Detail & Related papers (2020-12-12T23:09:21Z)
- Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications.
Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment.
In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
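One standard way to make such robustness concrete (a generic sketch, not necessarily the paper's exact formulation) is to score a policy by its worst-case value over a KL ball around the empirical reward distribution, computable via the one-dimensional dual of the KL-constrained problem; the radius `delta` is an illustrative choice:
```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def worst_case_value(rewards, delta):
    """Worst-case mean reward over distributions within KL divergence
    delta of the empirical distribution (dual of the KL-DRO problem)."""
    r = np.asarray(rewards, dtype=float)
    n = len(r)

    def dual(alpha):
        # alpha * log E[exp(-R / alpha)] + alpha * delta, computed stably
        return alpha * (logsumexp(-r / alpha) - np.log(n)) + alpha * delta

    res = minimize_scalar(dual, bounds=(1e-6, 1e6), method="bounded")
    return -res.fun
```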
arXiv Detail & Related papers (2020-06-10T03:11:40Z)