Conformal Off-Policy Prediction in Contextual Bandits
- URL: http://arxiv.org/abs/2206.04405v1
- Date: Thu, 9 Jun 2022 10:39:33 GMT
- Title: Conformal Off-Policy Prediction in Contextual Bandits
- Authors: Muhammad Faaiz Taufiq, Jean-Francois Ton, Rob Cornish, Yee Whye Teh, Arnaud Doucet
- Abstract summary: Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
- Score: 54.67508891852636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most off-policy evaluation methods for contextual bandits have focused on the
expected outcome of a policy, which is estimated via methods that at best
provide only asymptotic guarantees. However, in many applications, the
expectation may not be the best measure of performance as it does not capture
the variability of the outcome. In addition, particularly in safety-critical
settings, stronger guarantees than asymptotic correctness may be required. To
address these limitations, we consider a novel application of conformal
prediction to contextual bandits. Given data collected under a behavioral
policy, we propose conformal off-policy prediction (COPP), which can
output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional
assumptions beyond the standard contextual bandit setup, and empirically
demonstrate the utility of COPP compared with existing methods on synthetic and
real-world data.
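The abstract does not spell out how the predictive intervals are constructed. As a rough illustration only, the sketch below shows a generic weighted split conformal step, a common way to keep finite-sample coverage when calibration data and the target distribution differ. It is not taken from the paper: the function names, the residual-based scores, and the assumption that importance weights are already available for each calibration point and for the test point are all hypothetical.

```python
import numpy as np


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration scores.

    Hypothetical helper: `weights` are assumed importance weights for the
    calibration points and `test_weight` is the weight of the test point,
    whose score is treated as +infinity (the usual conservative choice).
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Sort the scores and accumulate their normalized weights.
    order = np.argsort(scores)
    sorted_scores = scores[order]
    total = weights.sum() + test_weight
    cumulative = np.cumsum(weights[order]) / total

    # First sorted score whose cumulative weight reaches 1 - alpha.
    idx = np.searchsorted(cumulative, 1.0 - alpha, side="left")
    if idx >= len(sorted_scores):
        # Not enough calibration mass: the interval is unbounded.
        return np.inf
    return sorted_scores[idx]


def predictive_interval(point_prediction, quantile):
    """Symmetric interval around a point prediction (absolute-residual score)."""
    return point_prediction - quantile, point_prediction + quantile


# Toy usage with made-up numbers: absolute residuals as scores.
calibration_scores = [1.0, 2.0, 3.0, 4.0]
calibration_weights = [1.0, 1.0, 1.0, 1.0]
q = weighted_conformal_quantile(calibration_scores, calibration_weights,
                                test_weight=1.0, alpha=0.5)
lower, upper = predictive_interval(10.0, q)
```

With equal weights this reduces to ordinary split conformal prediction; unequal weights shift the quantile toward the calibration points that are more representative of the target distribution.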
Related papers
- Conditionally valid Probabilistic Conformal Prediction [57.80927226809277]
We develop a new method for creating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution.
We demonstrate the effectiveness of our approach through extensive simulations, showing that it outperforms existing methods in terms of conditional coverage.
arXiv Detail & Related papers (2024-07-01T20:44:48Z)
- A Convex Framework for Confounding Robust Inference [21.918894096307294]
We study policy evaluation of offline contextual bandits subject to unobserved confounders.
We propose a general estimator that provides a sharp lower bound of the policy value using convex programming.
arXiv Detail & Related papers (2023-09-21T19:45:37Z)
- Hallucinated Adversarial Control for Conservative Offline Policy Evaluation [64.94009515033984]
We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, we seek to obtain a (tight) lower bound on a policy's performance.
We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics.
We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
arXiv Detail & Related papers (2023-03-02T08:57:35Z)
- Kernel Conditional Moment Constraints for Confounding Robust Inference [22.816690686310714]
We study policy evaluation of offline contextual bandits subject to unobserved confounders.
We propose a general estimator that provides a sharp lower bound of the policy value.
arXiv Detail & Related papers (2023-02-26T16:44:13Z)
- Split Localized Conformal Prediction [20.44976410408424]
We propose a modified non-conformity score by leveraging local approximation of the conditional distribution.
The modified score inherits the spirit of split conformal methods, which is simple and efficient compared with full conformal methods.
arXiv Detail & Related papers (2022-06-27T07:53:38Z)
- Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z)
- Offline Policy Selection under Uncertainty [113.57441913299868]
We consider offline policy selection as learning preferences over a set of policy prospects given a fixed experience dataset.
Access to the full distribution over one's belief of the policy value enables more flexible selection algorithms under a wider range of downstream evaluation metrics.
We show how BayesDICE may be used to rank policies with respect to any arbitrary downstream policy selection metric.
arXiv Detail & Related papers (2020-12-12T23:09:21Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged data.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications.
Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment.
In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
arXiv Detail & Related papers (2020-06-10T03:11:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.