Related papers: Conformal Off-Policy Prediction in Contextual Bandits

Conformal Off-Policy Prediction in Contextual Bandits

URL: http://arxiv.org/abs/2206.04405v1
Date: Thu, 9 Jun 2022 10:39:33 GMT
Title: Conformal Off-Policy Prediction in Contextual Bandits
Authors: Muhammad Faaiz Taufiq, Jean-Francois Ton, Rob Cornish, Yee Whye Teh, Arnaud Doucet
Abstract summary: Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy. We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
Score: 54.67508891852636
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Most off-policy evaluation methods for contextual bandits have focused on the expected outcome of a policy, which is estimated via methods that at best provide only asymptotic guarantees. However, in many applications, the expectation may not be the best measure of performance as it does not capture the variability of the outcome. In addition, particularly in safety-critical settings, stronger guarantees than asymptotic correctness may be required. To address these limitations, we consider a novel application of conformal prediction to contextual bandits. Given data collected under a behavioral policy, we propose \emph{conformal off-policy prediction} (COPP), which can output reliable predictive intervals for the outcome under a new target policy. We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup, and empirically demonstrate the utility of COPP compared with existing methods on synthetic and real-world data.

Related papers

PAC Off-Policy Prediction of Contextual Bandits [0.0]
This paper investigates off-policy evaluation in contextual bandits.<n>It aims to quantify the performance of a target policy using data collected under a different and potentially unknown behavior policy.
arXiv Detail & Related papers (2025-07-22T05:12:29Z)
Rectifying Conformity Scores for Better Conditional Coverage [75.73184036344908]
We present a new method for generating confidence sets within the split conformal prediction framework. Our method performs a trainable transformation of any given conformity score to improve conditional coverage while ensuring exact marginal coverage.
arXiv Detail & Related papers (2025-02-22T19:54:14Z)
Epistemic Uncertainty in Conformal Scores: A Unified Approach [2.449909275410288]
Conformal prediction methods create prediction bands with distribution-free guarantees but do not explicitly capture uncertainty. We introduce $texttEPICSCORE$, a model-agnostic approach that enhances any conformal score by explicitly integrating uncertainty. $texttEPICSCORE$ adaptively expands predictive intervals in regions with limited data while maintaining compact intervals where data is abundant.
arXiv Detail & Related papers (2025-02-10T19:42:54Z)
Adjusting Regression Models for Conditional Uncertainty Calibration [46.69079637538012]
We propose a novel algorithm to train a regression function to improve the conditional coverage after applying the split conformal prediction procedure. We establish an upper bound for the miscoverage gap between the conditional coverage and the nominal coverage rate and propose an end-to-end algorithm to control this upper bound.
arXiv Detail & Related papers (2024-09-26T01:55:45Z)
Probabilistic Conformal Prediction with Approximate Conditional Validity [81.30551968980143]
We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution. Our method consistently outperforms existing approaches in terms of conditional coverage.
arXiv Detail & Related papers (2024-07-01T20:44:48Z)
Hallucinated Adversarial Control for Conservative Offline Policy Evaluation [64.94009515033984]
We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, we seek to obtain a (tight) lower bound on a policy's performance. We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics. We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
arXiv Detail & Related papers (2023-03-02T08:57:35Z)
Kernel Conditional Moment Constraints for Confounding Robust Inference [22.816690686310714]
We study policy evaluation of offline contextual bandits subject to unobserved confounders. We propose a general estimator that provides a sharp lower bound of the policy value.
arXiv Detail & Related papers (2023-02-26T16:44:13Z)
Split Localized Conformal Prediction [20.44976410408424]
We propose a modified non-conformity score by leveraging local approximation of the conditional distribution. The modified score inherits the spirit of split conformal methods, which is simple and efficient compared with full conformal methods.
arXiv Detail & Related papers (2022-06-27T07:53:38Z)
Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking. They can both improve outcomes for study participants and increase the chance of identifying good or even best policies. To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z)
Offline Policy Selection under Uncertainty [113.57441913299868]
We consider offline policy selection as learning preferences over a set of policy prospects given a fixed experience dataset. Access to the full distribution over one's belief of the policy value enables more flexible selection algorithms under a wider range of downstream evaluation metrics. We show how BayesDICE may be used to rank policies with respect to any arbitrary downstream policy selection metric.
arXiv Detail & Related papers (2020-12-12T23:09:21Z)
Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications. Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment. In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
arXiv Detail & Related papers (2020-06-10T03:11:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.