PAC Off-Policy Prediction of Contextual Bandits
- URL: http://arxiv.org/abs/2507.16236v1
- Date: Tue, 22 Jul 2025 05:12:29 GMT
- Title: PAC Off-Policy Prediction of Contextual Bandits
- Authors: Yilong Wan, Yuqiang Li, Xianyi Wu
- Abstract summary: This paper investigates off-policy evaluation in contextual bandits. It aims to quantify the performance of a target policy using data collected under a different and potentially unknown behavior policy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates off-policy evaluation in contextual bandits, aiming to quantify the performance of a target policy using data collected under a different and potentially unknown behavior policy. Recently, methods based on conformal prediction have been developed to construct reliable prediction intervals that guarantee marginal coverage in finite samples, making them particularly suited for safety-critical applications. To further achieve coverage conditional on a given offline data set, we propose a novel algorithm that constructs probably approximately correct prediction intervals. Our method builds upon a PAC-valid conformal prediction framework, and we strengthen its theoretical guarantees by establishing PAC-type bounds on coverage. We analyze both finite-sample and asymptotic properties of the proposed method, and compare its empirical performance with existing methods in simulations.
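The abstract describes strengthening split conformal prediction so that coverage holds conditional on the offline calibration set, i.e. a PAC (training-conditional) guarantee. As a rough illustration only, the sketch below shows one standard way to obtain such a guarantee: shrink the miscoverage level by a Hoeffding-style correction before taking the conformal quantile. This is a generic construction, not the paper's actual algorithm, and `pac_conformal_quantile` is a hypothetical helper name.

```python
import numpy as np

def pac_conformal_quantile(cal_scores, alpha, delta):
    """Split-conformal quantile with a PAC (training-conditional) adjustment.

    With probability >= 1 - delta over the draw of the calibration set, the
    resulting intervals cover at least 1 - alpha of future points. The
    adjustment below is a simple Hoeffding-type correction; the paper's own
    PAC-valid construction may differ.
    """
    n = len(cal_scores)
    # Tighten the target miscoverage to absorb calibration-set randomness.
    alpha_adj = alpha - np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    if alpha_adj <= 0:
        raise ValueError("calibration set too small for requested (alpha, delta)")
    # Finite-sample conformal quantile at the adjusted level.
    k = int(np.ceil((n + 1) * (1.0 - alpha_adj)))
    return np.sort(cal_scores)[min(k, n) - 1]

# Toy usage: absolute-residual nonconformity scores on a held-out set.
rng = np.random.default_rng(0)
scores = np.abs(rng.normal(size=2000))
q = pac_conformal_quantile(scores, alpha=0.1, delta=0.05)
# A prediction interval for a new context x would be [mu_hat(x) - q, mu_hat(x) + q].
```

Because the quantile is taken at the tightened level `alpha_adj < alpha`, the intervals are slightly wider than the marginal split-conformal ones; that slack is what buys the high-probability guarantee over the calibration draw.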
Related papers
- Kandinsky Conformal Prediction: Beyond Class- and Covariate-Conditional Coverage [25.945248419737318]
Conformal prediction is a powerful distribution-free framework for constructing prediction sets with coverage guarantees.
We present Kandinsky conformal prediction, a framework that significantly expands the scope of conditional coverage guarantees.
Our algorithm unifies and extends existing methods, while achieving a minimax-optimal high-probability conditional coverage bound.
arXiv Detail & Related papers (2025-02-24T15:46:18Z)
- Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models
We tackle average-reward infinite-horizon POMDPs with an unknown transition model.
We present a novel and simple estimator that overcomes this barrier.
arXiv Detail & Related papers (2025-01-30T22:29:41Z)
- Adjusting Regression Models for Conditional Uncertainty Calibration
We propose a novel algorithm to train a regression function to improve the conditional coverage after applying the split conformal prediction procedure.
We establish an upper bound for the miscoverage gap between the conditional coverage and the nominal coverage rate and propose an end-to-end algorithm to control this upper bound.
arXiv Detail & Related papers (2024-09-26T01:55:45Z)
- Probabilistic Conformal Prediction with Approximate Conditional Validity
We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution.
Our method consistently outperforms existing approaches in terms of conditional coverage.
arXiv Detail & Related papers (2024-07-01T20:44:48Z)
- Robust Conformal Prediction Using Privileged Information
We develop a method to generate prediction sets with a guaranteed coverage rate that is robust to corruptions in the training data.
Our approach builds on conformal prediction, a powerful framework for constructing prediction sets that are valid under the i.i.d. assumption.
arXiv Detail & Related papers (2024-06-08T08:56:47Z)
- Probabilistic Conformal Prediction Using Conditional Random Samples
PCP is a predictive inference algorithm that estimates a target variable with a potentially discontinuous predictive set.
It is efficient and compatible with either explicit or implicit conditional generative models.
arXiv Detail & Related papers (2022-06-14T03:58:03Z)
- Conformal Off-Policy Prediction in Contextual Bandits
Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
arXiv Detail & Related papers (2022-06-09T10:39:33Z)
- Post-Contextual-Bandit Inference
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good, or even optimal, policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.