Statistical Inference for Misspecified Contextual Bandits
- URL: http://arxiv.org/abs/2509.06287v2
- Date: Sat, 20 Sep 2025 03:49:26 GMT
- Title: Statistical Inference for Misspecified Contextual Bandits
- Authors: Yongyi Guo, Ziping Xu
- Abstract summary: Contextual bandit algorithms have transformed modern experimentation by enabling real-time adaptation for personalized treatment. Yet these advantages create challenges for statistical inference due to adaptivity. Policy convergence ensures replicability of adaptive experiments and stability of online algorithms.
- Score: 6.178061357164435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contextual bandit algorithms have transformed modern experimentation by enabling real-time adaptation for personalized treatment and efficient use of data. Yet these advantages create challenges for statistical inference due to adaptivity. A fundamental property that supports valid inference is policy convergence, meaning that action-selection probabilities converge in probability given the context. Convergence ensures replicability of adaptive experiments and stability of online algorithms. In this paper, we highlight a previously overlooked issue: widely used algorithms such as LinUCB may fail to converge when the reward model is misspecified, and such non-convergence creates fundamental obstacles for statistical inference. This issue is practically important, as misspecified models -- such as linear approximations of complex dynamic systems -- are often employed in real-world adaptive experiments to balance bias and variance. Motivated by this insight, we propose and analyze a broad class of algorithms that are guaranteed to converge even under model misspecification. Building on this guarantee, we develop a general inference framework based on an inverse-probability-weighted Z-estimator (IPW-Z) and establish its asymptotic normality with a consistent variance estimator. Simulation studies confirm that the proposed method provides robust and data-efficient confidence intervals, and can outperform existing approaches, which are available only in the special case of offline policy evaluation. Taken together, our results underscore the importance of designing adaptive algorithms with built-in convergence guarantees to enable stable experimentation and valid statistical inference in practice.
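The abstract does not spell out the IPW-Z construction, so the following is only an illustrative sketch of the underlying idea: weighting rewards by the inverse of the logged action-selection probabilities corrects for adaptive data collection. The epsilon-greedy policy, Bernoulli rewards, and all parameter values below are assumptions for illustration, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.6])  # hypothetical Bernoulli arm means
T, eps = 20000, 0.2

est = np.zeros(2)       # running means that the epsilon-greedy policy acts on
counts = np.zeros(2)
ipw_sums = np.zeros(2)  # inverse-probability-weighted reward totals

for t in range(T):
    greedy = int(np.argmax(est))
    probs = np.full(2, eps / 2)   # exploration mass on every arm
    probs[greedy] += 1 - eps      # propensities are known to the experimenter
    a = rng.choice(2, p=probs)
    r = rng.binomial(1, true_means[a])
    counts[a] += 1
    est[a] += (r - est[a]) / counts[a]
    ipw_sums[a] += r / probs[a]   # weight each observed reward by 1/propensity

ipw_est = ipw_sums / T  # unbiased for each arm's mean despite adaptive sampling
```

Because the propensities are logged by the algorithm itself, the weights are exact rather than estimated; this is the same ingredient the IPW-Z estimator builds on, combined in the paper with a Z-estimation equation and a variance estimator that this sketch does not attempt to reproduce.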
Related papers
- Towards regularized learning from functional data with covariate shift [3.072411352294816]
This paper investigates a general regularization framework for unsupervised domain adaptation in vector-valued regression. By restricting the hypothesis space, we develop a practical operator learning algorithm capable of handling functional outputs.
arXiv Detail & Related papers (2026-01-28T20:30:05Z) - Avoiding the Price of Adaptivity: Inference in Linear Contextual Bandits via Stability [2.5782420501870296]
We argue that stability and statistical efficiency can coexist within a single contextual bandit method. We show that our algorithm achieves regret guarantees that are minimax optimal up to logarithmic factors.
arXiv Detail & Related papers (2025-12-23T13:53:53Z) - Conformal and kNN Predictive Uncertainty Quantification Algorithms in Metric Spaces [3.637162892228131]
We develop a conformal prediction algorithm that offers finite-sample coverage guarantees and fast convergence rates of the oracle estimator. In heteroscedastic settings, we forgo these non-asymptotic guarantees to gain statistical efficiency. We demonstrate the practical utility of our approach in personalized-medicine applications involving random response objects.
arXiv Detail & Related papers (2025-07-21T15:54:13Z) - Efficient Adaptive Experimentation with Non-Compliance [39.43227019824619]
We study the problem of estimating the average treatment effect (ATE) in adaptive experiments where treatment can only be encouraged -- rather than directly assigned -- via a binary instrumental variable. We introduce AMRIV, an online policy that adaptively approximates the optimal allocation, paired with a sequential, influence-function-based estimator that attains the semi-parametric efficiency bound while retaining multiply robust consistency.
arXiv Detail & Related papers (2025-05-23T04:49:14Z) - Rectifying Conformity Scores for Better Conditional Coverage [75.73184036344908]
We present a new method for generating confidence sets within the split conformal prediction framework. Our method performs a trainable transformation of any given conformity score to improve conditional coverage while ensuring exact marginal coverage.
arXiv Detail & Related papers (2025-02-22T19:54:14Z) - Noise-Adaptive Conformal Classification with Marginal Coverage [53.74125453366155]
We introduce an adaptive conformal inference method capable of efficiently handling deviations from exchangeability caused by random label noise. We validate our method through extensive numerical experiments demonstrating its effectiveness on synthetic and real data sets.
arXiv Detail & Related papers (2025-01-29T23:55:23Z) - Distribution-Free Calibration of Statistical Confidence Sets [2.283561089098417]
We introduce two novel methods, TRUST and TRUST++, for calibrating confidence sets to achieve distribution-free conditional coverage. We demonstrate that our methods outperform existing approaches, particularly in small-sample regimes.
arXiv Detail & Related papers (2024-11-28T20:45:59Z) - Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
We investigate the statistical properties of Temporal Difference learning with Polyak-Ruppert averaging. We make three significant contributions that improve the current state-of-the-art results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - Probabilistic Conformal Prediction with Approximate Conditional Validity [81.30551968980143]
We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution.
Our method consistently outperforms existing approaches in terms of conditional coverage.
arXiv Detail & Related papers (2024-07-01T20:44:48Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z) - Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - Statistical optimality and stability of tangent transform algorithms in logit models [6.9827388859232045]
We provide conditions on the data-generating process to derive non-asymptotic upper bounds on the risk incurred by the logistic optima.
In particular, we establish local variation of the algorithm without any assumptions on the data-generating process.
We explore a special case involving a semi-orthogonal design under which a global convergence is obtained.
arXiv Detail & Related papers (2020-10-25T05:15:13Z)
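Several of the related papers above (the conformal and kNN paper, the conformity-score rectification paper, and the marginal-coverage paper) build on split conformal prediction. As a minimal, self-contained sketch of that basic procedure, under assumed toy data (a no-intercept linear model with Gaussian noise, none of which comes from the papers themselves):

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical data: y = 2x + Gaussian noise
x = rng.uniform(0.0, 1.0, 600)
y = 2.0 * x + rng.normal(0.0, 0.1, 600)

# split the data: fit a model on one half, calibrate on the other
x_fit, y_fit = x[:300], y[:300]
x_cal, y_cal = x[300:], y[300:]

slope = np.sum(x_fit * y_fit) / np.sum(x_fit * x_fit)  # least squares, no intercept
scores = np.abs(y_cal - slope * x_cal)                 # absolute-residual conformity scores

alpha = 0.1
n = len(scores)
# finite-sample-valid calibration quantile: ceil((n+1)(1-alpha))/n, clipped to 1
level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
q = np.quantile(scores, level, method="higher")

def predict_interval(x0):
    """Interval with marginal coverage >= 1 - alpha for a new exchangeable point."""
    return slope * x0 - q, slope * x0 + q
```

The guarantee here is marginal, not conditional; the rectification and noise-adaptive papers above are precisely about transforming or adapting the conformity scores so that coverage holds more uniformly across the feature space.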
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.