Related papers: Off-policy Confidence Sequences

Off-policy Confidence Sequences

URL: http://arxiv.org/abs/2102.09540v1
Date: Thu, 18 Feb 2021 18:40:30 GMT
Title: Off-policy Confidence Sequences
Authors: Nikos Karampatziakis, Paul Mineiro, Aaditya Ramdas
Abstract summary: We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting. We provide algorithms for computing these confidence sequences that strike a good balance between computational and statistical efficiency.
Score: 33.749904615295485
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting. These confidence sequences are based on recent ideas from martingale analysis and are non-asymptotic, non-parametric, and valid at arbitrary stopping times. We provide algorithms for computing these confidence sequences that strike a good balance between computational and statistical efficiency. We empirically demonstrate the tightness of our approach in terms of failure probability and width and apply it to the "gated deployment" problem of safely upgrading a production contextual bandit system.

Related papers

A new and flexible class of sharp asymptotic time-uniform confidence sequences [0.0]
As in classical statistics, confidence sequences are a nonparametric tool showing under which high-level assumptions coverage is achieved. We propose a new flexible class of confidence sequences yielding sharp time-uniform confidence sequences under mild assumptions.
arXiv Detail & Related papers (2025-02-14T18:57:16Z)
Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used for policy evaluation, serves as a natural framework for this purpose. In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
Tighter Confidence Bounds for Sequential Kernel Regression [3.683202928838613]
We use martingale tail inequalities to establish new confidence bounds for sequential kernel regression. Our confidence bounds can be computed by solving a conic program, although this bare version quickly becomes impractical. We find that when our confidence bounds replace existing ones, the KernelUCB algorithm has better empirical performance, a matching worst-case performance guarantee and comparable computational cost.
arXiv Detail & Related papers (2024-03-19T13:47:35Z)
High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization [16.38026811561888]
This paper introduces a novel inference method focused on constructing confidence intervals with efficient computation and fast convergence to the nominal level. Our method requires minimal additional computation and memory beyond the standard updating of estimates, making the inference process almost cost-free.
arXiv Detail & Related papers (2024-01-17T17:11:45Z)
Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences. Our method is especially suitable for problems with well-specified likelihoods. We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification. We propose a risk-consistent approach to tackle this problem and show that the estimation error bound the optimal convergence rate. We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z)
Improved Algorithms for Stochastic Linear Bandits Using Tail Bounds for Martingale Mixtures [26.683757807252675]
We present improved algorithms with worst-case regret guarantees for the linear bandit problem. We show that our confidence sequences are tighter than competitors, both empirically and theoretically.
arXiv Detail & Related papers (2023-09-25T17:13:46Z)
Huber-Robust Confidence Sequences [37.16361789841549]
Confidence sequences are confidence intervals that can be sequentially tracked, and are valid at arbitrary data-dependent stopping times. We show that the resulting confidence sequences attain the optimal width achieved in the nonsequential setting. Since confidence sequences are a common tool used within A/B/n testing and bandits, these results open the door to sequential experimentation that is robust to outliers and adversarial corruptions.
arXiv Detail & Related papers (2023-01-23T17:29:26Z)
An evaluation of word-level confidence estimation for end-to-end automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR) We provide an extensive benchmark of popular confidence methods on four well-known speech datasets. Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
Towards Safe Policy Improvement for Non-Stationary MDPs [48.9966576179679]
Many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable. We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems. Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis.
arXiv Detail & Related papers (2020-10-23T20:13:51Z)
CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning. We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z)
Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean. We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.