Statistical Inference with M-Estimators on Bandit Data
- URL: http://arxiv.org/abs/2104.14074v1
- Date: Thu, 29 Apr 2021 01:56:44 GMT
- Title: Statistical Inference with M-Estimators on Bandit Data
- Authors: Kelly W. Zhang, Lucas Janson, and Susan A. Murphy
- Abstract summary: Bandit algorithms are increasingly used in real-world sequential decision-making problems. Classical statistical approaches fail to provide reliable confidence intervals when used with bandit data.
- Score: 11.09729362243947
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bandit algorithms are increasingly used in real world sequential decision
making problems, from online advertising to mobile health. As a result, there
are more datasets collected using bandit algorithms and with that an increased
desire to be able to use these datasets to answer scientific questions like:
Did one type of ad increase the click-through rate more or lead to more
purchases? In which contexts is a mobile health intervention effective?
However, it has been shown that classical statistical approaches, like those
based on the ordinary least squares estimator, fail to provide reliable
confidence intervals when used with bandit data. Recently methods have been
developed to conduct statistical inference using simple models fit to data
collected with multi-armed bandits. However there is a lack of general methods
for conducting statistical inference using more complex models. In this work,
we develop theory justifying the use of M-estimation (Van der Vaart, 2000),
traditionally used with i.i.d. data, to provide inferential methods for a large
class of estimators -- including least squares and maximum likelihood
estimators -- but now with data collected with (contextual) bandit algorithms.
To do this we generalize the use of adaptive weights pioneered by Hadad et al.
(2019) and Deshpande et al. (2018). Specifically, in settings in which the data
is collected via a (contextual) bandit algorithm, we prove that certain
adaptively weighted M-estimators are uniformly asymptotically normal and
demonstrate empirically that we can use their asymptotic distribution to
construct reliable confidence regions for a variety of inferential targets.
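The adaptive-weighting idea in the abstract can be illustrated with a small simulation. The sketch below is illustrative only, not the paper's estimator: it runs an epsilon-greedy two-armed bandit and then estimates each arm's mean with a weighted least-squares average using square-root inverse-propensity weights, in the spirit of the adaptive weights of Hadad et al. All parameter values and names are assumptions for this toy example.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
mu = np.array([0.0, 0.5])   # true arm means (assumed for this toy example)
eps = 0.1                   # epsilon-greedy exploration rate

counts = np.zeros(2)
sums = np.zeros(2)
arms, rewards, probs = [], [], []

for t in range(T):
    means = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    greedy = int(np.argmax(means))
    pi = np.full(2, eps / 2)        # epsilon-greedy action probabilities
    pi[greedy] += 1 - eps
    a = int(rng.choice(2, p=pi))
    r = rng.normal(mu[a], 1.0)
    counts[a] += 1
    sums[a] += r
    arms.append(a)
    rewards.append(r)
    probs.append(pi[a])

arms = np.array(arms)
rewards = np.array(rewards)
probs = np.array(probs)

# Adaptively weighted least-squares estimate of each arm's mean: weights
# proportional to 1/sqrt(pi_t), i.e. square-root inverse-propensity weights
w = 1.0 / np.sqrt(probs)
est = np.array([
    np.sum(w[arms == k] * rewards[arms == k]) / np.sum(w[arms == k])
    for k in range(2)
])
print(est)  # both entries close to the true arm means
```

Because each reward's conditional mean depends only on the arm pulled, any weighting scheme that depends just on past data leaves the weighted average unbiased; the square-root weighting stabilizes its variance, which is what makes a normal approximation usable.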
Related papers
- Distributed Semi-Supervised Sparse Statistical Inference [6.685997976921953]
A debiased estimator is a crucial tool in statistical inference for high-dimensional model parameters.
Traditional methods require computing a debiased estimator on every machine.
An efficient multi-round distributed debiased estimator, which integrates both labeled and unlabelled data, is developed.
arXiv Detail & Related papers (2023-06-17T17:30:43Z)
- Statistical Inference with Stochastic Gradient Methods under $\phi$-mixing Data [9.77185962310918]
We propose a mini-batch SGD estimator for statistical inference when the data is $\phi$-mixing.
The confidence intervals are constructed using an associated mini-batch SGD procedure.
The proposed method is memory-efficient and easy to implement in practice.
arXiv Detail & Related papers (2023-02-24T16:16:43Z)
- Online Statistical Inference for Matrix Contextual Bandit [3.465827582464433]
Contextual bandit has been widely used for sequential decision-making based on contextual information and historical feedback data.
We introduce a new online doubly-debiasing inference procedure to simultaneously handle both sources of bias.
Our inference results are built upon a newly developed low-rank gradient descent estimator and its non-asymptotic convergence result.
arXiv Detail & Related papers (2022-12-21T22:03:06Z)
- Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries [53.222218035435006]
We use adversarial tools to optimize for queries that are discriminative and diverse.
Our improvements achieve significantly more accurate membership inference than existing methods.
arXiv Detail & Related papers (2022-10-19T17:46:50Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits [5.144809478361604]
We improve the doubly robust (DR) estimator by adaptively weighting observations to control its variance.
We provide empirical evidence for our estimator's improved accuracy and inferential properties relative to existing alternatives.
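As background for the adaptive-weighting idea in this entry, the sketch below shows a plain doubly robust (AIPW) value estimate on simulated logged bandit data. It is a minimal illustration of the standard DR estimator, not the paper's adaptively weighted variant, and all quantities (arm means, logging probabilities, sample sizes) are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 4000
mu = np.array([0.2, 0.6])                 # true arm means (toy assumption)

pi_log = rng.uniform(0.1, 0.9, size=T)    # drifting logging probabilities
a = rng.binomial(1, pi_log)               # logged actions
p_a = np.where(a == 1, pi_log, 1 - pi_log)
y = rng.normal(mu[a], 1.0)                # logged rewards

# outcome model fit on the first half, evaluated on the second half
half = T // 2
m1 = y[:half][a[:half] == 1].mean()       # plug-in model for arm 1's mean

a2, p2, y2 = a[half:], p_a[half:], y[half:]
# DR/AIPW estimate of the value of the policy "always play arm 1":
# model prediction plus an inverse-propensity-weighted residual correction
dr = m1 + (a2 == 1) / p2 * (y2 - m1)
print(dr.mean())  # close to the true value mu[1]
```

The residual correction term is what makes the estimate robust: it stays consistent if either the outcome model or the logging probabilities are correct, though its variance grows when logging probabilities are small, which is the issue adaptive weighting targets.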
arXiv Detail & Related papers (2021-06-03T17:54:44Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Inference for Batched Bandits [9.468593929311867]
We develop methods for inference on data collected in batches using a bandit algorithm.
We first prove that the ordinary least squares estimator (OLS) is not asymptotically normal on data collected using standard bandit algorithms when there is no unique optimal arm.
Second, we introduce the Batched OLS estimator (BOLS) that we prove is (1) normal on data collected from both multi-arm and contextual bandits and (2) robust to non-stationarity in the baseline reward.
arXiv Detail & Related papers (2020-02-08T18:59:47Z)
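The batched-OLS idea from the entry above can be sketched as follows: compute an OLS treatment-effect estimate within each batch, studentize it with the batch's own standard error, and combine the batch statistics so the result is approximately standard normal even though the allocation probabilities change between batches. This is a hedged toy illustration with assumed batch sizes and known unit noise variance, not the paper's full BOLS construction.

```python
import numpy as np

rng = np.random.default_rng(1)
B, n = 25, 200                  # number of batches, samples per batch
delta = 0.0                     # true treatment effect (null is true here)

z_stats = []
for b in range(B):
    # allocation probability drifts batch to batch, mimicking a bandit
    p = rng.uniform(0.2, 0.8)
    a = rng.binomial(1, p, size=n)
    y = delta * a + rng.normal(0.0, 1.0, size=n)
    n1, n0 = a.sum(), n - a.sum()
    diff = y[a == 1].mean() - y[a == 0].mean()   # batch OLS effect estimate
    se = np.sqrt(1.0 / n1 + 1.0 / n0)            # known unit noise variance
    z_stats.append(diff / se)

# BOLS-style statistic: batch-normalized estimates are combined so the
# result is approximately N(0, 1) under the null, despite adaptivity
bols_z = np.sum(z_stats) / np.sqrt(B)
print(bols_z)
```

Normalizing within each batch before pooling is what removes the dependence on how the bandit allocated samples, which is why the combined statistic remains normal while pooled OLS does not.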
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.