Online Statistical Inference for Matrix Contextual Bandit
- URL: http://arxiv.org/abs/2212.11385v1
- Date: Wed, 21 Dec 2022 22:03:06 GMT
- Title: Online Statistical Inference for Matrix Contextual Bandit
- Authors: Qiyu Han, Will Wei Sun, and Yichen Zhang
- Abstract summary: Contextual bandit has been widely used for sequential decision-making based on contextual information and historical feedback data.
We introduce a new online doubly-debiasing inference procedure to simultaneously handle both sources of bias.
Our inference results are built upon a newly developed low-rank gradient descent estimator and its non-asymptotic convergence result.
- Score: 3.465827582464433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual bandit has been widely used for sequential decision-making based
on the current contextual information and historical feedback data. In modern
applications, such context format can be rich and can often be formulated as a
matrix. Moreover, while existing bandit algorithms mainly focused on
reward-maximization, less attention has been paid to the statistical inference.
To fill in these gaps, in this work we consider a matrix contextual bandit
framework where the true model parameter is a low-rank matrix, and propose a
fully online procedure to simultaneously make sequential decision-making and
conduct statistical inference. The low-rank structure of the model parameter
and the adaptivity nature of the data collection process makes this difficult:
standard low-rank estimators are not fully online and are biased, while
existing inference approaches in bandit algorithms fail to account for the
low-rankness and are also biased. To address these, we introduce a new online
doubly-debiasing inference procedure to simultaneously handle both sources of
bias. In theory, we establish the asymptotic normality of the proposed online
doubly-debiased estimator and prove the validity of the constructed confidence
interval. Our inference results are built upon a newly developed low-rank
stochastic gradient descent estimator and its non-asymptotic convergence
result, which is also of independent interest.
Related papers
- Bayesian Nonparametrics Meets Data-Driven Distributionally Robust Optimization [29.24821214671497]
Training machine learning and statistical models often involve optimizing a data-driven risk criterion.
We propose a novel robust criterion by combining insights from Bayesian nonparametric (i.e., Dirichlet process) theory and a recent decision-theoretic model of smooth ambiguity-averse preferences.
For practical implementation, we propose and study tractable approximations of the criterion based on well-known Dirichlet process representations.
arXiv Detail & Related papers (2024-01-28T21:19:15Z) - Online Tensor Inference [0.0]
Traditional offline learning, involving the storage and utilization of all data in each computational iteration, becomes impractical for high-dimensional tensor data.
Existing low-rank tensor methods lack the capability for statistical inference in an online fashion.
Our approach employs Gradient Descent (SGD) to enable efficient real-time data processing without extensive memory requirements.
arXiv Detail & Related papers (2023-12-28T16:37:48Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Online learning in bandits with predicted context [8.257280652461159]
We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context.
This setting is motivated by a wide range of applications where the true context for decision-making is unobserved.
We propose the first online algorithm in this setting with sublinear regret guarantees under mild conditions.
arXiv Detail & Related papers (2023-07-26T02:33:54Z) - Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z) - The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer
Linear Networks [51.1848572349154]
neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in at least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Efficient First-Order Contextual Bandits: Prediction, Allocation, and
Triangular Discrimination [82.52105963476703]
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees.
arXiv Detail & Related papers (2021-07-05T19:20:34Z) - Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z) - Online and Distribution-Free Robustness: Regression and Contextual
Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.