Online Statistical Inference for Contextual Bandits via Stochastic
Gradient Descent
- URL: http://arxiv.org/abs/2212.14883v1
- Date: Fri, 30 Dec 2022 18:57:08 GMT
- Title: Online Statistical Inference for Contextual Bandits via Stochastic
Gradient Descent
- Authors: Xi Chen and Zehua Lai and He Li and Yichen Zhang
- Abstract summary: We study the online statistical inference of model parameters in a contextual bandit framework of decision-making.
We propose a general framework for online and adaptive data collection environments that can update decision rules via weighted stochastic gradient descent.
- Score: 10.108468796986074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of big data, it has become easier than
ever to learn the optimal decision rule by updating it recursively and making
decisions online. We study the online statistical inference of model
parameters in a contextual bandit framework of sequential decision-making. We
propose a general framework for online and adaptive data collection
environments that can update decision rules via weighted stochastic gradient
descent. We allow different weighting schemes of the stochastic gradient and
establish the asymptotic normality of the parameter estimator. Our proposed
estimator significantly improves the asymptotic efficiency of the previous
averaged-SGD approach by using inverse probability weights. We also conduct an
optimality analysis of the weights in a linear regression setting. We provide
a Bahadur representation of the proposed estimator and show that the remainder
term in the Bahadur representation entails a slower convergence rate than that
of classical SGD, due to the adaptive data collection.
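As a concrete illustration of the recursion described above, here is a minimal sketch of inverse-probability-weighted SGD with a Polyak-Ruppert average in a simulated linear-reward, epsilon-greedy contextual bandit; the feature map, step-size schedule, and exploration rate are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 2, 10_000               # context dimension, arms, horizon
eps = 0.1                            # exploration rate (illustrative)
theta_true = rng.normal(size=d * K)  # ground truth for the simulation
theta_hat = np.zeros(d * K)          # current SGD iterate
theta_bar = np.zeros(d * K)          # Polyak-Ruppert average used for inference

def phi(x, a):
    """Block feature map: context x placed in arm a's coordinate block."""
    z = np.zeros(d * K)
    z[a * d:(a + 1) * d] = x
    return z

for t in range(1, T + 1):
    x = rng.normal(size=d)
    greedy = max(range(K), key=lambda a: phi(x, a) @ theta_hat)
    a = int(rng.integers(K)) if rng.random() < eps else greedy
    prop = eps / K + (1 - eps) * (a == greedy)  # propensity of the chosen arm
    r = phi(x, a) @ theta_true + rng.normal()   # observed reward
    # inverse-probability-weighted stochastic gradient of the squared loss
    grad = (1.0 / prop) * (phi(x, a) @ theta_hat - r) * phi(x, a)
    theta_hat = theta_hat - 0.5 * t ** (-0.501) * grad
    theta_bar += (theta_hat - theta_bar) / t    # running average

# theta_bar is the averaged estimator whose asymptotic normality is studied
```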
Related papers
- Continuous Optimization for Offline Change Point Detection and Estimation [0.0]
It reformulates the normal-mean multiple change point model as a regularized statistical inverse problem that enforces sparsity.
The recently developed framework of continuous optimization for best subset selection (COMBSS) is briefly introduced and related to the problem at hand.
Supervised and unsupervised perspectives are explored, with the latter testing different approaches to choosing the regularization penalty parameters.
arXiv Detail & Related papers (2024-07-03T01:19:59Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
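The estimator family this entry refers to can be sketched as baseline-corrected inverse propensity scoring. The snippet below uses the textbook control-variate baseline b* = Cov(w*r, w) / Var(w) as a stand-in for the paper's own closed-form solution, and the logged data are synthetic; treat all of it as an illustration.

```python
import numpy as np

def ips_with_baseline(w, r, b):
    """Baseline-corrected IPS; unbiased whenever E[w] = 1 under the logger."""
    return np.mean(w * r - b * (w - 1.0))

def control_variate_baseline(w, r):
    """Generic variance-minimizing baseline b* = Cov(w*r, w) / Var(w)."""
    return np.cov(w * r, w)[0, 1] / np.var(w, ddof=1)

# toy logged data: importance weights w_i = pi(a_i|x_i) / mu(a_i|x_i), rewards r_i
rng = np.random.default_rng(1)
w = rng.lognormal(sigma=0.5, size=10_000)
w /= w.mean()                               # so E[w] is ~1 in this toy example
r = rng.binomial(1, 0.3, size=w.size).astype(float)

b = control_variate_baseline(w, r)
print(ips_with_baseline(w, r, 0.0))         # plain IPS
print(ips_with_baseline(w, r, b))           # baseline-corrected, lower variance
```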
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
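One concrete instance of a likelihood-ratio confidence sequence is the running-MLE ("universal inference") construction, sketched below for a Bernoulli mean; the plug-in predictor, the grid, and the model are assumptions for illustration and may differ from the paper's construction.

```python
import numpy as np

def lr_confidence_sequence(xs, alpha=0.05):
    """Anytime-valid confidence sets for a Bernoulli mean via a running-MLE
    likelihood ratio: C_t = {theta : Lambda_t(theta) < 1/alpha}."""
    grid = np.linspace(0.001, 0.999, 999)   # candidate values of theta
    log_lr = np.zeros_like(grid)            # log Lambda_t(theta) on the grid
    n, s, sets = 0, 0.0, []
    for x in xs:
        # a plug-in predictor fitted on strictly past data keeps validity
        p = np.clip(s / n, 1e-3, 1 - 1e-3) if n > 0 else 0.5
        log_lr += (x * np.log(p) + (1 - x) * np.log(1 - p)
                   - x * np.log(grid) - (1 - x) * np.log(1 - grid))
        n, s = n + 1, s + x
        sets.append(grid[log_lr < np.log(1.0 / alpha)])
    return sets

rng = np.random.default_rng(2)
sets = lr_confidence_sequence(rng.binomial(1, 0.4, size=500))
print(sets[-1].min(), sets[-1].max())       # endpoints of the final set
```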
- Kalman Filter for Online Classification of Non-Stationary Data [101.26838049872651]
In Online Continual Learning (OCL), a learning system receives a stream of data and sequentially performs prediction and training steps.
We introduce a probabilistic Bayesian online learning model by using a neural representation and a state space model over the linear predictor weights.
In experiments in multi-class classification we demonstrate the predictive ability of the model and its flexibility to capture non-stationarity.
arXiv Detail & Related papers (2023-06-14T11:41:42Z)
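The predict/update cycle underlying such a model can be sketched with a plain Kalman filter over linear predictor weights under a random-walk state model; the Gaussian observation model below is a stand-in for the paper's classification likelihood, and all dimensions and noise scales are illustrative.

```python
import numpy as np

def kalman_step(m, P, phi, y, q=1e-3, r=1.0):
    """One predict/update step with random-walk weight dynamics and a
    Gaussian observation y ~ N(phi @ m, r)."""
    P = P + q * np.eye(P.shape[0])   # predict: uncertainty grows over time
    S = phi @ P @ phi + r            # innovation variance (scalar)
    K = P @ phi / S                  # Kalman gain
    m = m + K * (y - phi @ m)        # update the weight estimate
    P = P - np.outer(K, phi) @ P     # update its covariance
    return m, P

d = 4
m, P = np.zeros(d), np.eye(d)
rng = np.random.default_rng(3)
for t in range(1000):
    phi = rng.normal(size=d)         # features, e.g. from a fixed encoder
    y = phi @ np.ones(d) + rng.normal()
    m, P = kalman_step(m, P, phi, y)
print(m)                             # tracks the true weights (all ones)
```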
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made, using plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Time varying regression with hidden linear dynamics [74.9914602730208]
We revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system.
Counterintuitively, we show that when the underlying dynamics are stable, the parameters of this model can be estimated from data by combining just two ordinary least squares estimates.
arXiv Detail & Related papers (2021-12-29T23:37:06Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandits, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling [0.9806910643086042]
We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent algorithms.
Our approach is fully operational with online data and is rigorously underpinned by a functional central limit theorem.
arXiv Detail & Related papers (2021-06-06T15:38:37Z)
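The random-scaling idea studentizes the Polyak-Ruppert average with a variance built from partial sums of the iterates, so no long-run variance has to be estimated. The sketch below follows a common statement of that construction on a toy scalar SGD path; note the limit is nonstandard, so the critical values are roughly 6.75 rather than 1.96 for a two-sided 5% test, per that literature.

```python
import numpy as np

def random_scaling_t(iterates, theta0, coord=0):
    """t-statistic for one coordinate: studentize the averaged SGD iterate
    with V_n = n^{-2} * sum_s s^2 (xbar_s - xbar_n)^2 built from the path."""
    x = np.asarray(iterates)[:, coord]
    n = x.size
    s = np.arange(1, n + 1)
    xbar = np.cumsum(x) / s                          # running averages
    V = np.sum(s ** 2 * (xbar - xbar[-1]) ** 2) / n ** 2
    return np.sqrt(n) * (xbar[-1] - theta0) / np.sqrt(V)

# toy SGD path for the mean of N(1, 1); compare |t| with ~6.75, not 1.96
rng = np.random.default_rng(4)
theta, path = 0.0, []
for t in range(1, 50_001):
    theta -= t ** (-0.6) * (theta - rng.normal(1.0))
    path.append([theta])
print(random_scaling_t(path, theta0=1.0))
```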
- Support estimation in high-dimensional heteroscedastic mean regression [2.28438857884398]
We consider a linear mean regression model with random design and potentially heteroscedastic, heavy-tailed errors.
We use a strictly convex, smooth variant of the Huber loss function with a tuning parameter that depends on the parameters of the problem.
For the resulting estimator we show sign-consistency and optimal rates of convergence in the $\ell_\infty$ norm.
arXiv Detail & Related papers (2020-11-03T09:46:31Z)
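A smooth, strictly convex Huber variant can be illustrated with the pseudo-Huber loss; the paper uses its own variant with a problem-dependent tuning parameter, so the loss, the fixed tau, and the heavy-tailed toy data below are stand-in assumptions.

```python
import numpy as np

def pseudo_huber_grad(u, tau):
    """Gradient of the pseudo-Huber loss tau^2 * (sqrt(1 + (u/tau)^2) - 1),
    a strictly convex and smooth robust loss."""
    return u / np.sqrt(1.0 + (u / tau) ** 2)

# robust linear regression under heavy-tailed errors (illustrative)
rng = np.random.default_rng(5)
n, d, tau = 2000, 3, 1.0
X = rng.normal(size=(n, d))
beta_true = np.array([2.0, 0.0, -1.0])
y = X @ beta_true + rng.standard_t(df=2, size=n)   # heavy-tailed noise

beta = np.zeros(d)
for _ in range(500):                               # plain gradient descent
    beta -= 0.5 * X.T @ pseudo_huber_grad(X @ beta - y, tau) / n
print(beta)                                        # close to beta_true
```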
- Statistical Inference for Online Decision Making via Stochastic Gradient Descent [31.103438051597887]
We propose an online algorithm that can make decisions and update the decision rule online via gradient descent.
It is not only efficient but also supports all kinds of parametric reward models.
The proposed algorithm and theoretical results are tested by simulations and a real data application to news article recommendation.
arXiv Detail & Related papers (2020-10-14T18:25:18Z)
- Online Covariance Matrix Estimation in Stochastic Gradient Descent [10.153224593032677]
Stochastic gradient descent (SGD) is widely used for parameter estimation, especially for huge data sets and online learning.
This paper aims at providing statistical inference for SGD-based estimates in an online setting.
arXiv Detail & Related papers (2020-02-10T17:46:10Z)
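The flavor of such an estimator can be conveyed with fixed-size batch means over the SGD path, sketched below; the paper's online estimator instead uses recursively updated, growing batches, so this is a simplified stand-in.

```python
import numpy as np

def batch_means_cov(iterates, num_batches=50):
    """Batch-means estimate of the long-run covariance of sqrt(n) * xbar_n
    computed from the (correlated) SGD path."""
    x = np.asarray(iterates)
    n = (x.shape[0] // num_batches) * num_batches
    m = n // num_batches                             # batch length
    bmeans = x[:n].reshape(num_batches, m, -1).mean(axis=1)
    centered = bmeans - x[:n].mean(axis=0)
    return m * centered.T @ centered / (num_batches - 1)

# toy 2-d SGD path on a noisy quadratic objective
rng = np.random.default_rng(6)
theta, path = np.zeros(2), []
for t in range(1, 100_001):
    theta -= t ** (-0.6) * (theta - rng.normal([1.0, -1.0]))
    path.append(theta.copy())
print(batch_means_cov(path))   # plug-in covariance for confidence intervals
```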
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.