High dimensional stochastic linear contextual bandit with missing
covariates
- URL: http://arxiv.org/abs/2207.11165v1
- Date: Fri, 22 Jul 2022 16:06:22 GMT
- Title: High dimensional stochastic linear contextual bandit with missing
covariates
- Authors: Byoungwook Jang, Julia Nepper, Marc Chevrette, Jo Handelsman, Alfred
O. Hero III
- Abstract summary: Recent works in bandit problems adopted lasso convergence theory in the sequential decision-making setting.
technical challenges that hinder the application of lasso theory: 1) proving the restricted eigenvalue condition under conditionally sub-Gaussian noise and 2) accounting for the dependence between the context variables and the chosen actions.
- Score: 19.989315104929354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works in bandit problems adopted lasso convergence theory in the
sequential decision-making setting. Even with fully observed contexts, there
are technical challenges that hinder the application of existing lasso
convergence theory: 1) proving the restricted eigenvalue condition under
conditionally sub-Gaussian noise and 2) accounting for the dependence between
the context variables and the chosen actions. This paper studies the effect of
missing covariates on regret for stochastic linear bandit algorithms. Our work
provides a high-probability upper bound on the regret incurred by the proposed
algorithm in terms of covariate sampling probabilities, showing that the regret
degrades due to missingness by at most $\zeta_{min}^2$, where $\zeta_{min}$ is
the minimum probability of observing covariates in the context vector. We
illustrate our algorithm for the practical application of experimental design
for collecting gene expression data by a sequential selection of class
discriminating DNA probes.
Related papers
- Regret Minimization and Statistical Inference in Online Decision Making with High-dimensional Covariates [7.21848268647674]
We integrate the $varepsilon$-greedy bandit algorithm for decision-making with a hard thresholding algorithm for estimating sparse bandit parameters.
Under a margin condition, our method achieves either $O(T1/2)$ regret or classical $O(T1/2)$-consistent inference.
arXiv Detail & Related papers (2024-11-10T01:47:11Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM)
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $tilde Obig(dsqrtsum_t=1Tsigma_t2 + dbig)$.
Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $tildeO(d sqrtsum_k = 1K sigma_k2 + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - From Optimality to Robustness: Dirichlet Sampling Strategies in
Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
arXiv Detail & Related papers (2021-11-18T14:34:21Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse
Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - On Lower Bounds for Standard and Robust Gaussian Process Bandit
Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z) - Imputation for High-Dimensional Linear Regression [8.841513006680886]
We show that LASSO retains the minimax estimation rate in the random setting.
We show that the co-root remains mphmph in this setting.
arXiv Detail & Related papers (2020-01-24T19:54:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.