Breaking the $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret
in Stochastic Contextual Linear Bandits
- URL: http://arxiv.org/abs/2205.09899v1
- Date: Thu, 19 May 2022 23:41:46 GMT
- Title: Breaking the $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret
in Stochastic Contextual Linear Bandits
- Authors: Avishek Ghosh and Abishek Sankararaman
- Abstract summary: We prove an instance-independent (poly)logarithmic regret
bound for stochastic contextual bandits with linear payoffs, showing that
stochastic contexts help reduce the regret from $\sqrt{T}$ to
$\mathrm{polylog}(T)$.
- Score: 10.127456032874978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove an instance independent (poly) logarithmic regret for stochastic
contextual bandits with linear payoff. Previously, in \cite{chu2011contextual},
a lower bound of $\Omega(\sqrt{T})$ was shown for the contextual linear
bandit problem with arbitrary (adversarially chosen) contexts. In this paper, we
show that stochastic contexts indeed help to reduce the regret from $\sqrt{T}$
to $\mathrm{polylog}(T)$. We propose Low Regret Stochastic Contextual Bandits
(\texttt{LR-SCB}), which takes advantage of the stochastic contexts and
performs parameter estimation (in $\ell_2$ norm) and regret minimization
simultaneously. \texttt{LR-SCB} works in epochs, where the parameter estimation
of the previous epoch is used to reduce the regret of the current epoch. The
(poly) logarithmic regret of \texttt{LR-SCB} stems from two crucial facts: (a)
the application of a norm adaptive algorithm to exploit the parameter
estimation and (b) an analysis of the shifted linear contextual bandit
algorithm, showing that shifting results in increasing regret. We have also
shown experimentally that stochastic contexts indeed incur a regret that
scales with $\mathrm{polylog}(T)$.
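The listing carries no code, so the following is a minimal Python sketch of the epoch structure as we read it from the abstract: each epoch re-centers (shifts) the problem around the previous epoch's $\ell_2$ parameter estimate, so the residual parameter shrinks and the per-epoch regret drops. The constants, the doubling schedule, and the greedy-plus-exploration rule standing in for the paper's norm-adaptive subroutine are all illustrative assumptions, not the authors' \texttt{LR-SCB} implementation.

```python
import numpy as np

# Illustrative sketch of an epoch-based scheme in the spirit of LR-SCB.
# Constants, the exploration rule, and the doubling schedule are
# assumptions for illustration, not the paper's algorithm.

rng = np.random.default_rng(0)
d, K = 5, 10                        # context dimension, arms per round
theta_star = rng.normal(size=d)     # unknown reward parameter
theta_star /= np.linalg.norm(theta_star)

def pull(x):
    """Noisy linear payoff for a chosen context vector x."""
    return x @ theta_star + 0.1 * rng.normal()

T, epoch_len, t = 2000, 100, 0
theta_hat = np.zeros(d)             # estimate carried across epochs

while t < T:
    X_hist, y_hist = [], []
    for _ in range(min(epoch_len, T - t)):
        contexts = rng.normal(size=(K, d))   # stochastic contexts
        if rng.random() < 0.1:               # crude exploration stand-in
            a = int(rng.integers(K))
        else:                                # exploit the shifted estimate
            a = int(np.argmax(contexts @ theta_hat))
        X_hist.append(contexts[a])
        y_hist.append(pull(contexts[a]))
        t += 1
    # End of epoch: ridge (l2-regularized) estimate from this epoch's data;
    # the next epoch plays relative to it, shrinking the effective problem.
    X, y = np.asarray(X_hist), np.asarray(y_hist)
    theta_hat = np.linalg.solve(X.T @ X + np.eye(d), X.T @ y)
    epoch_len *= 2

print(f"final estimation error: {np.linalg.norm(theta_hat - theta_star):.3f}")
```

The epoch split is the key design point visible here: the estimate is frozen within an epoch and refreshed only at its boundary, which is what lets an analysis treat each epoch as a shifted bandit with a smaller unknown parameter.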
Related papers
- An Equivalence Between Static and Dynamic Regret Minimization [10.812831455376218]
We show that for linear losses, dynamic regret minimization is equivalent to static regret minimization in an extended decision space.
We provide an algorithm guaranteeing dynamic regret of the form $R_T(u_1,\dots,u_T) \le \tilde{O}(\cdots)$.
arXiv Detail & Related papers (2024-06-03T17:54:58Z) - LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits [38.41164102066483]
This study considers the linear contextual bandit problem with independent and identically distributed contexts.
Our proposed algorithm is based on Follow-The-Regularized-Leader with the Tsallis entropy and is referred to as the $\alpha$-Linear-Contextual (LC)-Tsallis-INF.
arXiv Detail & Related papers (2024-03-05T18:59:47Z) - Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical limitations of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$, while G-ESTS can achieve the $\tilde{O}(\cdots)$
arXiv Detail & Related papers (2024-01-14T14:14:19Z) - Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2023-03-05T15:11:14Z) - Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive regret upper bounds on the classes of Sobolev spaces $W_p^{\beta}(\mathcal{X})$, $p \geq 2$, $\beta > \frac{d}{p}$.
The upper bounds are supported by a minimax regret analysis, which reveals that in the cases $\beta > \frac{d}{2}$ or $p = \infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z) - Doubly robust Thompson sampling for linear payoffs [12.375561840897742]
We propose a novel multi-armed contextual bandit algorithm called Doubly Robust (DR) Thompson Sampling.
The proposed algorithm is designed to allow a novel additive regret decomposition, leading to an improved regret bound of order $\tilde{O}(\phi^{-2}\sqrt{T})$.
arXiv Detail & Related papers (2021-02-01T23:31:10Z) - Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support; a minimal sketch of this estimate-then-threshold step appears after this list.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
arXiv Detail & Related papers (2020-10-22T19:14:37Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called Optimistic Pessimistic Linear Bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
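As promised in the Thresholded Lasso Bandit entry above, here is a minimal, self-contained Python sketch of the estimate-then-threshold step it describes: fit an $\ell_1$-regularized estimate, keep only the coefficients above a threshold as the estimated support, and refit on that support. The regularization strength, threshold rule, and synthetic data are arbitrary illustrative choices, not the paper's tuning.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sketch of "estimate the reward vector and its sparse support":
# Lasso fit, threshold small coefficients, then refit on the support.
rng = np.random.default_rng(1)
n, d, s = 200, 50, 3
theta_star = np.zeros(d)
theta_star[:s] = 1.0                        # sparse true parameter

X = rng.normal(size=(n, d))                 # observed contexts
y = X @ theta_star + 0.1 * rng.normal(size=n)

lasso = Lasso(alpha=0.05).fit(X, y)         # l1-regularized estimate
threshold = 0.5 * np.abs(lasso.coef_).max() # arbitrary threshold rule
support = np.flatnonzero(np.abs(lasso.coef_) > threshold)

coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
theta_hat = np.zeros(d)
theta_hat[support] = coef                   # refit restricted to support

print("estimated support:", support)        # ideally [0 1 2]
```

In the bandit setting this estimation step would run on the contexts and rewards collected so far; restricting to the estimated support shrinks the effective dimension from $d$ to roughly the sparsity level, which is consistent with the $\mathcal{O}(\log d + \cdot)$ bounds quoted above.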