Thresholded Lasso Bandit
- URL: http://arxiv.org/abs/2010.11994v4
- Date: Sun, 19 Jun 2022 08:37:21 GMT
- Title: Thresholded Lasso Bandit
- Authors: Kaito Ariu, Kenshi Abe, Alexandre Proutière
- Abstract summary: Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
- Score: 70.17389393497125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we revisit the regret minimization problem in sparse
stochastic contextual linear bandits, where feature vectors may be of large
dimension $d$, but where the reward function depends on a few, say $s_0\ll d$,
of these features only. We present Thresholded Lasso bandit, an algorithm that
(i) estimates the vector defining the reward function as well as its sparse
support, i.e., significant feature elements, using the Lasso framework with
thresholding, and (ii) selects an arm greedily according to this estimate
projected on its support. The algorithm does not require prior knowledge of the
sparsity index $s_0$ and can be parameter-free under some symmetric
assumptions. For this simple algorithm, we establish non-asymptotic regret
upper bounds scaling as $\mathcal{O}( \log d + \sqrt{T} )$ in general, and as
$\mathcal{O}( \log d + \log T)$ under the so-called margin condition (a
probabilistic condition on the separation of the arm rewards). The regret of
previous algorithms scales as $\mathcal{O}( \log d + \sqrt{T \log (d T)})$ and
$\mathcal{O}( \log T \log d)$ in the two settings, respectively. Through
numerical experiments, we confirm that our algorithm outperforms existing
methods.
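To make steps (i) and (ii) concrete, here is a minimal Python sketch of a Thresholded-Lasso-style bandit loop. The Gaussian context distribution, the schedule $\lambda_t \propto \sqrt{\log(t)\log(d)/t}$, the single thresholding pass, and the least-squares refit on the estimated support are illustrative assumptions; the paper's exact thresholding schedule and analysis differ, so treat this as an illustration rather than the authors' algorithm.

```python
# Sketch of a Thresholded-Lasso-style bandit: (i) Lasso estimate +
# thresholded support, (ii) greedy arm selection on that estimate.
# Environment, lambda schedule, and refit are assumptions for the demo.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, s0, K, T, lam0 = 100, 5, 10, 1000, 0.05

theta = np.zeros(d)                        # sparse ground-truth parameter
theta[rng.choice(d, s0, replace=False)] = rng.uniform(0.5, 1.0, s0)

X_hist, y_hist, theta_hat = [], [], np.zeros(d)
for t in range(1, T + 1):
    arms = rng.normal(size=(K, d))         # K feature vectors this round
    a = int(np.argmax(arms @ theta_hat))   # (ii) greedy arm selection
    x = arms[a]
    y = x @ theta + rng.normal(scale=0.5)  # noisy linear reward
    X_hist.append(x); y_hist.append(y)

    lam_t = lam0 * np.sqrt(2 * np.log(t + 1) * np.log(d) / t)
    lasso = Lasso(alpha=lam_t, fit_intercept=False, max_iter=5000)
    lasso.fit(np.asarray(X_hist), np.asarray(y_hist))      # (i) Lasso estimate
    support = np.flatnonzero(np.abs(lasso.coef_) > lam_t)  # threshold -> support
    theta_hat = np.zeros(d)
    if support.size:                       # refit restricted to the support
        Xs = np.asarray(X_hist)[:, support]
        theta_hat[support] = np.linalg.lstsq(Xs, np.asarray(y_hist), rcond=None)[0]

print("estimated support:", sorted(np.flatnonzero(theta_hat)))
print("true support:     ", sorted(np.flatnonzero(theta)))
```

Note that the loop never uses $s_0$: the estimated support is whatever survives the threshold, which mirrors the parameter-free flavor of the algorithm.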
Related papers
- Batched Stochastic Bandit for Nondegenerate Functions [8.015503209312786]
This paper studies batched bandit learning problems for nondegenerate functions.
We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally.
arXiv Detail & Related papers (2024-05-09T12:50:16Z)
- LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits [38.41164102066483]
This study considers the linear contextual bandit problem with independent and identically distributed contexts.
Our proposed algorithm is based on Follow-The-Regularized-Leader with the Tsallis entropy and is referred to as the $\alpha$-Linear-Contextual (LC)-Tsallis-INF (a minimal FTRL-with-Tsallis-entropy sketch appears after this list).
arXiv Detail & Related papers (2024-03-05T18:59:47Z)
- Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise [45.60098988395789]
We propose a new method for constructing UCB-type algorithms for multi-armed bandits.
We show that in the case of symmetric noise in the reward, we can achieve an $O(\log T \sqrt{KT \log T})$ regret bound instead of $O(\ldots)$.
arXiv Detail & Related papers (2024-02-10T22:38:21Z)
- Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in Lu et al. (2021) under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrictions of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve the $\tilde{O}(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$ regret bound while G-ESTS can achieve the $\tilde{O}(\ldots)$
arXiv Detail & Related papers (2024-01-14T14:14:19Z)
- Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $(1+\varepsilon)$-approximate solutions for the ridge regression problem (a generic sketch-and-precondition example appears after this list).
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z)
- Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z)
- Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
- Optimal Contextual Pricing and Extensions [32.152826900874075]
We give a poly-time algorithm for contextual pricing with $O(d \log \log T)$ regret which matches the $\Omega(d \log T)$ lower bound up to the $d \log d$ additive factor.
These algorithms are based on a novel technique of bounding the value of the Steiner polynomial of a convex region at various scales.
arXiv Detail & Related papers (2020-03-03T18:46:29Z)
- Tight Regret Bounds for Noisy Optimization of a Brownian Motion [118.65407541895851]
We consider the problem of Bayesian optimization of a one-dimensional Brownian motion in which the $T$ adaptively chosen observations are corrupted by Gaussian noise.
We show that the smallest possible expected cumulative regret and the smallest possible expected simple regret scale as $\Omega(\sigma\sqrt{T} \cdot \log T)$.
arXiv Detail & Related papers (2020-01-25T14:44:53Z)
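As referenced in the LC-Tsallis-INF entry above, here is a minimal sketch of its building block: Follow-The-Regularized-Leader with the $\alpha = 1/2$ Tsallis entropy, for a plain $K$-armed bandit. The Bernoulli environment, the learning-rate schedule $\eta_t = 1/\sqrt{t}$, and the bisection solver are illustrative assumptions; the cited paper's linear-contextual construction and tuning are not reproduced here.

```python
# Minimal FTRL-with-Tsallis-entropy (alpha = 1/2) bandit sketch.
# FTRL over the simplex with regularizer -(2/eta) * sum(sqrt(p_i))
# yields p_i = 1 / (eta * (L_i + mu))^2 for a normalizer mu.
import numpy as np

rng = np.random.default_rng(2)
K, T = 5, 5000
mean_loss = rng.uniform(0.2, 0.8, K)      # hypothetical Bernoulli loss means
L_hat = np.zeros(K)                       # importance-weighted loss sums

def tsallis_probs(L, eta, iters=60):
    """Solve p_i = 1 / (eta * (L_i + mu))^2 with sum(p) = 1 by bisection."""
    lo = -L.min() + 1e-12                 # mu must keep all L_i + mu > 0
    hi = -L.min() + np.sqrt(len(L)) / eta + 1.0   # here sum(p) < 1 holds
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        s = np.sum(1.0 / (eta * (L + mu)) ** 2)
        lo, hi = (mu, hi) if s > 1.0 else (lo, mu)
    p = 1.0 / (eta * (L + 0.5 * (lo + hi))) ** 2
    return p / p.sum()                    # renormalize residual error

total = 0.0
for t in range(1, T + 1):
    eta = 1.0 / np.sqrt(t)                # assumed learning-rate schedule
    p = tsallis_probs(L_hat, eta)
    a = rng.choice(K, p=p)
    loss = float(rng.random() < mean_loss[a])
    L_hat[a] += loss / p[a]               # unbiased loss estimate
    total += loss

print("regret vs. best arm:", total - T * mean_loss.min())
```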
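As referenced in the ridge-regression entry above, here is a generic sketch-and-precondition iteration for ridge regression: a Gaussian sketch compresses $A$, the sketched Hessian serves as a preconditioner, and iterative refinement with exact gradients drives the error down. The problem sizes, sketch size, and iteration count are illustrative assumptions; this is a standard construction, not the cited paper's specific algorithm or guarantees.

```python
# Generic "sketch-and-precondition" loop for ridge regression.
# All sizes (n, d, m) and the iteration count are demo assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, d, lam, m = 5000, 50, 1.0, 2000         # m = sketch size, chosen >> d

A = rng.normal(size=(n, d))
b = rng.normal(size=n)

S = rng.normal(size=(m, n)) / np.sqrt(m)   # Gaussian sketch, E[S.T @ S] = I
M = (S @ A).T @ (S @ A) + lam * np.eye(d)  # sketched Hessian ~ A.T A + lam I

x = np.zeros(d)
for _ in range(25):                        # preconditioned iterative refinement
    grad = A.T @ (A @ x - b) + lam * x     # exact gradient of the ridge loss
    x -= np.linalg.solve(M, grad)          # cheap d-by-d solve per iteration

x_exact = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)
print("relative error:", np.linalg.norm(x - x_exact) / np.linalg.norm(x_exact))
```

The design point is that the expensive $n \times d$ data are touched only through matrix-vector products, while every linear solve involves only the small sketched $d \times d$ matrix.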