Fixed-Budget Best-Arm Identification in Sparse Linear Bandits
- URL: http://arxiv.org/abs/2311.00481v1
- Date: Wed, 1 Nov 2023 12:32:17 GMT
- Title: Fixed-Budget Best-Arm Identification in Sparse Linear Bandits
- Authors: Recep Can Yavas, Vincent Y. F. Tan
- Abstract summary: We study the best-arm identification problem in sparse linear bandits under the fixed-budget setting.
We design a two-phase algorithm, Lasso and Optimal-Design (Lasso-OD) based linear best-arm identification.
We show that Lasso-OD is almost minimax optimal in the exponent.
- Score: 69.6194614504832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the best-arm identification problem in sparse linear bandits under
the fixed-budget setting. In sparse linear bandits, the unknown feature vector
$\theta^*$ may be of large dimension $d$, but only a few, say $s \ll d$, of
these features have non-zero values. We design a two-phase algorithm, Lasso and
Optimal-Design (Lasso-OD) based linear best-arm identification. The first
phase of Lasso-OD leverages the sparsity of the feature vector by applying the
thresholded Lasso introduced by Zhou (2009), which estimates the support of
$\theta^*$ correctly with high probability using rewards from the selected arms
and a judicious choice of the design matrix. The second phase of Lasso-OD
applies the OD-LinBAI algorithm by Yang and Tan (2022) on that estimated
support. We derive a non-asymptotic upper bound on the error probability of
Lasso-OD by carefully choosing hyperparameters (such as Lasso's regularization
parameter) and balancing the error probabilities of both phases. For fixed
sparsity $s$ and budget $T$, the exponent in the error probability of Lasso-OD
depends on $s$ but not on the dimension $d$, yielding a significant performance
improvement for sparse and high-dimensional linear bandits. Furthermore, we
show that Lasso-OD is almost minimax optimal in the exponent. Finally, we
provide numerical examples to demonstrate the significant performance
improvement over the existing algorithms for non-sparse linear bandits such as
OD-LinBAI, BayesGap, Peace, LinearExploration, and GSE.
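Since the abstract only outlines the two-phase structure, here is a minimal Python sketch of that structure, not the authors' implementation: the uniform phase-1 sampling, the single least-squares fit standing in for OD-LinBAI in phase 2, and the hyperparameters lam, tau, and phase1_budget are illustrative placeholders rather than the paper's carefully tuned choices (the paper picks the design matrix, the regularization parameter, and the threshold so that phase 1 recovers the support exactly with high probability).

```python
# Hedged sketch of the two-phase Lasso-OD structure (simplified placeholders).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

def thresholded_lasso_support(X, y, lam, tau):
    """Phase 1 core: Lasso fit followed by hard thresholding of coefficients."""
    beta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    return np.flatnonzero(np.abs(beta) > tau)

def lasso_od(arms, pull, T, phase1_budget, lam, tau):
    """arms: (K, d) arm features; pull(i) returns one noisy reward of arm i."""
    K, d = arms.shape
    # Phase 1: sample arms (uniformly here, for simplicity), estimate supp(theta*).
    idx = rng.integers(0, K, size=phase1_budget)
    X, y = arms[idx], np.array([pull(i) for i in idx])
    support = thresholded_lasso_support(X, y, lam, tau)
    if support.size == 0:            # fallback if thresholding removes everything
        support = np.arange(d)
    # Phase 2: best-arm identification on the reduced features (OD-LinBAI in
    # the paper; a single least-squares fit here as a stand-in).
    reduced = arms[:, support]
    idx2 = rng.integers(0, K, size=T - phase1_budget)
    X2, y2 = reduced[idx2], np.array([pull(i) for i in idx2])
    theta_hat, *_ = np.linalg.lstsq(X2, y2, rcond=None)
    return int(np.argmax(reduced @ theta_hat))

# Toy run: d = 50 features, s = 3 of them active, K = 20 arms, budget T = 2000.
d, K, T = 50, 20, 2000
theta_star = np.zeros(d)
theta_star[:3] = [1.0, -0.8, 0.5]
arms = rng.normal(size=(K, d))
pull = lambda i: arms[i] @ theta_star + rng.normal(scale=0.5)
print("guess:", lasso_od(arms, pull, T, phase1_budget=500, lam=0.05, tau=0.2),
      "truth:", int(np.argmax(arms @ theta_star)))
```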
Related papers
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^T \sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers from an $\tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- No-Regret Linear Bandits beyond Realizability [34.06789803260447]
We study linear bandits when the underlying reward function is not linear.
Existing work relies on a uniform misspecification parameter $\epsilon$ that measures the sup-norm error of the best linear approximation.
We describe a more natural model of misspecification which only requires the approximation error at each input $x$ to be proportional to the suboptimality gap at $x$; a formalization is sketched after this entry.
arXiv Detail & Related papers (2023-02-26T07:15:31Z)
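A hedged formalization of that gap-proportional misspecification model; the symbols $f$, $\rho$, and $\Delta$ below are illustrative notation, not necessarily the paper's:

```latex
% Gap-proportional misspecification (illustrative notation): there exist a
% parameter theta and a constant rho in [0, 1) such that, for every input x
% with true mean reward f(x),
\bigl| f(x) - \langle \theta, x \rangle \bigr| \;\le\; \rho \, \Delta(x),
\qquad \Delta(x) \;=\; \max_{x'} f(x') - f(x).
% Contrast: the uniform model assumes |f(x) - <theta, x>| <= epsilon for all x.
```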
- Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde O(d\sqrt{\sum_{k=1}^K \sigma_k^2} + d)$ regret; a sketch of the variance-weighted estimator behind such bounds follows this entry.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
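A generic variance-weighted ridge regression step, the kind of estimator variance-adaptive linear bandit algorithms are built on; a sketch under my own simplifications (known per-round variances sigma2, a plain batch solve), not the paper's algorithm:

```python
# Variance-weighted ridge regression: down-weight each sample by its noise
# variance, so low-noise rounds dominate the estimate.
import numpy as np

def weighted_ridge(X, y, sigma2, lam=1.0):
    """argmin_theta sum_k (y_k - <x_k, theta>)^2 / sigma2_k + lam * ||theta||^2."""
    w = 1.0 / np.asarray(sigma2)                  # precision weights
    A = (X * w[:, None]).T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, (X * w[:, None]).T @ y)
```

Roughly, weighting by the inverse variance is what lets confidence widths, and hence regret, scale with $\sqrt{\sum_k \sigma_k^2}$ instead of with the raw number of rounds.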
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed-confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- Scale Free Adversarial Multi Armed Bandits [13.757095663704858]
We consider the Scale-Free Adversarial Multi Armed Bandit (MAB) problem.
We design a Follow The Regularized Leader (FTRL) algorithm, which comes with the first scale-free regret guarantee for MAB; a generic FTRL skeleton is sketched after this entry.
We also develop a new technique for obtaining local-norm lower bounds for Bregman divergences.
arXiv Detail & Related papers (2021-06-08T21:26:57Z)
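A generic FTRL skeleton for adversarial MAB, sketched for orientation rather than as the paper's scale-free algorithm: with the negative-entropy regularizer and a fixed learning rate it reduces to exponential weights over importance-weighted loss estimates (EXP3); the scale-free version would replace the fixed eta below with one adapted to the observed loss magnitudes.

```python
# FTRL for adversarial MAB with the negative-entropy regularizer (= EXP3).
import numpy as np

rng = np.random.default_rng(0)

def ftrl_exp3(loss_fn, K, T, eta=0.1):
    L_hat = np.zeros(K)                           # cumulative loss estimates
    for t in range(T):
        # FTRL step: p_t = argmin_p <p, L_hat> + (1/eta) * sum_i p_i log p_i,
        # whose closed form is the exponential-weights distribution below.
        w = np.exp(-eta * (L_hat - L_hat.min()))  # shift for numerical stability
        p = w / w.sum()
        arm = rng.choice(K, p=p)
        loss = loss_fn(t, arm)                    # only the played arm's loss is seen
        L_hat[arm] += loss / p[arm]               # importance-weighted estimate
    return L_hat

# Example: with loss_fn = lambda t, a: float(a != 0), arm 0 ends up with the
# smallest cumulative loss estimate.
print(ftrl_exp3(lambda t, a: float(a != 0), K=5, T=5000).argmin())
```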
- Towards Minimax Optimal Best Arm Identification in Linear Bandits [95.22854522340938]
We study the problem of best arm identification in linear bandits in the fixed-budget setting.
By leveraging properties of the G-optimal design and incorporating it into the arm allocation rule, we design a parameter-free algorithm, OD-LinBAI; a routine for computing such a design is sketched after this entry.
We provide a theoretical analysis of the failure probability of OD-LinBAI.
arXiv Detail & Related papers (2021-05-27T09:19:10Z)
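A standard Frank-Wolfe routine for computing a G-optimal design over a finite arm set, shown for illustration; OD-LinBAI's exact allocation and rounding rules are in the paper:

```python
# Frank-Wolfe (Kiefer-Wolfowitz style) computation of a G-optimal design.
import numpy as np

def g_optimal_design(arms, iters=1000):
    """arms: (K, d). Returns weights pi minimizing max_a ||a||^2_{A(pi)^-1}."""
    K, d = arms.shape
    pi = np.full(K, 1.0 / K)
    for t in range(iters):
        A = arms.T @ (arms * pi[:, None])                # A(pi) = sum_a pi_a a a^T
        A_inv = np.linalg.pinv(A)
        g = np.einsum('kd,de,ke->k', arms, A_inv, arms)  # ||a||^2_{A^-1} per arm
        j = int(np.argmax(g))                            # most under-covered arm
        gamma = 2.0 / (t + 2)                            # standard FW step size
        pi = (1 - gamma) * pi
        pi[j] += gamma
    return pi

# Example: for 10 random arms in R^4, max_a ||a||^2_{A^-1} approaches d = 4.
arms = np.random.default_rng(1).normal(size=(10, 4))
pi = g_optimal_design(arms)
A_inv = np.linalg.pinv(arms.T @ (arms * pi[:, None]))
print(np.einsum('kd,de,ke->k', arms, A_inv, arms).max())  # ~4
```

By the Kiefer-Wolfowitz equivalence theorem the optimal value of $\max_a \|a\|^2_{A(\pi)^{-1}}$ equals $d$, so sampling according to $\pi$ makes least-squares predictions uniformly accurate across arms, which is what elimination-style algorithms such as OD-LinBAI exploit.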
- Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows the total corruption budget $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional corruption-dependent additive terms.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z)
- Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in both the regret minimization and the best arm identification frameworks for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)