Problem-Complexity Adaptive Model Selection for Stochastic Linear
Bandits
- URL: http://arxiv.org/abs/2006.02612v2
- Date: Mon, 15 Jun 2020 18:55:38 GMT
- Title: Problem-Complexity Adaptive Model Selection for Stochastic Linear
Bandits
- Authors: Avishek Ghosh, Abishek Sankararaman and Kannan Ramchandran
- Abstract summary: We consider the problem of model selection for two popular linear bandit settings.
In the first setting, we consider the $K$ armed mixture bandits, where the mean reward of arm $i \in [K]$ is $\mu_i + \langle \alpha_{i,t}, \theta^* \rangle$.
We show that ALB achieves regret scaling of $O(\|\theta^*\|\sqrt{T})$, where $\|\theta^*\|$ is a priori unknown.
- Score: 20.207989166682832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of model selection for two popular stochastic linear
bandit settings, and propose algorithms that adapt to the unknown problem
complexity. In the first setting, we consider the $K$ armed mixture bandits,
where the mean reward of arm $i \in [K]$ is $\mu_i + \langle
\alpha_{i,t}, \theta^* \rangle$, with $\alpha_{i,t} \in \mathbb{R}^d$ the known
context vector, and $\mu_i \in [-1,1]$ and $\theta^*$ unknown parameters. We
define $\|\theta^*\|$ as the problem complexity and consider a
sequence of nested hypothesis classes, each positing a different upper bound on
$\|\theta^*\|$. Exploiting this, we propose Adaptive Linear Bandit (ALB), a
novel phase based algorithm that adapts to the true problem complexity,
$\|\theta^*\|$. We show that ALB achieves regret scaling of
$O(\|\theta^*\|\sqrt{T})$, where $\|\theta^*\|$ is a priori unknown. As a
corollary, when $\theta^*=0$, ALB recovers the minimax regret for the simple
bandit algorithm without such knowledge of $\theta^*$. ALB is the first
algorithm that uses the parameter norm as a model selection criterion for
linear bandits. Prior state-of-the-art algorithms \cite{osom} achieve a regret
of $O(L\sqrt{T})$, where $L$ is an upper bound on $\|\theta^*\|$ supplied as an
input to the algorithm. In the second setting, we consider the standard linear bandit
problem (with possibly an infinite number of arms) where the sparsity of
$\theta^*$, denoted by $d^* \leq d$, is unknown to the algorithm. Defining
$d^*$ as the problem complexity, we show that ALB achieves $O(d^*\sqrt{T})$
regret, matching that of an oracle who knew the true sparsity level. This
methodology is then extended to the case of finitely many arms and similar
results are proven. This is the first algorithm that achieves such model
selection guarantees. We further verify our results via synthetic and real-data
experiments.
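To make the phase-based adaptation concrete, the sketch below illustrates the mechanism the abstract describes: run a norm-constrained linear bandit subroutine within each phase, then re-estimate $\|\theta^*\|$ at the phase boundary and use the estimate to select the next hypothesis class. This is a minimal illustration under stated assumptions, not the paper's algorithm: the doubling phase schedule, the exploration width, and the norm-update rule are all assumptions made here, and the per-arm biases $\mu_i$ of the mixture setting are dropped for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: K contextual arms, linear rewards, small unknown ||theta*||.
K, d, T = 10, 5, 5000
theta_star = rng.normal(size=d)
theta_star *= 0.3 / np.linalg.norm(theta_star)  # true complexity ||theta*|| = 0.3

def draw_contexts():
    # Fresh context vectors alpha_{i,t} for each round, with bounded norm.
    return rng.normal(size=(K, d)) / np.sqrt(d)

b = 1.0                    # working bound on ||theta*|| (current hypothesis class)
lam, noise = 1.0, 0.1      # ridge regularization and reward noise level
V = lam * np.eye(d)        # regularized design matrix
xty = np.zeros(d)          # accumulated context-weighted rewards
t, phase_len = 0, 100

while t < T:
    for _ in range(min(phase_len, T - t)):
        A = draw_contexts()
        theta_hat = np.linalg.solve(V, xty)      # ridge estimate of theta*
        Vinv = np.linalg.inv(V)
        # Optimistic index: the exploration width scales with the working
        # bound b, as if the confidence set were built for {||theta|| <= b}.
        widths = np.sqrt(np.einsum('kd,de,ke->k', A, Vinv, A))
        i = int(np.argmax(A @ theta_hat + b * widths))
        r = A[i] @ theta_star + noise * rng.normal()
        V += np.outer(A[i], A[i])
        xty += r * A[i]
        t += 1
    # Phase boundary: re-estimate ||theta*|| and refresh the hypothesis class.
    theta_hat = np.linalg.solve(V, xty)
    b = np.linalg.norm(theta_hat) + 1.0 / np.sqrt(t)  # hypothetical update rule
    phase_len *= 2                                    # doubling phase lengths

print(f"final bound b = {b:.3f} vs. true ||theta*|| = {np.linalg.norm(theta_star):.3f}")
```

Because the width is tied to the running estimate $b$ rather than a fixed input $L$, the regret can track the true complexity, which is how a scaling like the abstract's $O(\|\theta^*\|\sqrt{T})$ can arise; for the second setting, one would replace the norm estimate with a support (sparsity) estimate while keeping the same phase structure.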
Related papers
- Distribution-Independent Regression for Generalized Linear Models with
Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise.
We present an algorithm that tackles this problem in its most general distribution-independent setting.
This is the first algorithmic result for GLM regression with oblivious noise which can handle more than half the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z) - A/B Testing and Best-arm Identification for Linear Bandits with
Robustness to Non-stationarity [28.068960555415014]
We investigate the fixed-budget best-arm identification problem for linear bandits in a potentially non-stationary environment.
An algorithm will aim to correctly identify the best arm $x^* := \arg\max_{x \in \mathcal{X}} x^\top \sum_{t=1}^{T} \theta_t$ with probability as high as possible (a toy computation of this target appears after the list).
arXiv Detail & Related papers (2023-07-27T19:03:36Z) - Variance-Aware Sparse Linear Bandits [64.70681598741417]
Worst-case minimax regret for sparse linear bandits is $\widetilde{\Theta}\left(\sqrt{dT}\right)$.
In the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $\widetilde{\mathcal{O}}(1)$ regret.
We develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits.
arXiv Detail & Related papers (2022-05-26T15:55:44Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long as ...
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems.
We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w^*$ and the hyperplanes the separation oracle returns.
arXiv Detail & Related papers (2021-06-09T05:39:05Z) - Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $K \subset \mathbb{R}^d$, given access to $A \in \mathbb{R}^{d \times n}$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z) - Non-Convex Compressed Sensing with Training Data [0.0]
We find the solution of the original problem $Ax = b$ with high probability in the range of a one-layer linear neural network, with comparatively few assumptions on the matrix $A$.
In this paper, we consider an alternative where, instead of suitable initial values, we are provided with extra training problems $Ax = B_l$, $l = 1, \dots, p$, that are related to our compressed sensing problem.
arXiv Detail & Related papers (2021-01-20T20:30:59Z) - Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
arXiv Detail & Related papers (2020-10-22T19:14:37Z) - Optimal Strategies for Graph-Structured Bandits [0.0]
We study a structured variant of the multi-armed bandit problem specified by a set of Bernoulli distributions $\nu = (\nu_{a,b})_{a \in \mathcal{A}, b \in \mathcal{B}}$ with means $(\mu_{a,b})_{a \in \mathcal{A}, b \in \mathcal{B}}$.
arXiv Detail & Related papers (2020-07-07T06:27:51Z) - Explicit Best Arm Identification in Linear Bandits Using No-Regret
Learners [17.224805430291177]
We study the problem of best arm identification in linearly parameterised multi-armed bandits.
We propose an explicitly implementable and provably order-optimal sample-complexity algorithm to solve this problem.
arXiv Detail & Related papers (2020-06-13T05:00:01Z) - Low-Rank Generalized Linear Bandit Problems [19.052901817304743]
In a low-rank linear bandit problem, the reward of an action is the inner product between the action and an unknown low-rank matrix $\Theta^*$.
We propose an algorithm based on a novel combination of online-to-confidence-set conversion \cite{abbasi2012online} and the exponentially weighted average forecaster constructed by a covering of low-rank matrices.
arXiv Detail & Related papers (2020-06-04T15:30:14Z)
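As a small worked illustration of the identification target in the A/B-testing item above, the sketch below computes the best arm $x^* := \arg\max_{x \in \mathcal{X}} x^\top \sum_{t=1}^{T} \theta_t$ directly from the stated formula. The arm set and the parameter sequence here are made-up toy data, not from any of the papers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical instance: finite arm set X in R^d, non-stationary theta_t.
d, n_arms, T = 4, 8, 100
X = rng.normal(size=(n_arms, d))        # candidate arms x in mathcal{X}
thetas = rng.normal(size=(T, d)) * 0.1  # unknown sequence theta_1, ..., theta_T

# Best arm under non-stationarity: x* = argmax_{x in X} x^T sum_t theta_t.
theta_sum = thetas.sum(axis=0)
best = int(np.argmax(X @ theta_sum))
print("best arm index:", best)
```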
This list is automatically generated from the titles and abstracts of the papers on this site.