Exploration in Linear Bandits with Rich Action Sets and its Implications
for Inference
- URL: http://arxiv.org/abs/2207.11597v1
- Date: Sat, 23 Jul 2022 20:25:07 GMT
- Title: Exploration in Linear Bandits with Rich Action Sets and its Implications
for Inference
- Authors: Debangshu Banerjee, Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan
- Abstract summary: We show that the minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$.
We apply our result to two practical scenarios -- \emph{model selection} and \emph{clustering} in linear bandits.
- Score: 23.364534479492715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a non-asymptotic lower bound on the eigenspectrum of the design
matrix generated by any linear bandit algorithm with sub-linear regret when the
action set has well-behaved curvature. Specifically, we show that the minimum
eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever
the expected cumulative regret of the algorithm is $O(\sqrt{n})$, where $n$ is
the learning horizon, and the action-space has a constant Hessian around the
optimal arm. This shows that such action-spaces force a polynomial lower bound
rather than a logarithmic lower bound, as shown by \cite{lattimore2017end}, in
discrete (i.e., well-separated) action spaces. Furthermore, while the previous
result is shown to hold only in the asymptotic regime (as $n \to \infty$), our
result for these ``locally rich'' action spaces is any-time. Additionally, under
a mild technical assumption, we obtain a similar lower bound on the minimum
eigenvalue holding with high probability.
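To illustrate the spectral claim, the following is a minimal simulation sketch (not from the paper; the unit-sphere action set, noise level, and the LinUCB-style selection rule are all assumptions) that runs a low-regret linear bandit and records the minimum eigenvalue of the design matrix $V_n = \sum_{t \le n} a_t a_t^\top$, so its growth can be compared against $\sqrt{n}$:

```python
# Minimal sketch (not the paper's algorithm): run a LinUCB-style linear bandit
# on the unit sphere and track how the minimum eigenvalue of the design matrix
# V_n = sum_t a_t a_t^T grows with the horizon n. Under the paper's result, a
# sub-linear-regret algorithm on such a "locally rich" action set should show
# lambda_min(V_n) = Omega(sqrt(n)).
import numpy as np

rng = np.random.default_rng(0)
d, n, lam, beta = 5, 20000, 1.0, 2.0   # dimension, horizon, ridge, bonus width (assumed values)
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

V = lam * np.eye(d)          # regularized design matrix
b = np.zeros(d)              # running sum of reward_t * a_t
min_eigs = []

for t in range(1, n + 1):
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b
    # On the unit sphere the UCB action maximizes <theta_hat, a> + beta * sqrt(a^T V^{-1} a)
    # over ||a|| = 1; a random finite candidate set stands in for that continuous argmax.
    candidates = rng.normal(size=(256, d))
    candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
    ucb = candidates @ theta_hat + beta * np.sqrt(
        np.einsum("ij,jk,ik->i", candidates, V_inv, candidates))
    a = candidates[np.argmax(ucb)]
    reward = a @ theta_star + 0.1 * rng.normal()
    V += np.outer(a, a)
    b += reward * a
    if t % 1000 == 0:
        min_eigs.append((t, np.linalg.eigvalsh(V)[0]))  # eigvalsh is ascending; [0] is the minimum

for t, lam_min in min_eigs:
    print(f"n = {t:6d}   lambda_min(V_n) = {lam_min:8.2f}   sqrt(n) = {np.sqrt(t):7.2f}")
```

In this toy setting the printed $\lambda_{\min}(V_n)$ values can be checked against $\sqrt{n}$ to see whether the predicted square-root growth emerges without any forced exploration.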
We apply our result to two practical scenarios -- \emph{model selection} and
\emph{clustering} in linear bandits. For model selection, we show that an
epoch-based linear bandit algorithm adapts to the true model complexity at a
rate exponential in the number of epochs, by virtue of our novel spectral
bound. For clustering, we consider a multi-agent framework where we show, by
leveraging the spectral result, that no forced exploration is necessary -- the
agents can run a linear bandit algorithm and estimate their underlying
parameters at once, and hence incur a low regret.
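For intuition on why such spectral growth enables inference without forced exploration, a standard least-squares argument (a sketch using the usual self-normalized confidence bound, not a statement of the paper's theorems) bounds the error of the ridge estimator $\hat\theta_n$ as
$$\|\hat\theta_n - \theta^*\|_2 \;\le\; \frac{\|\hat\theta_n - \theta^*\|_{V_n}}{\sqrt{\lambda_{\min}(V_n)}} \;\lesssim\; \frac{\sqrt{d \log n}}{\sqrt{\lambda_{\min}(V_n)}},$$
so $\lambda_{\min}(V_n) = \Omega(\sqrt{n})$ yields an $\widetilde{O}(n^{-1/4})$ parameter estimation error as a by-product of running any low-regret algorithm.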
Related papers
- Tangential Randomization in Linear Bandits (TRAiL): Guaranteed Inference and Regret Bounds [1.03590082373586]
We propose and analyze TRAiL, a regret-optimal forced exploration algorithm for linear bandits.
TRAiL ensures an $\Omega(\sqrt{T})$ growth in the inference quality, measured via the minimum eigenvalue of the design (regressor) matrix.
We characterize an $\Omega(\sqrt{T})$ minimax lower bound on the expected regret of any algorithm.
arXiv Detail & Related papers (2024-11-19T01:08:13Z)
- Linear bandits with polylogarithmic minimax regret [8.97780713904412]
We study a noise model for linear bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector.
We introduce an algorithm for this problem that exhibits a minimax regret scaling as $\log^3(T)$ in the time horizon $T$, in stark contrast to the square-root scaling of this regret for typical bandit algorithms.
arXiv Detail & Related papers (2024-02-19T10:56:47Z)
- Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k$-dimensional ($k \ll d$) linear representation.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z)
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper bounds can be extended to two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- High-Dimensional Sparse Linear Bandits [67.9378546011416]
We derive a novel $\Omega(n^{2/3})$ dimension-free minimax regret lower bound for sparse linear bandits in the data-poor regime.
We also prove a dimension-free $O(\sqrt{n})$ regret upper bound under an additional assumption on the magnitude of the signal for relevant features.
arXiv Detail & Related papers (2020-11-08T16:48:11Z)
- Stochastic Linear Bandits with Protected Subspace [51.43660657268171]
We study a variant of the linear bandit problem wherein we optimize a linear objective function but rewards are accrued only to an unknown subspace.
In particular, at each round, the learner must choose whether to query the objective or the protected subspace alongside choosing an action.
Our algorithm, derived from the OFUL principle, uses some of the queries to get an estimate of the protected space.
arXiv Detail & Related papers (2020-11-02T14:59:39Z)
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimate the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)