Exploration in Linear Bandits with Rich Action Sets and its Implications
for Inference
- URL: http://arxiv.org/abs/2207.11597v1
- Date: Sat, 23 Jul 2022 20:25:07 GMT
- Title: Exploration in Linear Bandits with Rich Action Sets and its Implications
for Inference
- Authors: Debangshu Banerjee, Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan
- Abstract summary: We show that the minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$.
We apply our result to two practical scenarios -- \emph{model selection} and \emph{clustering} in linear bandits.
- Score: 23.364534479492715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a non-asymptotic lower bound on the eigenspectrum of the design
matrix generated by any linear bandit algorithm with sub-linear regret when the
action set has well-behaved curvature. Specifically, we show that the minimum
eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever
the expected cumulative regret of the algorithm is $O(\sqrt{n})$, where $n$ is
the learning horizon, and the action-space has a constant Hessian around the
optimal arm. This shows that such action-spaces force a polynomial lower bound
rather than a logarithmic lower bound, as shown by \cite{lattimore2017end}, in
discrete (i.e., well-separated) action spaces. Furthermore, while the previous
result is shown to hold only in the asymptotic regime (as $n \to \infty$), our
result for these ``locally rich'' action spaces is any-time. Additionally, under
a mild technical assumption, we obtain a similar lower bound on the minimum
eigenvalue holding with high probability.
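To illustrate the spectral claim, the following is a minimal simulation sketch (not from the paper; the unit-sphere action set, noise level, and the LinUCB-style selection rule are all assumptions) that runs a low-regret linear bandit and records the minimum eigenvalue of the design matrix $V_n = \sum_{t \le n} a_t a_t^\top$, so its growth can be compared against $\sqrt{n}$:

```python
# Minimal sketch (not the paper's algorithm): run a LinUCB-style linear bandit
# on the unit sphere and track how the minimum eigenvalue of the design matrix
# V_n = sum_t a_t a_t^T grows with the horizon n. Under the paper's result, a
# sub-linear-regret algorithm on such a "locally rich" action set should show
# lambda_min(V_n) = Omega(sqrt(n)).
import numpy as np

rng = np.random.default_rng(0)
d, n, lam, beta = 5, 20000, 1.0, 2.0   # dimension, horizon, ridge, bonus width (assumed values)
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

V = lam * np.eye(d)          # regularized design matrix
b = np.zeros(d)              # running sum of reward_t * a_t
min_eigs = []

for t in range(1, n + 1):
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b
    # On the unit sphere the UCB action maximizes <theta_hat, a> + beta * sqrt(a^T V^{-1} a)
    # over ||a|| = 1; a random finite candidate set stands in for that continuous argmax.
    candidates = rng.normal(size=(256, d))
    candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
    ucb = candidates @ theta_hat + beta * np.sqrt(
        np.einsum("ij,jk,ik->i", candidates, V_inv, candidates))
    a = candidates[np.argmax(ucb)]
    reward = a @ theta_star + 0.1 * rng.normal()
    V += np.outer(a, a)
    b += reward * a
    if t % 1000 == 0:
        min_eigs.append((t, np.linalg.eigvalsh(V)[0]))  # eigvalsh is ascending; [0] is the minimum

for t, lam_min in min_eigs:
    print(f"n = {t:6d}   lambda_min(V_n) = {lam_min:8.2f}   sqrt(n) = {np.sqrt(t):7.2f}")
```

In this toy setting the printed $\lambda_{\min}(V_n)$ values can be checked against $\sqrt{n}$ to see whether the predicted square-root growth emerges without any forced exploration.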
We apply our result to two practical scenarios -- \emph{model selection} and
\emph{clustering} in linear bandits. For model selection, we show that an
epoch-based linear bandit algorithm adapts to the true model complexity at a
rate exponential in the number of epochs, by virtue of our novel spectral
bound. For clustering, we consider a multi-agent framework where we show, by
leveraging the spectral result, that no forced exploration is necessary -- the
agents can run a linear bandit algorithm and estimate their underlying
parameters at once, and hence incur a low regret.
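For intuition on why such spectral growth enables inference without forced exploration, a standard least-squares argument (a sketch using the usual self-normalized confidence bound, not a statement of the paper's theorems) bounds the error of the ridge estimator $\hat\theta_n$ as
$$\|\hat\theta_n - \theta^*\|_2 \;\le\; \frac{\|\hat\theta_n - \theta^*\|_{V_n}}{\sqrt{\lambda_{\min}(V_n)}} \;\lesssim\; \frac{\sqrt{d \log n}}{\sqrt{\lambda_{\min}(V_n)}},$$
so $\lambda_{\min}(V_n) = \Omega(\sqrt{n})$ yields an $\widetilde{O}(n^{-1/4})$ parameter estimation error as a by-product of running any low-regret algorithm.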
Related papers
- Tangential Randomization in Linear Bandits (TRAiL): Guaranteed Inference and Regret Bounds [1.03590082373586]
We propose and analyze TRAiL, a regret-optimal forced exploration algorithm for linear bandits.
TRAiL ensures an $\Omega(\sqrt{T})$ growth in the inference quality, measured via the minimum eigenvalue of the design (regressor) matrix.
We characterize an $\Omega(\sqrt{T})$ minimax lower bound on the expected regret of any algorithm.
arXiv Detail & Related papers (2024-11-19T01:08:13Z)
- Linear bandits with polylogarithmic minimax regret [8.97780713904412]
We study a noise model for linear bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector.
We introduce an algorithm for this problem that exhibits a minimax regret scaling as $\log^3(T)$ in the time horizon $T$, in stark contrast to the square-root scaling of this regret for typical bandit algorithms.
arXiv Detail & Related papers (2024-02-19T10:56:47Z)
- Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k$-dimensional ($k \ll d$) linear representation.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z)
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper bounds can be extended to two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- High-Dimensional Sparse Linear Bandits [67.9378546011416]
We derive a novel $\Omega(n^{2/3})$ dimension-free minimax regret lower bound for sparse linear bandits in the data-poor regime.
We also prove a dimension-free $O(\sqrt{n})$ regret upper bound under an additional assumption on the magnitude of the signal for relevant features.
arXiv Detail & Related papers (2020-11-08T16:48:11Z)
- Stochastic Linear Bandits with Protected Subspace [51.43660657268171]
We study a variant of the linear bandit problem wherein we optimize a linear objective function but rewards are accrued only to an unknown subspace.
In particular, at each round, the learner must choose whether to query the objective or the protected subspace alongside choosing an action.
Our algorithm, derived from the OFUL principle, uses some of the queries to get an estimate of the protected space.
arXiv Detail & Related papers (2020-11-02T14:59:39Z)
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimate the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)