Linear Bandits on Ellipsoids: Minimax Optimal Algorithms
- URL: http://arxiv.org/abs/2502.17175v1
- Date: Mon, 24 Feb 2025 14:12:31 GMT
- Title: Linear Bandits on Ellipsoids: Minimax Optimal Algorithms
- Authors: Raymond Zhang, Hedi Hadiji, Richard Combes,
- Abstract summary: We consider linear stochastic bandits where the set of actions is an ellipsoid. We provide the first known minimax optimal algorithm for this problem. A run requires only time $O(dT + d^2 \log(T/d) + d^3)$ and memory $O(d^2)$.
- Score: 5.678465386088928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider linear stochastic bandits where the set of actions is an ellipsoid. We provide the first known minimax optimal algorithm for this problem. We first derive a novel information-theoretic lower bound on the regret of any algorithm, which must be at least $\Omega(\min(d \sigma \sqrt{T} + d \|\theta\|_{A}, \|\theta\|_{A} T))$ where $d$ is the dimension, $T$ the time horizon, $\sigma^2$ the noise variance, $A$ a matrix defining the set of actions and $\theta$ the vector of unknown parameters. We then provide an algorithm whose regret matches this bound up to a multiplicative universal constant. The algorithm is non-classical in the sense that it is neither optimistic nor a sampling algorithm. The main idea is to combine a novel sequential procedure to estimate $\|\theta\|$ with an explore-and-commit strategy informed by this estimate. The algorithm is highly computationally efficient, and a run requires only time $O(dT + d^2 \log(T/d) + d^3)$ and memory $O(d^2)$, in contrast with known optimistic algorithms, which are not implementable in polynomial time. We go beyond minimax optimality and show that our algorithm is locally asymptotically minimax optimal, a much stronger notion of optimality. We further provide numerical experiments to illustrate our theoretical findings.
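To make the two-phase idea concrete, here is a minimal Python sketch of an explore-and-commit strategy on the ellipsoid $\{x : x^\top A x \le 1\}$. The exploration length, the coordinate-direction probes, and the least-squares estimate are illustrative assumptions; the paper's sequential norm-estimation procedure is more refined than this.

```python
# Hypothetical explore-and-commit sketch on an ellipsoidal action set;
# this illustrates the high-level idea only, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)
d, T, sigma = 5, 10_000, 1.0
A = np.diag(rng.uniform(0.5, 2.0, size=d))  # action set {x : x^T A x <= 1}
theta = rng.normal(size=d)                  # unknown parameter vector

def to_boundary(x, A):
    """Rescale x onto the boundary x^T A x = 1 of the ellipsoid."""
    return x / np.sqrt(x @ A @ x)

# Phase 1: explore along coordinate directions and estimate theta
# (hence ||theta||) by least squares; the phase length is an assumption.
n_explore = max(d, int(np.sqrt(T)))
X = np.array([to_boundary(np.eye(d)[t % d], A) for t in range(n_explore)])
y = X @ theta + sigma * rng.normal(size=n_explore)  # noisy rewards
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Phase 2: commit to the greedy action for theta_hat; on an ellipsoid
# the maximizer of <x, theta_hat> is A^{-1} theta_hat, rescaled.
x_commit = to_boundary(np.linalg.solve(A, theta_hat), A)
x_star = to_boundary(np.linalg.solve(A, theta), A)
commit_regret = (T - n_explore) * (x_star - x_commit) @ theta
print(f"||theta|| estimate: {np.linalg.norm(theta_hat):.3f} "
      f"(true {np.linalg.norm(theta):.3f}); commit-phase regret: {commit_regret:.1f}")
```

Note that the per-step work in the commit phase is $O(d)$, consistent with the $O(dT)$ term in the stated running time.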
Related papers
- Achieving Tractable Minimax Optimal Regret in Average Reward MDPs [19.663336027878408]
We present the first tractable algorithm with minimax optimal regret $\widetilde{\mathrm{O}}(\sqrt{\mathrm{sp}(h^*) S A T})$.
Remarkably, our algorithm does not require prior information on $\mathrm{sp}(h^*)$.
arXiv Detail & Related papers (2024-06-03T11:53:44Z)
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards [40.99322897009357]
We propose two novel algorithms based on truncation and mean of medians.
Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches.
Our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when $\epsilon=1$ (a generic sketch of both estimator families follows this entry).
arXiv Detail & Related papers (2023-10-28T13:01:10Z)
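For intuition, the sketch below shows generic versions of the two estimator families named in the entry above, a truncated mean and a block-wise mean of medians; the threshold and block count are illustrative assumptions rather than the paper's tuned choices.

```python
# Hypothetical illustration of two robust mean estimators for
# heavy-tailed rewards; generic choices, not the paper's algorithms.
import numpy as np

def truncated_mean(rewards, b):
    """Clip rewards to [-b, b] before averaging to tame heavy tails."""
    return np.mean(np.clip(rewards, -b, b))

def mean_of_medians(rewards, num_blocks):
    """Split into blocks, take each block's median, then average."""
    blocks = np.array_split(rewards, num_blocks)
    return np.mean([np.median(block) for block in blocks])

rng = np.random.default_rng(1)
# Student-t with 2.5 degrees of freedom: finite variance barely holds.
samples = rng.standard_t(df=2.5, size=10_000)
print(np.mean(samples), truncated_mean(samples, b=10.0),
      mean_of_medians(samples, num_blocks=100))
```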
- Optimal Exploration is no harder than Thompson Sampling [14.726673043806391]
A pure exploration linear bandit problem aims to return $\arg\max_{z \in \mathcal{Z}} z^\top \theta_\ast$ with high probability through noisy measurements of $x^\top \theta_\ast$ with $x \in \mathcal{X} \subset \mathbb{R}^d$.
This complexity is at odds with the popular and simple Thompson Sampling for regret minimization, which just requires access to a posterior sampling and argmax oracle, and does not need to enumerate $\mathcal{Z}$; a minimal sketch of such a step follows this entry.
arXiv Detail & Related papers (2023-10-09T18:21:39Z)
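For reference, a single Thompson Sampling step of the kind described in the entry above needs only a posterior sample and an argmax oracle; the sketch below assumes a Gaussian posterior and a finite action set purely for illustration, not as this paper's setup.

```python
# Minimal sketch of one Thompson Sampling step for a linear bandit,
# assuming a Gaussian posterior N(mu, Sigma) over theta and a finite
# action set; both are illustrative assumptions.
import numpy as np

def thompson_step(mu, Sigma, actions, rng):
    """Sample theta from the posterior, then act greedily for it."""
    theta_sample = rng.multivariate_normal(mu, Sigma)  # posterior sample
    return actions[np.argmax(actions @ theta_sample)]  # argmax oracle

rng = np.random.default_rng(2)
d = 3
actions = rng.normal(size=(50, d))  # 50 candidate arms in R^d
mu, Sigma = np.zeros(d), np.eye(d)  # posterior parameters
print(thompson_step(mu, Sigma, actions, rng))
```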
- An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale, and we demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z)
- Nearly Optimal Policy Optimization with Stable at Any Time Guarantee [53.155554415415445]
The regret bound of the policy-based method in Shani et al. (2020) is only $\tilde{O}(\sqrt{S^2AH^4K})$, where $S$ is the number of states, $A$ the number of actions, $H$ the horizon, and $K$ the number of episodes; this leaves a $\sqrt{SH}$ gap relative to the information-theoretic lower bound $\tilde{\Omega}(\sqrt{SAH^3K})$.
arXiv Detail & Related papers (2021-12-21T01:54:17Z)
- Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$ (a generic sketch of such a batch schedule follows this entry).
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z)
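As an illustration of how $O(\log\log T)$ batches can cover a horizon $T$, the sketch below builds a doubly-exponential batch schedule; this is a standard generic construction, not necessarily the paper's exact schedule.

```python
# Hypothetical doubly-exponential batch schedule: batch i ends near
# T**(1 - 2**-(i+1)), so O(log log T) batches cover the whole horizon.
import math

def batch_endpoints(T):
    B = max(1, math.ceil(math.log2(math.log2(T))))   # O(log log T) batches
    ends = [round(T ** (1 - 2.0 ** -(i + 1))) for i in range(B - 1)]
    return ends + [T]                                # final batch ends at T

print(batch_endpoints(10_000))   # [100, 1000, 3162, 10000]
```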
- Classical algorithms and quantum limitations for maximum cut on high-girth graphs [6.262125516926276]
We show that every (quantum or classical) one-local algorithm achieves on $D$-regular graphs of girth $> 5$ a maximum cut of at most $1/2 + C/\sqrt{D}$ for $C = 1/\sqrt{2} \approx 0.7071$.
We show that there is a classical $k$-local algorithm that achieves a value of $1/2 + C/\sqrt{D} - O(1/\sqrt{k})$ for $D$-regular graphs of girth $> 2k+1$, where $C = 2/\pi \approx 0.6366$.
arXiv Detail & Related papers (2021-06-10T16:28:23Z)
- Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems.
We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w^\ast$ and the hyperplanes the separation oracle returns.
arXiv Detail & Related papers (2021-06-09T05:39:05Z)
- Randomized Exploration is Near-Optimal for Tabular MDP [45.16374124699648]
We study exploration using randomized value functions in Thompson Sampling (TS)-like algorithms in reinforcement learning.
We show that when we use 1) a single random seed in each episode, and 2) a Bernstein-type magnitude of noise, we obtain a worst-case $\widetilde{O}(H\sqrt{SAT})$ regret bound for episodic time-inhomogeneous Markov Decision Processes.
arXiv Detail & Related papers (2021-02-19T01:42:50Z)
- Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.