Related papers: Nearly Tight Bounds for Exploration in Streaming Multi-armed Bandits with Known Optimality Gap

Nearly Tight Bounds for Exploration in Streaming Multi-armed Bandits with Known Optimality Gap

URL: http://arxiv.org/abs/2502.01067v1
Date: Mon, 03 Feb 2025 05:24:35 GMT
Title: Nearly Tight Bounds for Exploration in Streaming Multi-armed Bandits with Known Optimality Gap
Authors: Nikolai Karpov, Chen Wang,
Abstract summary: We investigate the sample-memory-pass trade-offs for pure exploration in multi-pass streaming multi-armed bandits (MABs)<n>We first present a lower bound, showing that any algorithm that finds the best arm with slightly sublinear memory -- a memory of $o(n/textpolylog(n))$ arms -- and $O(sum_i=2n1/Delta2_[i][i]cdot logn)$ arm pulls.<n>We then show a nearly-
Score: 9.095990028343369
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We investigate the sample-memory-pass trade-offs for pure exploration in multi-pass streaming multi-armed bandits (MABs) with the *a priori* knowledge of the optimality gap $\Delta_{[2]}$. Here, and throughout, the optimality gap $\Delta_{[i]}$ is defined as the mean reward gap between the best and the $i$-th best arms. A recent line of results by Jin, Huang, Tang, and Xiao [ICML'21] and Assadi and Wang [COLT'24] have shown that if there is no known $\Delta_{[2]}$, a pass complexity of $\Theta(\log(1/\Delta_{[2]}))$ (up to $\log\log(1/\Delta_{[2]})$ terms) is necessary and sufficient to obtain the *worst-case optimal* sample complexity of $O(n/\Delta^{2}_{[2]})$ with a single-arm memory. However, our understanding of multi-pass algorithms with known $\Delta_{[2]}$ is still limited. Here, the key open problem is how many passes are required to achieve the complexity, i.e., $O( \sum_{i=2}^{n}1/\Delta^2_{[i]})$ arm pulls, with a sublinear memory size. In this work, we show that the ``right answer'' for the question is $\Theta(\log{n})$ passes (up to $\log\log{n}$ terms). We first present a lower bound, showing that any algorithm that finds the best arm with slightly sublinear memory -- a memory of $o({n}/{\text{polylog}({n})})$ arms -- and $O(\sum_{i=2}^{n}{1}/{\Delta^{2}_{[i]}}\cdot \log{(n)})$ arm pulls has to make $\Omega(\frac{\log{n}}{\log\log{n}})$ passes over the stream. We then show a nearly-matching algorithm that assuming the knowledge of $\Delta_{[2]}$, finds the best arm with $O( \sum_{i=2}^{n}1/\Delta^2_{[i]} \cdot \log{n})$ arm pulls and a *single arm* memory.

Related papers

Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs [54.28273395444243]
We show that the Monotonic Value Omega (MVP) algorithm achieves a variance-aware gap-dependent regret bound of $$tildeOleft(left(sum_Delta_h(s,a)>0 fracH2 log K land mathttVar_maxtextc$.
arXiv Detail & Related papers (2025-06-06T20:33:57Z)
Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits [3.096236408131107]
We find a set of hard instances on which the regret of any algorithm is $Omega_alphaBig(frac(k-m+1) Tfrac1alphak1 + frac1alpha+1 sum_i:Delta_i > 0Delta_i1 - 2alphaBig)$. This is the first tight gap-dependent regret bound for streaming MAB.
arXiv Detail & Related papers (2025-03-04T09:18:35Z)
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $Omega(frac1kcdot n2log|mathbbF|)$. As an application, we obtain an $Omega(frac1kcdot n2log|mathbbF|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
Optimal Streaming Algorithms for Multi-Armed Bandits [28.579280943038555]
We study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with at least $1-delta$ probability. We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within $O(log Delta-1)$ passes.
arXiv Detail & Related papers (2024-10-23T12:54:04Z)
The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits [10.329863009504303]
We show that any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(fracnDelta2)$ requires $Omega(fraclog (1/Delta)log (1/Delta)$ passes. Our result matches the $O(log(frac1Delta))$-pass algorithm of Jin et al. [ICML'21] that only uses $O(1)$ memory and answers an open question posed by Assadi and Wang.
arXiv Detail & Related papers (2023-09-06T16:41:41Z)
Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution. Our goal is to efficiently select a single high quality arm whose average reward is, with probability $1-delta$, within $varepsilon$ of being among the top $eta$-fraction of arms.
arXiv Detail & Related papers (2023-06-03T04:00:47Z)
Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning [54.806166861456035]
We study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with constraint on the number of batches. We design a computational efficient algorithm to achieve near-optimal regret of $tildeO(sqrtSAH3Kln (1/delta))$tildeO(cdot) hides logarithmic terms of $(S,A,H,K)$ in $K$ episodes. Our technical contribution are two-fold: 1) a near-optimal design scheme to explore
arXiv Detail & Related papers (2022-10-15T09:22:22Z)
Variance-Dependent Best Arm Identification [12.058652922228918]
We study the problem of identifying the best arm in a multi-armed bandit game. We propose an adaptive algorithm which explores the gaps and variances of the rewards of the arms. We show that our algorithm is optimal up to doubly logarithmic terms.
arXiv Detail & Related papers (2021-06-19T04:13:54Z)
Differentially Private Multi-Armed Bandits in the Shuffle Model [58.22098764071924]
We give an $(varepsilon,delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model. Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.
arXiv Detail & Related papers (2021-06-05T14:11:01Z)
Bandits with many optimal arms [68.17472536610859]
We write $p*$ for the proportion of optimal arms and $Delta$ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z)
An Optimal Separation of Randomized and Quantum Query Complexity [67.19751155411075]
We prove that for every decision tree, the absolute values of the Fourier coefficients of a given order $ellsqrtbinomdell (1+log n)ell-1,$ sum to at most $cellsqrtbinomdell (1+log n)ell-1,$ where $n$ is the number of variables, $d$ is the tree depth, and $c>0$ is an absolute constant.
arXiv Detail & Related papers (2020-08-24T06:50:57Z)
Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model. We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n2)$-upper bound under the global oracle model. Following a similar approach, we propose a novel adaptation of SVRG which is both emphcompatible with oracles, and achieves complexity bounds of $tildeO(n2+nsqrtL/mu)log (1/epsilon)$ and $O(nsqrtL/epsilon)$, for $mu>0$ and $mu=0$
arXiv Detail & Related papers (2020-02-09T03:39:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.