Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits
- URL: http://arxiv.org/abs/2306.02208v1
- Date: Sat, 3 Jun 2023 22:41:44 GMT
- Title: Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits
- Authors: Chen Wang
- Abstract summary: In the single-pass setting with $K$ arms and $T$ trials, a regret lower bound of $\Omega(T^{2/3})$ has been proved for any algorithm with $o(K)$ memory.
In this paper, we improve the regret lower bound to $\Omega(K^{1/3}T^{2/3})$ for algorithms with $o(K)$ memory.
We show that the proposed algorithms consistently outperform the benchmark uniform exploration algorithm by a large margin, and on occasion, reduce the regret by up to 70%.
- Score: 3.5955736977697073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Regret minimization in streaming multi-armed bandits (MABs) has been studied
extensively in recent years. In the single-pass setting with $K$ arms and $T$
trials, a regret lower bound of $\Omega(T^{2/3})$ has been proved for any
algorithm with $o(K)$ memory (Maiti et al. [NeurIPS'21]; Agarwal et al.
[COLT'22]). However, the previous best regret upper bound is
still $O(K^{1/3} T^{2/3}\log^{1/3}(T))$, which is achieved by the streaming
implementation of simple uniform exploration. This $O(K^{1/3}\log^{1/3}(T))$
gap leaves open the question of the tight regret bound for single-pass MABs
with sublinear arm memory.
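As a point of reference, the benchmark named in the last sentence admits a compact single-pass implementation with single-arm memory. The sketch below is illustrative rather than the paper's pseudocode; `arm_stream`, `pull`, and the per-arm budget `m` are assumed names, with `m` set by the usual explore-then-commit trade-off.

```python
import math

def streaming_uniform_exploration(arm_stream, pull, K, T):
    """Single-pass uniform exploration with single-arm memory (a sketch).

    arm_stream yields the K arms one at a time; pull(arm) returns a
    stochastic reward in [0, 1]. Only the running best arm is stored.
    """
    # Per-arm budget from the explore-then-commit trade-off; this choice
    # yields the O(K^{1/3} T^{2/3} log^{1/3}(T)) regret quoted above.
    m = max(1, int((T / K) ** (2 / 3) * math.log(T) ** (1 / 3)))
    best_arm, best_mean, used = None, float("-inf"), 0
    for arm in arm_stream:                       # one pass over the arms
        mean = sum(pull(arm) for _ in range(m)) / m
        used += m
        if mean > best_mean:                     # O(1) words of memory
            best_arm, best_mean = arm, mean
    reward = sum(pull(best_arm) for _ in range(T - used))  # commit
    return best_arm, reward
```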
In this paper, we answer this open problem and complete the picture of regret
minimization in single-pass streaming MABs. We first improve the regret lower
bound to $\Omega(K^{1/3}T^{2/3})$ for algorithms with $o(K)$ memory, which
matches the uniform exploration regret up to a logarithmic factor in $T$. We then
show that the $\log^{1/3}(T)$ factor is not necessary, and we can achieve
$O(K^{1/3}T^{2/3})$ regret by finding an $\varepsilon$-best arm and committing
to it in the rest of the trials. For regret minimization with high constant
probability, we can apply the single-arm-memory $\varepsilon$-best arm algorithms
of Jin et al. [ICML'21] to obtain the optimal bound. Furthermore, for
expected regret minimization, we design an algorithm with a single-arm memory
that achieves $O(K^{1/3} T^{2/3}\log(K))$ regret, and an algorithm with
$O(\log^{*}(K))$ memory that attains the optimal $O(K^{1/3} T^{2/3})$ regret,
following the $\varepsilon$-best arm algorithm of Assadi and Wang [STOC'20].
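A minimal sketch of this explore-then-commit reduction follows; `find_eps_best` stands in for any single-pass streaming $\varepsilon$-best-arm routine (e.g., the algorithms cited above), and both the names and the choice of $\varepsilon$ are illustrative assumptions rather than the paper's pseudocode.

```python
def eps_best_then_commit(find_eps_best, pull, K, T):
    """Explore-then-commit via an eps-best-arm subroutine (a sketch).

    find_eps_best(eps) is assumed to return (arm, samples_spent) after a
    single streaming pass, using O(K / eps^2) pulls in total.
    """
    eps = (K / T) ** (1 / 3)          # balances exploration and commitment
    arm, spent = find_eps_best(eps)   # O(K/eps^2) = O(K^{1/3} T^{2/3}) pulls
    # Committing to an eps-best arm costs at most eps per remaining trial,
    # i.e., at most T * eps = K^{1/3} T^{2/3} additional regret.
    reward = sum(pull(arm) for _ in range(T - spent))
    return arm, reward
```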
We further test the empirical performance of our algorithms. The
simulation results show that the proposed algorithms consistently outperform
the benchmark uniform exploration algorithm by a large margin, and on occasion,
reduce the regret by up to 70%.
Related papers
- Improved Regret for Bandit Convex Optimization with Delayed Feedback [50.46856739179311]
The paper studies bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed after a delay.
We develop a novel algorithm and prove that it enjoys a regret bound of $O(\sqrt{n}T^{3/4}+\sqrt{dT})$ in general.
We show that the proposed algorithm can improve the regret bound to $O((nT)^{2/3}\log^{1/3}T+d\log T)$ for strongly convex functions.
arXiv Detail & Related papers (2024-02-14T13:08:26Z) - Minimax Optimal Submodular Optimization with Bandit Feedback [13.805872311596739]
We consider maximizing a monotone, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ under bandit feedback.
Specifically, $f$ is unknown to the learner, but at each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$ with $|S_t| \leq k$ and receives reward $f(S_t) + \eta_t$, where $\eta_t$ is mean-zero sub-Gaussian noise.
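To illustrate the feedback model just described, here is a short simulation sketch; the names are assumptions, and the Gaussian noise stands in for any mean-zero sub-Gaussian distribution.

```python
import random

def play_submodular_bandit(f, n, k, choose_set, T):
    """Simulate T rounds of the bandit-feedback protocol above (a sketch).

    f is the unknown monotone submodular function, treated as a black box;
    choose_set(t, history) is a hypothetical learner's selection rule.
    """
    history = []
    for t in range(1, T + 1):
        S_t = choose_set(t, history)            # S_t subset of [n], |S_t| <= k
        assert S_t <= set(range(n)) and len(S_t) <= k
        eta_t = random.gauss(0.0, 0.1)          # mean-zero (sub-)Gaussian noise
        history.append((S_t, f(S_t) + eta_t))   # only the noisy value is revealed
    return history
```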
arXiv Detail & Related papers (2023-10-27T20:19:03Z) - The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits [10.329863009504303]
We show that any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(\frac{n}{\Delta^2})$ requires $\Omega(\frac{\log(1/\Delta)}{\log\log(1/\Delta)})$ passes.
Our result matches the $O(\log(\frac{1}{\Delta}))$-pass algorithm of Jin et al. [ICML'21] that uses only $O(1)$ memory, and answers an open question posed by Assadi and Wang.
arXiv Detail & Related papers (2023-09-06T16:41:41Z) - Tight Memory-Regret Lower Bounds for Streaming Bandits [11.537938617281736]
The learner aims to minimize regret while dealing with online-arriving arms and sublinear arm memory.
We establish the tight worst-case regret lower bound of $\Omega\left((TB)^{\alpha} K^{1-\alpha}\right)$, $\alpha = 2^{B}/(2^{B+1}-1)$, for any algorithm.
We also provide a multi-pass algorithm that achieves a regret upper bound of $\tilde{O}\left((TB)^{\alpha} K^{1-\alpha}\right)$ using constant arm memory.
arXiv Detail & Related papers (2023-06-13T16:54:13Z) - Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime [74.52487417350221]
We consider online learning problems in the realizable setting, where there is a zero-loss solution.
We propose new Differentially Private (DP) algorithms that obtain near-optimal regret bounds.
arXiv Detail & Related papers (2023-02-27T21:19:24Z) - Better Best of Both Worlds Bounds for Bandits with Switching Costs [37.71741656687868]
We study best-of-both-worlds algorithms for bandits with switching costs, recently addressed by Rouyer, Seldin, and Cesa-Bianchi (2021).
We introduce a surprisingly simple and effective algorithm that simultaneously achieves the minimax optimal regret bound of $\mathcal{O}(T^{2/3})$ in the oblivious adversarial setting.
arXiv Detail & Related papers (2022-06-07T08:22:56Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Bandits with many optimal arms [68.17472536610859]
We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z) - Thresholded Lasso Bandit [70.17389393497125]
The Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
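The estimation step described above can be sketched as follows, using scikit-learn's `Lasso`; the regularization level `lam` and threshold `tau` are illustrative placeholders, not the paper's tuning schedule.

```python
import numpy as np
from sklearn.linear_model import Lasso

def thresholded_lasso_support(X, y, lam, tau):
    """Estimate the reward vector and its sparse support (a sketch).

    Fit a Lasso regression, then keep only the coordinates whose
    magnitude exceeds tau as the estimated support.
    """
    theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    support = np.flatnonzero(np.abs(theta_hat) > tau)
    return theta_hat, support
```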
arXiv Detail & Related papers (2020-10-22T19:14:37Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that, for both problems, for dimensions $d = 1, 2$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)