Tight Memory-Regret Lower Bounds for Streaming Bandits
- URL: http://arxiv.org/abs/2306.07903v1
- Date: Tue, 13 Jun 2023 16:54:13 GMT
- Title: Tight Memory-Regret Lower Bounds for Streaming Bandits
- Authors: Shaoang Li, Lan Zhang, Junhao Wang, Xiang-Yang Li
- Abstract summary: The learner aims to minimize regret while handling arms that arrive online under a sublinear arm-memory constraint.
We establish the tight worst-case regret lower bound of $\Omega \left( (TB)^{\alpha} K^{1-\alpha}\right)$, $\alpha = 2^{B} / (2^{B+1}-1)$, for any algorithm.
We also provide a multi-pass algorithm that achieves a regret upper bound of $\tilde{O} \left( (TB)^{\alpha} K^{1-\alpha}\right)$ using constant arm memory.
- Score: 11.537938617281736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the streaming bandits problem, wherein the
learner aims to minimize regret by dealing with online arriving arms and
sublinear arm memory. We establish the tight worst-case regret lower bound of
$\Omega \left( (TB)^{\alpha} K^{1-\alpha}\right), \alpha = 2^{B} / (2^{B+1}-1)$
for any algorithm with a time horizon $T$, number of arms $K$, and number of
passes $B$. The result reveals a separation between the stochastic bandits
problem in the classical centralized setting and the streaming setting with
bounded arm memory. Notably, in comparison to the well-known
$\Omega(\sqrt{KT})$ lower bound, an additional double logarithmic factor is
unavoidable for any streaming bandits algorithm with sublinear memory
permitted. Furthermore, we establish the first instance-dependent lower bound
of $\Omega \left(T^{1/(B+1)} \sum_{\Delta_x>0} \frac{\mu^*}{\Delta_x}\right)$
for streaming bandits. These lower bounds are derived through a unique
reduction from the regret-minimization setting to the sample complexity
analysis for a sequence of $\epsilon$-optimal arms identification tasks, which
may be of independent interest. To complement the lower bound, we also provide a
multi-pass algorithm that achieves a regret upper bound of $\tilde{O} \left(
(TB)^{\alpha} K^{1 - \alpha}\right)$ using constant arm memory.
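A quick sanity check on the exponent, derived here directly from the displayed bound (not an additional claim from the paper): for a single pass ($B = 1$) we get $\alpha = 2/3$, recovering the $K^{1/3}T^{2/3}$-type single-pass rate, while for larger $B$,
$$
\alpha - \frac{1}{2} = \frac{1}{2\,(2^{B+1}-1)}, \qquad
(TB)^{\alpha} K^{1-\alpha} = \sqrt{TK}\; B^{\alpha} \left(\frac{T}{K}\right)^{\frac{1}{2(2^{B+1}-1)}},
$$
so the extra $(T/K)$-factor only becomes $O(1)$ once $2^{B+1} \gtrsim \log(T/K)$, i.e. after $B = \Omega(\log\log T)$ passes. This is one way to read the double-logarithmic separation from the classical $\Omega(\sqrt{KT})$ bound noted in the abstract.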
Related papers
- Adversarial Combinatorial Bandits with Switching Costs [55.2480439325792]
We study the adversarial combinatorial bandit problem with a switching cost $\lambda$ for each switch of a selected arm in each round.
We derive lower bounds for the minimax regret and design algorithms to approach them.
arXiv Detail & Related papers (2024-04-02T12:15:37Z)
- The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits [10.329863009504303]
We show that any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(\frac{n}{\Delta^2})$ requires $\Omega(\frac{\log (1/\Delta)}{\log\log (1/\Delta)})$ passes.
Our result matches the $O(\log(\frac{1}{\Delta}))$-pass algorithm of Jin et al. [ICML'21] that only uses $O(1)$ memory and answers an open question posed by Assadi and Wang.
arXiv Detail & Related papers (2023-09-06T16:41:41Z)
- On Interpolating Experts and Multi-Armed Bandits [1.9497955504539408]
We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI.
As a consequence, we obtain tight minimax regret bounds for several families of feedback graphs.
arXiv Detail & Related papers (2023-07-14T10:38:30Z)
- Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits [3.5955736977697073]
In the single-pass setting with $K$ arms and $T$ trials, a regret lower bound of $\Omega(T^{2/3})$ has been proved for any algorithm with $o(K)$ memory.
In this paper, we improve the regret lower bound to $\Omega(K^{1/3}T^{2/3}\log^{1/3}(T))$ for algorithms with $o(K)$ memory.
We show that the proposed algorithms consistently outperform the benchmark uniform exploration algorithm by a large margin and, on occasion, reduce the regret by up to 70% (a minimal sketch of this uniform-exploration benchmark appears after the list below).
arXiv Detail & Related papers (2023-06-03T22:41:44Z)
- Variance-Aware Sparse Linear Bandits [64.70681598741417]
Worst-case minimax regret for sparse linear bandits is $\widetilde{\Theta}\left(\sqrt{dT}\right)$.
In the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $\widetilde{\mathcal{O}}(1)$ regret.
We develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits.
arXiv Detail & Related papers (2022-05-26T15:55:44Z)
- Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial bandit algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long as certain conditions are satisfied.
arXiv Detail & Related papers (2022-02-12T21:55:44Z)
- Bandits with many optimal arms [68.17472536610859]
We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z)
- Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
arXiv Detail & Related papers (2020-10-22T19:14:37Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
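The uniform-exploration benchmark referenced in the single-pass streaming entry above is easy to make concrete. The sketch below is an illustration added to this page, not code from any of the listed papers; the per-arm budget `m` is a heuristic choice (of the usual $(T/K)^{2/3}$ order, up to constants and log factors), and `make_arm` is a hypothetical helper for simulation. It explores each arriving arm a fixed number of times while remembering only the best arm seen so far, then commits.

```python
import random

def make_arm(p, rng=random):
    """Bernoulli arm with mean p; calling it returns one stochastic reward."""
    return lambda: float(rng.random() < p)

def streaming_uniform_exploration(arm_stream, T, K):
    """Single-pass explore-then-commit over a stream of K arms with O(1) arm memory.

    arm_stream yields zero-argument pull() callables with rewards in [0, 1].
    This is a generic sketch of the uniform-exploration benchmark, not the
    algorithm of any paper listed above; m is a heuristic per-arm budget.
    """
    m = max(1, round((T / K) ** (2 / 3)))  # exploration pulls per arriving arm
    best_mean, best_pull = float("-inf"), None
    total_reward, pulls = 0.0, 0

    # Exploration pass: sample each arriving arm m times, retaining only the
    # empirically best arm seen so far -- constant arm memory.
    for pull in arm_stream:
        s = sum(pull() for _ in range(m))
        total_reward += s
        pulls += m
        if s / m > best_mean:
            best_mean, best_pull = s / m, pull

    # Commit phase: spend the remaining horizon on the retained arm.
    for _ in range(max(0, T - pulls)):
        total_reward += best_pull()
    return total_reward

# Usage: 20 Bernoulli arms arriving online, horizon T = 10_000.
means = [random.random() for _ in range(20)]
reward = streaming_uniform_exploration((make_arm(p) for p in means), T=10_000, K=20)
print(f"reward: {reward:.0f}, best possible ~ {10_000 * max(means):.0f}")
```

Keeping only the running best arm is what gives the $O(1)$ arm memory; the price of that simplicity is the $T^{2/3}$-type regret discussed in the single-pass lower bounds above.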