On Interpolating Experts and Multi-Armed Bandits
- URL: http://arxiv.org/abs/2307.07264v2
- Date: Fri, 4 Aug 2023 05:07:47 GMT
- Title: On Interpolating Experts and Multi-Armed Bandits
- Authors: Houshuang Chen, Yuchen He, Chihao Zhang
- Abstract summary: We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI.
As a consequence, we obtain tight minimax regret bounds for several families of feedback graphs.
- Score: 1.9497955504539408
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning with expert advice and the multi-armed bandit are two classic online
decision problems which differ in how information is observed in each round
of the game. We study a family of problems interpolating the two. For a vector
$\mathbf{m}=(m_1,\dots,m_K)\in \mathbb{N}^K$, an instance of $\mathbf{m}$-MAB
indicates that the arms are partitioned into $K$ groups and the $i$-th group
contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same
group are observed. We prove tight minimax regret bounds for $\mathbf{m}$-MAB
and design an optimal PAC algorithm for its pure exploration version,
$\mathbf{m}$-BAI, where the goal is to identify the arm with minimum loss with
as few rounds as possible. We show that the minimax regret of $\mathbf{m}$-MAB
is $\Theta\left(\sqrt{T\sum_{k=1}^K\log (m_k+1)}\right)$ and the minimum number
of pulls for an $(\epsilon,0.05)$-PAC algorithm of $\mathbf{m}$-BAI is
$\Theta\left(\frac{1}{\epsilon^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Both
our upper bounds and lower bounds for $\mathbf{m}$-MAB can be extended to a
more general setting, namely the bandit with graph feedback, in terms of the
clique cover and related graph parameters. As a consequence, we obtain tight
minimax regret bounds for several families of feedback graphs.
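To make the feedback model concrete, the sketch below simulates the $\mathbf{m}$-MAB interaction and runs a simple exponential-weights learner on importance-weighted group-level observations. This is an illustration of the setting only, not the algorithm from the paper; the group sizes, the random losses, and the learning rate (chosen to echo the form of the minimax bound) are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical group sizes m_1, ..., m_K for illustration.
m = [1, 3, 5]

# Build the partition: groups[k] lists the arm indices in group k.
groups, arm_to_group = [], []
for k, size in enumerate(m):
    start = len(arm_to_group)
    groups.append(list(range(start, start + size)))
    arm_to_group.extend([k] * size)
n_arms = len(arm_to_group)

T = 10_000
# Learning rate echoing the form of the minimax bound (a heuristic choice,
# not the tuning from the paper).
eta = np.sqrt(sum(np.log(mk + 1) for mk in m) / T)

est_loss = np.zeros(n_arms)   # importance-weighted cumulative loss estimates
true_loss = np.zeros(n_arms)  # true cumulative losses (for measuring regret)
learner_loss = 0.0

for t in range(T):
    # Exponential weights over arms (shifted for numerical stability).
    weights = np.exp(-eta * (est_loss - est_loss.min()))
    probs = weights / weights.sum()
    arm = rng.choice(n_arms, p=probs)

    # Adversary's losses this round (random here, purely for illustration).
    losses = rng.random(n_arms)
    learner_loss += losses[arm]
    true_loss += losses

    # m-MAB feedback: the losses of every arm in the pulled arm's group are observed.
    k = arm_to_group[arm]
    p_group = probs[groups[k]].sum()        # probability that group k is observed
    for j in groups[k]:
        est_loss[j] += losses[j] / p_group  # unbiased importance-weighted estimate

print("regret vs. best fixed arm:", learner_loss - true_loss.min())
```

As a sanity check on the bound itself: with a single group ($K=1$, $m_1=N$) the stated regret becomes $\Theta(\sqrt{T\log(N+1)})$, matching the classic rate for prediction with expert advice, while with singleton groups ($m_k=1$ for all $k$) it becomes $\Theta(\sqrt{TK\log 2})=\Theta(\sqrt{TK})$, the standard multi-armed bandit rate.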
Related papers
- Adversarial Combinatorial Bandits with Switching Costs [55.2480439325792]
We study the adversarial combinatorial bandit problem with a switching cost $\lambda$ incurred for each change of a selected arm between rounds.
We derive lower bounds for the minimax regret and design algorithms to approach them.
arXiv Detail & Related papers (2024-04-02T12:15:37Z) - Blocked Collaborative Bandits: Online Collaborative Filtering with
Per-Item Budget Constraints [46.65419724935037]
We consider the problem of blocked collaborative bandits where there are multiple users, each with an associated multi-armed bandit problem.
Our goal is to design algorithms that maximize the cumulative reward accrued by all the users over time.
B-LATTICE achieves a per-user regret of $\widetilde{O}(\sqrt{\mathsf{T}(\sqrt{\mathsf{N}\mathsf{M}^{-1}})})$ under a budget constraint.
arXiv Detail & Related papers (2023-10-31T11:04:21Z) - Tight Memory-Regret Lower Bounds for Streaming Bandits [11.537938617281736]
The learner aims to minimize regret while dealing with online arriving arms and sublinear arm memory.
We establish the tight worst-case regret lower bound of $\Omega\left((TB)^{\alpha} K^{1-\alpha}\right)$, $\alpha = 2^{B}/(2^{B+1}-1)$, for any algorithm.
We also provide a multi-pass algorithm that achieves a regret upper bound of $\tilde{O}\left((TB)^{\alpha} K^{1-\alpha}\right)$ using constant arm memory.
arXiv Detail & Related papers (2023-06-13T16:54:13Z) - Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R^{*}$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we define an algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z) - Approximate Top-$m$ Arm Identification with Heterogeneous Reward
Variances [11.555138461092177]
We show that the worst-case sample complexity of this problem is $$\Theta\left( \sum_{i=1}^{n} \frac{\sigma_i^2}{\epsilon^2} \ln\frac{1}{\delta} + \sum_{i \in G^{m}} \frac{\sigma_j^2}{\epsilon^2} \mathrm{Ent}(\sigma^2_{G^{r}}) \right),$$ where $G^{m}$, $G^{l}$, $G^{r}$ ...
arXiv Detail & Related papers (2022-04-11T16:41:33Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Understanding Bandits with Graph Feedback [0.0]
We prove a general regret upper bound of $O\left(\left(\delta^* \log |V|\right)^{\frac{1}{3}} T^{\frac{2}{3}}\right)$ and a lower bound of $\Omega\left(\left(\delta^*/\alpha\right)^{\frac{1}{3}} T^{\frac{2}{3}}\right)$.
We show that for several special families of graphs, we can get rid of the $\left(\log |V|\right)^{\frac{1}{3}}$ factor.
arXiv Detail & Related papers (2021-05-29T09:35:28Z) - Linear Bandits on Uniformly Convex Sets [88.3673525964507]
Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets.
Two types of structural assumptions lead to better pseudo-regret bounds.
arXiv Detail & Related papers (2021-03-10T07:33:03Z) - Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2\sqrt{n T \log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - A Closer Look at Small-loss Bounds for Bandits with Graph Feedback [39.78074016649885]
We study small-loss bounds for adversarial multi-armed bandits with graph feedback.
We derive the first small-loss bound for general strongly observable graphs.
We also make a first attempt at deriving small-loss bounds for weakly observable graphs.
arXiv Detail & Related papers (2020-02-02T03:48:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.