A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice
- URL: http://arxiv.org/abs/2511.00257v1
- Date: Fri, 31 Oct 2025 21:01:53 GMT
- Title: A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice
- Authors: Zachary Chase, Shinji Ito, Idan Mehalel
- Abstract summary: We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem. The two bounds determine the minimax optimal expected regret to be $\Theta\left( \sqrt{T K \log (N/K)} \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.
- Score: 28.32015131406357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of Kale (2014). The two bounds determine the minimax optimal expected regret to be $\Theta\left( \sqrt{T K \log (N/K) } \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.
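The matching upper bound of Kale (2014) is attained by an EXP4-style exponential-weights algorithm with a refined prior over experts. As background for the setting, here is a minimal sketch of plain EXP4, which achieves the slightly weaker $O(\sqrt{TK \log N})$ rate; the function name, array layout, and fixed learning rate are illustrative choices, not the construction from the paper.

```python
import numpy as np

def exp4(expert_advice, losses, eta, seed=0):
    """Plain EXP4: exponential weights over experts with bandit feedback.

    expert_advice: shape (T, N, K); expert_advice[t, i] is expert i's
        probability distribution over the K arms at round t.
    losses: shape (T, K), adversarial losses in [0, 1]; only the pulled
        arm's loss is actually observed by the learner.
    eta: learning rate.
    Returns the list of arms pulled.
    """
    T, N, K = expert_advice.shape
    rng = np.random.default_rng(seed)
    w = np.ones(N)                       # weights over experts
    pulls = []
    for t in range(T):
        q = w / w.sum()                  # distribution over experts
        p = q @ expert_advice[t]         # induced distribution over arms
        arm = rng.choice(K, p=p)
        pulls.append(arm)
        # importance-weighted loss estimate: unbiased, nonzero only at `arm`
        lhat = np.zeros(K)
        lhat[arm] = losses[t, arm] / p[arm]
        # each expert is charged the estimated loss of its own advice
        expert_loss = expert_advice[t] @ lhat
        w *= np.exp(-eta * expert_loss)
    return pulls

# toy usage with random expert advice and random losses
rng = np.random.default_rng(1)
T, N, K = 50, 4, 3
advice = rng.dirichlet(np.ones(K), size=(T, N))  # each expert: a distribution over arms
loss = rng.random((T, K))
pulls = exp4(advice, loss, eta=0.1)
```

The key bandit ingredient is the importance-weighted estimate `lhat`, which keeps the loss estimates unbiased even though only one arm's loss is observed per round.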
Related papers
- Improved Regret Bounds for Bandits with Expert Advice [16.699381591572163]
We prove a lower bound of order $\sqrt{KT \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N>K$ the number of experts, and $T$ the time horizon.
This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{KT (\ln N)/(\ln K)}$.
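To get a feel for how the two lower-bound orders compare, here is a quick numeric check; the parameter values are arbitrary illustrations, and in this $N \gg K$ regime the $\sqrt{KT \ln(N/K)}$ bound is the larger, i.e. stronger, lower bound.

```python
import math

def tight_bound(K, N, T):
    # order sqrt(K T ln(N/K)) -- the lower bound matching the upper bound
    return math.sqrt(K * T * math.log(N / K))

def previous_bound(K, N, T):
    # order sqrt(K T (ln N)/(ln K)) -- the previously best known lower bound
    return math.sqrt(K * T * math.log(N) / math.log(K))

K, N, T = 10, 10_000, 100_000  # illustrative values with N >> K
print(round(tight_bound(K, N, T)), round(previous_bound(K, N, T)))
```

Note the comparison depends on the regime: when $N$ is polynomial in $K$, $(\ln N)/(\ln K)$ can exceed $\ln(N/K)$, which is why the improvement concerns the many-experts regime.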
arXiv Detail & Related papers (2024-06-24T17:14:31Z)
- Adversarial Multi-dueling Bandits
We introduce the problem of regret minimization in adversarial multi-dueling bandits.
We introduce a novel algorithm, MiDEX (Multi Dueling EXP3), to learn from such preference feedback.
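MiDEX builds on EXP3. As background, here is a minimal sketch of vanilla EXP3 for standard single-arm, loss-based bandit feedback; this is the base algorithm only, not MiDEX's preference-feedback mechanism, and the function signature is an illustrative choice.

```python
import numpy as np

def exp3(loss_fn, K, T, eta, seed=0):
    """Vanilla EXP3: exponential weights with importance-weighted losses.

    loss_fn(t, arm) returns the adversarial loss in [0, 1] of the pulled arm.
    Returns the list of arms pulled.
    """
    rng = np.random.default_rng(seed)
    w = np.ones(K)                      # weights over arms
    pulls = []
    for t in range(T):
        p = w / w.sum()
        arm = rng.choice(K, p=p)
        pulls.append(arm)
        lhat = loss_fn(t, arm) / p[arm]  # unbiased estimate of the arm's loss
        w[arm] *= np.exp(-eta * lhat)
    return pulls

# toy usage: arm 0 always has the smallest loss, so EXP3 should favor it
pulls = exp3(lambda t, a: 0.1 if a == 0 else 0.9, K=3, T=500, eta=0.05)
```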
arXiv Detail & Related papers (2024-06-18T10:28:12Z)
- Adversarial Combinatorial Bandits with Switching Costs
We study the problem of adversarial bandits with a switching cost $\lambda$ incurred for each change of the selected arm between rounds.
We derive lower bounds for the minimax regret and design algorithms to approach them.
arXiv Detail & Related papers (2024-04-02T12:15:37Z)
- Near-optimal Per-Action Regret Bounds for Sleeping Bandits
We derive near-optimal per-action regret bounds for sleeping bandits.
In a setting with $K$ total arms and at most $A$ available arms in each round over $T$ rounds, the best known upper bound is $O(K\sqrt{TA\ln K})$.
We extend our results to the setting of bandits with advice from sleeping experts, generalizing EXP4 along the way.
arXiv Detail & Related papers (2024-03-02T21:22:46Z)
- Borda Regret Minimization for Generalized Linear Dueling Bandits
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z)
- Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR
We study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$.
We show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes.
We show that our algorithm achieves the optimal regret of $\widetilde{O}(\tau^{-1}\sqrt{SAK})$ under a continuity assumption and in general attains a near-optimal regret.
arXiv Detail & Related papers (2023-02-07T02:22:31Z)
- Online Learning and Bandits with Queried Hints
We consider the classic online learning and multi-armed bandit (MAB) problems.
We derive algorithms whose regret bounds have exponentially better dependence on the time horizon.
We show that probing with $k=2$ suffices to achieve time-independent regret bounds for online linear and convex optimization.
arXiv Detail & Related papers (2022-11-04T18:41:08Z)
- Combinatorial Bandits without Total Order for Arms
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2\sqrt{nT \log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z)
- Nonstochastic Bandits with Infinitely Many Experts
We study the problem of nonstochastic bandits with infinitely many experts.
We propose a variant of Exp4.P that, for finitely many experts, enables inference of correct expert rankings.
We then incorporate the variant into a meta-algorithm that works on infinitely many experts.
arXiv Detail & Related papers (2021-02-09T22:42:36Z)
- Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)
- Adversarial Dueling Bandits [85.14061196945599]
We introduce the problem of regret minimization in Adversarial Dueling Bandits.
The learner has to repeatedly choose a pair of items and observe only a relative binary "win-loss" feedback for this pair.
Our main result is an algorithm that bounds the $T$-round regret compared to the Borda-winner from a set of $K$ items.
arXiv Detail & Related papers (2020-10-27T19:09:08Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)