A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice
- URL: http://arxiv.org/abs/2511.00257v1
- Date: Fri, 31 Oct 2025 21:01:53 GMT
- Title: A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice
- Authors: Zachary Chase, Shinji Ito, Idan Mehalel
- Abstract summary: We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem. The two bounds determine the minimax optimal expected regret to be $\Theta\left( \sqrt{T K \log (N/K)} \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.
- Score: 28.32015131406357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of Kale (2014). The two bounds determine the minimax optimal expected regret to be $\Theta\left( \sqrt{T K \log (N/K) } \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.
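The matching upper bound of Kale (2014) is attained by an EXP4-style exponential-weights algorithm with a refined prior over experts. As background for the setting, here is a minimal sketch of plain EXP4, which achieves the slightly weaker $O(\sqrt{TK \log N})$ rate; the function name, array layout, and fixed learning rate are illustrative choices, not the construction from the paper.

```python
import numpy as np

def exp4(expert_advice, losses, eta, seed=0):
    """Plain EXP4: exponential weights over experts with bandit feedback.

    expert_advice: shape (T, N, K); expert_advice[t, i] is expert i's
        probability distribution over the K arms at round t.
    losses: shape (T, K), adversarial losses in [0, 1]; only the pulled
        arm's loss is actually observed by the learner.
    eta: learning rate.
    Returns the list of arms pulled.
    """
    T, N, K = expert_advice.shape
    rng = np.random.default_rng(seed)
    w = np.ones(N)                       # weights over experts
    pulls = []
    for t in range(T):
        q = w / w.sum()                  # distribution over experts
        p = q @ expert_advice[t]         # induced distribution over arms
        arm = rng.choice(K, p=p)
        pulls.append(arm)
        # importance-weighted loss estimate: unbiased, nonzero only at `arm`
        lhat = np.zeros(K)
        lhat[arm] = losses[t, arm] / p[arm]
        # each expert is charged the estimated loss of its own advice
        expert_loss = expert_advice[t] @ lhat
        w *= np.exp(-eta * expert_loss)
    return pulls

# toy usage with random expert advice and random losses
rng = np.random.default_rng(1)
T, N, K = 50, 4, 3
advice = rng.dirichlet(np.ones(K), size=(T, N))  # each expert: a distribution over arms
loss = rng.random((T, K))
pulls = exp4(advice, loss, eta=0.1)
```

The key bandit ingredient is the importance-weighted estimate `lhat`, which keeps the loss estimates unbiased even though only one arm's loss is observed per round.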
Related papers
- Improved Regret Bounds for Bandits with Expert Advice [16.699381591572163]
We prove a lower bound of order $\sqrt{KT \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N>K$ the number of experts, and $T$ the time horizon.
This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{KT (\ln N)/(\ln K)}$.
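To get a feel for how the two lower-bound orders compare, here is a quick numeric check; the parameter values are arbitrary illustrations, and in this $N \gg K$ regime the $\sqrt{KT \ln(N/K)}$ bound is the larger, i.e. stronger, lower bound.

```python
import math

def tight_bound(K, N, T):
    # order sqrt(K T ln(N/K)) -- the lower bound matching the upper bound
    return math.sqrt(K * T * math.log(N / K))

def previous_bound(K, N, T):
    # order sqrt(K T (ln N)/(ln K)) -- the previously best known lower bound
    return math.sqrt(K * T * math.log(N) / math.log(K))

K, N, T = 10, 10_000, 100_000  # illustrative values with N >> K
print(round(tight_bound(K, N, T)), round(previous_bound(K, N, T)))
```

Note the comparison depends on the regime: when $N$ is polynomial in $K$, $(\ln N)/(\ln K)$ can exceed $\ln(N/K)$, which is why the improvement concerns the many-experts regime.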
arXiv Detail & Related papers (2024-06-24T17:14:31Z)
- Adversarial Multi-dueling Bandits
We introduce the problem of regret minimization in adversarial multi-dueling bandits.
We introduce a novel algorithm, MiDEX (Multi Dueling EXP3), to learn from such preference feedback.
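MiDEX builds on EXP3. As background, here is a minimal sketch of vanilla EXP3 for standard single-arm, loss-based bandit feedback; this is the base algorithm only, not MiDEX's preference-feedback mechanism, and the function signature is an illustrative choice.

```python
import numpy as np

def exp3(loss_fn, K, T, eta, seed=0):
    """Vanilla EXP3: exponential weights with importance-weighted losses.

    loss_fn(t, arm) returns the adversarial loss in [0, 1] of the pulled arm.
    Returns the list of arms pulled.
    """
    rng = np.random.default_rng(seed)
    w = np.ones(K)                      # weights over arms
    pulls = []
    for t in range(T):
        p = w / w.sum()
        arm = rng.choice(K, p=p)
        pulls.append(arm)
        lhat = loss_fn(t, arm) / p[arm]  # unbiased estimate of the arm's loss
        w[arm] *= np.exp(-eta * lhat)
    return pulls

# toy usage: arm 0 always has the smallest loss, so EXP3 should favor it
pulls = exp3(lambda t, a: 0.1 if a == 0 else 0.9, K=3, T=500, eta=0.05)
```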
arXiv Detail & Related papers (2024-06-18T10:28:12Z)
- Adversarial Combinatorial Bandits with Switching Costs
We study the problem of adversarial bandits with a switching cost $\lambda$ incurred for each change of the selected arm between rounds.
We derive lower bounds for the minimax regret and design algorithms to approach them.
arXiv Detail & Related papers (2024-04-02T12:15:37Z)
- Near-optimal Per-Action Regret Bounds for Sleeping Bandits
We derive near-optimal per-action regret bounds for sleeping bandits.
In a setting with $K$ total arms and at most $A$ available arms in each round over $T$ rounds, the best known upper bound is $O(K\sqrt{TA\ln K})$.
We extend our results to the setting of bandits with advice from sleeping experts, generalizing EXP4 along the way.
arXiv Detail & Related papers (2024-03-02T21:22:46Z)
- Borda Regret Minimization for Generalized Linear Dueling Bandits
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z)
- Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR
We study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$.
We show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes.
We show that our algorithm achieves the optimal regret of $\widetilde{O}(\tau^{-1}\sqrt{SAK})$ under a continuity assumption and in general attains a near-optimal regret.
arXiv Detail & Related papers (2023-02-07T02:22:31Z)
- Online Learning and Bandits with Queried Hints
We consider the classic online learning and multi-armed bandit (MAB) problems.
We derive algorithms whose regret bounds have exponentially better dependence on the time horizon.
We show that probing with $k=2$ suffices to achieve time-independent regret bounds for online linear and convex optimization.
arXiv Detail & Related papers (2022-11-04T18:41:08Z)
- Combinatorial Bandits without Total Order for Arms
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2\sqrt{nT \log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z)
- Nonstochastic Bandits with Infinitely Many Experts
We study the problem of nonstochastic bandits with infinitely many experts.
We propose a variant of Exp4.P that, for finitely many experts, enables inference of correct expert rankings.
We then incorporate the variant into a meta-algorithm that works on infinitely many experts.
arXiv Detail & Related papers (2021-02-09T22:42:36Z)
- Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)
- Adversarial Dueling Bandits [85.14061196945599]
We introduce the problem of regret minimization in Adversarial Dueling Bandits.
The learner has to repeatedly choose a pair of items and observe only a relative binary "win-loss" feedback for this pair.
Our main result is an algorithm that bounds the $T$-round regret compared to the Borda-winner from a set of $K$ items.
arXiv Detail & Related papers (2020-10-27T19:09:08Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)