Improved Regret Bounds for Bandits with Expert Advice
- URL: http://arxiv.org/abs/2406.16802v1
- Date: Mon, 24 Jun 2024 17:14:31 GMT
- Title: Improved Regret Bounds for Bandits with Expert Advice
- Authors: Nicolò Cesa-Bianchi, Khaled Eldowa, Emmanuel Esposito, Julia Olkhovskaya
- Abstract summary: We prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N>K$ the number of experts, and $T$ the time horizon.
This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{K T (\ln N) / (\ln K)}$.
- Score: 16.699381591572163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N>K$ the number of experts, and $T$ the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{K T (\ln N) / (\ln K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
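For context, here is a minimal sketch of the classical EXP4 strategy, the standard baseline for bandits with expert advice, whose tuned regret is of order $\sqrt{K T \ln N}$; the improved $\sqrt{K T \ln(N/K)}$ bounds discussed in the note concern the same setting but are not attained by this particular sketch. The names `expert_advice`, `loss_oracle`, `eta`, and `gamma` are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def exp4(expert_advice, loss_oracle, T, K, N, eta, gamma=0.0, rng=None):
    """Minimal EXP4 sketch for bandits with expert advice.

    expert_advice(t) -> (N, K) array: each expert's distribution over actions.
    loss_oracle(t, a) -> loss in [0, 1] of the chosen action a at round t.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = np.ones(N)                        # one weight per expert
    cumulative_loss = 0.0
    for t in range(T):
        advice = expert_advice(t)         # shape (N, K)
        q = w / w.sum()                   # distribution over experts
        p = q @ advice                    # induced distribution over actions
        p = (1 - gamma) * p + gamma / K   # optional uniform exploration
        a = rng.choice(K, p=p)            # play a random action from p
        loss = loss_oracle(t, a)
        cumulative_loss += loss
        loss_hat = np.zeros(K)
        loss_hat[a] = loss / p[a]         # importance-weighted loss estimate
        expert_loss = advice @ loss_hat   # estimated loss of each expert
        w *= np.exp(-eta * expert_loss)   # exponential-weights update
    return cumulative_loss
```

With the standard tuning $\eta \approx \sqrt{\ln(N) / (T K)}$ and $\gamma = 0$, this update attains the classical $O(\sqrt{K T \ln N})$ regret against the best expert; the note above concerns how far below $\ln N$ the logarithmic factor can be pushed in the upper bound and matched from below.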
Related papers
- The Real Price of Bandit Information in Multiclass Classification [73.17969992976501]
We revisit the classical problem of multiclass classification with bandit feedback.
We present a new bandit classification algorithm that guarantees regret $\widetilde{O}(|H| + \sqrt{T})$.
arXiv Detail & Related papers (2024-05-16T12:11:09Z) - Near-optimal Per-Action Regret Bounds for Sleeping Bandits [8.261117235807607]
We derive near-optimal per-action regret bounds for sleeping bandits.
In a setting with $K$ total arms and at most $A$ available arms in each round over $T$ rounds, the best known upper bound is $O(K\sqrt{T A \ln K})$.
We extend our results to the setting of bandits with advice from sleeping experts, generalizing EXP4 along the way.
arXiv Detail & Related papers (2024-03-02T21:22:46Z) - Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR [58.40575099910538]
We study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$.
We show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1} A K})$, where $A$ is the number of actions and $K$ is the number of episodes.
We show that our algorithm achieves the optimal regret of $\widetilde{O}(\tau^{-1}\sqrt{S A K})$ under a continuity assumption and in general attains a near-optimal regret.
arXiv Detail & Related papers (2023-02-07T02:22:31Z) - Private Online Prediction from Experts: Separations and Faster Rates [74.52487417350221]
Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints.
We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries.
arXiv Detail & Related papers (2022-10-24T18:40:19Z) - Constant regret for sequence prediction with limited advice [0.0]
We provide a strategy combining only $p = 2$ experts per round for prediction and observing the losses of $m \ge 2$ experts.
If the learner is constrained to observe only one expert's feedback per round, the worst-case regret is the "slow rate" $\Omega(\sqrt{K T})$.
arXiv Detail & Related papers (2022-10-05T13:32:49Z) - Norm-Agnostic Linear Bandits [9.700635713828698]
We propose novel algorithms that do not require such knowledge for the first time.
We analyze their regret bounds: one for the changing arm set setting and the other for the fixed arm set setting.
Our numerical experiments show that standard algorithms assuming knowledge of $S$ can fail catastrophically when $\|\theta^*\| \le S$ does not hold, whereas our algorithms enjoy low regret.
arXiv Detail & Related papers (2022-05-03T00:55:42Z) - The price of unfairness in linear bandits with biased feedback [62.25313751895011]
We study the problem of sequential decision making with biased linear bandit feedback.
We show that the worst-case regret is higher than the $d T^{1/2} \log(T)$ regret rate obtained under unbiased feedback.
Interestingly, the gap-dependent rates reveal the existence of non-trivial instances where the problem is no more difficult than its unbiased counterpart.
arXiv Detail & Related papers (2022-03-18T08:03:20Z) - An Experimental Design Approach for Regret Minimization in Logistic
Bandits [26.674062544226636]
The main challenge of logistic bandits is reducing the dependence on a potentially large problem-dependent constant $\kappa$.
We propose a new warmup sampling algorithm that can dramatically reduce the lower order term in the regret in general.
arXiv Detail & Related papers (2022-02-04T21:56:40Z) - Nonstochastic Bandits with Infinitely Many Experts [1.7188280334580197]
We study the problem of nonstochastic bandits with infinitely many experts.
We propose a variant of Exp4.P that, for finitely many experts, enables inference of correct expert rankings.
We then incorporate the variant into a meta-algorithm that works on infinitely many experts.
arXiv Detail & Related papers (2021-02-09T22:42:36Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Bandits with Knapsacks beyond the Worst-Case [87.54497614804409]
We present three results that go beyond the worst-case perspective.
First, we provide upper and lower bounds which amount to a full characterization for logarithmic, instance-dependent regret rates.
Second, we consider "simple regret" in BwK, which tracks the algorithm's performance in a given round, and prove that it is small in all but a few rounds.
arXiv Detail & Related papers (2020-02-01T18:50:44Z)