Nonstochastic Bandits with Infinitely Many Experts
- URL: http://arxiv.org/abs/2102.05164v1
- Date: Tue, 9 Feb 2021 22:42:36 GMT
- Title: Nonstochastic Bandits with Infinitely Many Experts
- Authors: X. Flora Meng, Tuhin Sarkar, Munther A. Dahleh
- Abstract summary: We study the problem of nonstochastic bandits with infinitely many experts.
We propose a variant of Exp4.P that, for finitely many experts, enables inference of correct expert rankings.
We then incorporate the variant into a meta-algorithm that works on infinitely many experts.
- Score: 1.7188280334580197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of nonstochastic bandits with infinitely many experts: A
learner aims to maximize the total reward by taking actions sequentially based
on bandit feedback while benchmarking against a countably infinite set of
experts. We propose a variant of Exp4.P that, for finitely many experts,
enables inference of correct expert rankings while preserving the order of the
regret upper bound. We then incorporate the variant into a meta-algorithm that
works on infinitely many experts. We prove a high-probability upper bound of
$\tilde{\mathcal{O}} \big( i^*K + \sqrt{KT} \big)$ on the regret, up to polylog
factors, where $i^*$ is the unknown position of the best expert, $K$ is the
number of actions, and $T$ is the time horizon. We also provide an example of
structured experts and discuss how to expedite learning in such a case. Our
meta-learning algorithm achieves the tightest regret upper bound for the
setting considered when $i^* = \tilde{\mathcal{O}} \big( \sqrt{T/K} \big)$. If
a prior distribution is assumed to exist for $i^*$, the probability of
satisfying a tight regret bound increases with $T$, the rate of which can be
fast.
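For intuition, here is a minimal, hypothetical sketch of the kind of exponential-weights aggregation that Exp4-style algorithms perform over a finite prefix of experts. It is not the paper's Exp4.P variant (which adds confidence terms to obtain high-probability guarantees and to support inferring expert rankings); the function name `exp4_prefix`, the `gamma` parameter, and the `experts`/`rewards` callables are all illustrative assumptions.

```python
import numpy as np

def exp4_prefix(experts, K, T, rewards, gamma=0.1, rng=None):
    """Hypothetical Exp4-style sketch over a finite prefix of experts.

    experts : list of callables; experts[i](t) returns a length-K probability
              vector (expert i's advice over the K actions at round t).
    rewards : callable; rewards(t, a) returns the reward in [0, 1] of the
              played action a at round t (bandit feedback only).
    Omits Exp4.P's confidence terms; shown only to illustrate the
    importance-weighted exponential-weights update.
    """
    rng = rng or np.random.default_rng(0)
    N = len(experts)
    log_w = np.zeros(N)                        # log-weights over the N experts
    total_reward = 0.0
    for t in range(T):
        advice = np.array([e(t) for e in experts])    # N x K advice matrix
        q = np.exp(log_w - log_w.max())
        q /= q.sum()                           # distribution over experts
        p = (1.0 - gamma) * (q @ advice) + gamma / K  # mixed action distribution
        p = p / p.sum()                        # guard against floating-point drift
        a = rng.choice(K, p=p)                 # play an action
        r = rewards(t, a)                      # observe only its reward (bandit feedback)
        total_reward += r
        r_hat = np.zeros(K)
        r_hat[a] = r / p[a]                    # importance-weighted reward estimate
        log_w += (gamma / K) * (advice @ r_hat)  # credit experts by their advice
    return total_reward
```

One natural way a meta-algorithm could handle countably many experts is to invoke such a routine on successively larger prefixes of the expert sequence; whether or not that matches the paper's exact construction, the $\tilde{\mathcal{O}}(i^*K + \sqrt{KT})$ bound reflects the cost of not knowing the best expert's position $i^*$ in advance.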
Related papers
- Improved Regret Bounds for Bandits with Expert Advice [16.699381591572163]
We prove a lower bound of order $\sqrt{KT \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N > K$ the number of experts, and $T$ the time horizon.
This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{KT (\ln N) / (\ln K)}$.
arXiv Detail & Related papers (2024-06-24T17:14:31Z) - Near-optimal Per-Action Regret Bounds for Sleeping Bandits [8.261117235807607]
We derive near-optimal per-action regret bounds for sleeping bandits.
In a setting with $K$ total arms and at most $A$ available arms in each round over $T$ rounds, the best known upper bound is $O(K\sqrt{TA \ln K})$.
We extend our results to the setting of bandits with advice from sleeping experts, generalizing EXP4 along the way.
arXiv Detail & Related papers (2024-03-02T21:22:46Z) - Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
We show that our algorithm achieves an $\tilde{\mathcal{O}}\big((d + \log(|\mathcal{S}|^2 |\mathcal{A}|))\sqrt{K}\big)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, and $|\mathcal{S}|$ and $|\mathcal{A}|$ are the cardinalities of the state and action spaces.
arXiv Detail & Related papers (2023-05-15T05:37:32Z) - Streaming Algorithms for Learning with Experts: Deterministic Versus
Robust [62.98860182111096]
In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times).
The goal is to make a prediction with the minimum cost, specifically compared to the best expert in the set.
We show a space lower bound of $\widetilde{\Omega}\left(\frac{nM}{RT}\right)$ for any deterministic algorithm that achieves regret $R$ when the best expert makes $M$ mistakes.
arXiv Detail & Related papers (2023-03-03T04:39:53Z) - Constant regret for sequence prediction with limited advice [0.0]
We provide a strategy that combines only $p = 2$ experts per round for prediction while observing the losses of $m \ge 2$ experts.
If the learner is constrained to observe only a single expert's feedback per round, the worst-case regret is the "slow rate" $\Omega(\sqrt{KT})$.
arXiv Detail & Related papers (2022-10-05T13:32:49Z) - Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes [36.104981594178525]
We study a variant of the contextual bandit problem where an agent can intervene through a set of expert policies.
We propose the Divergence-based Upper Confidence Bound (D-UCB) algorithm that uses importance sampling to share information across experts.
We also provide the Empirical D-UCB (ED-UCB) algorithm that can function with only approximate knowledge of expert distributions.
arXiv Detail & Related papers (2021-07-07T14:58:14Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Improved Sleeping Bandits with Stochastic Action Sets and Adversarial
Rewards [59.559028338399855]
We consider the problem of sleeping bandits with stochastic action sets and adversarial rewards.
In this paper, we provide a new computationally efficient algorithm inspired by EXP3 that achieves a regret of order $O(\sqrt{T})$.
arXiv Detail & Related papers (2020-04-14T00:41:26Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{1/3}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)