A Closer Look at Small-loss Bounds for Bandits with Graph Feedback
- URL: http://arxiv.org/abs/2002.00315v2
- Date: Tue, 23 Jun 2020 02:06:49 GMT
- Title: A Closer Look at Small-loss Bounds for Bandits with Graph Feedback
- Authors: Chung-Wei Lee, Haipeng Luo, Mengxiao Zhang
- Abstract summary: We study small-loss bounds for adversarial multi-armed bandits with graph feedback.
We derive the first small-loss bound for general strongly observable graphs.
We also make the first attempt at deriving small-loss bounds for weakly observable graphs.
- Score: 39.78074016649885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study small-loss bounds for adversarial multi-armed bandits with graph
feedback, that is, adaptive regret bounds that depend on the loss of the best
arm or related quantities, instead of the total number of rounds. We derive the
first small-loss bound for general strongly observable graphs, resolving an
open problem of Lykouris et al. (2018). Specifically, we develop an algorithm
with regret $\mathcal{\tilde{O}}(\sqrt{\kappa L_*})$ where $\kappa$ is the
clique partition number and $L_*$ is the loss of the best arm, and for the
special case of self-aware graphs where every arm has a self-loop, we improve
the regret to $\mathcal{\tilde{O}}(\min\{\sqrt{\alpha T}, \sqrt{\kappa L_*}\})$
where $\alpha \leq \kappa$ is the independence number. Our results
significantly improve and extend those by Lykouris et al. (2018) who only
consider self-aware undirected graphs.
Furthermore, we also make the first attempt at deriving small-loss bounds for
weakly observable graphs. We first prove that no typical small-loss bounds are
achievable in this case, and then propose algorithms with alternative
small-loss bounds in terms of the loss of some specific subset of arms. A
surprising side result is that $\mathcal{\tilde{O}}(\sqrt{T})$ regret is
achievable even for weakly observable graphs as long as the best arm has a
self-loop.
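For readers unfamiliar with the observability dichotomy used above, the sketch below classifies a feedback graph according to the standard definitions of Alon et al. (2015): an arm is strongly observable if it has a self-loop or is observed by every other arm, and a graph is weakly observable if every arm is observed by at least one arm but not all arms are strongly observable. The adjacency-matrix convention and the function name are illustrative assumptions, not notation from this paper.

```python
import numpy as np

def classify_feedback_graph(G: np.ndarray) -> str:
    """Classify a directed feedback graph per Alon et al. (2015).

    G[i, j] == 1 means playing arm i reveals the loss of arm j;
    a self-loop G[j, j] == 1 means arm j observes its own loss.
    """
    K = G.shape[0]
    # Observable: every arm's loss is revealed by at least one arm.
    if np.any(G.sum(axis=0) == 0):
        return "not observable"
    # Strongly observable: every arm has a self-loop or is observed
    # by all other arms.
    strongly = all(
        G[j, j] == 1 or all(G[i, j] == 1 for i in range(K) if i != j)
        for j in range(K)
    )
    return "strongly observable" if strongly else "weakly observable"

# A "loopless clique" (observe everyone except yourself) is strongly
# observable; a directed path with a single self-loop is only weakly so.
loopless_clique = np.ones((3, 3), dtype=int) - np.eye(3, dtype=int)
print(classify_feedback_graph(loopless_clique))  # strongly observable
```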
Our algorithms are based on the Online Mirror Descent framework but require a
suite of novel techniques that might be of independent interest. Moreover, all
our algorithms can be made parameter-free, without knowledge of the
environment.
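To make the Online Mirror Descent framework concrete, here is a minimal sketch of the generic OMD template for bandits with graph feedback, using the negative-entropy regularizer (multiplicative weights) and the standard importance-weighted loss estimator. This is a vanilla baseline under stated assumptions, not the paper's algorithm: the small-loss rates above require the novel techniques the abstract mentions, and the function name, learning rate, and adjacency convention are illustrative.

```python
import numpy as np

def omd_graph_bandit(G, losses, eta, seed=0):
    """Vanilla OMD with graph feedback (negative-entropy mirror map).

    G: (K, K) adjacency matrix; G[i, j] == 1 means playing arm i
       reveals the loss of arm j.
    losses: (T, K) array of adversarial losses in [0, 1].
    eta: learning rate.
    """
    T, K = losses.shape
    rng = np.random.default_rng(seed)
    p = np.full(K, 1.0 / K)          # OMD iterate on the simplex
    total = 0.0
    for t in range(T):
        arm = rng.choice(K, p=p)
        total += losses[t, arm]
        observed = G[arm] == 1       # arms whose losses are revealed
        # P[j]: probability that arm j's loss is observed under p.
        P = G.T @ p
        ell_hat = np.zeros(K)
        ell_hat[observed] = losses[t, observed] / P[observed]
        # Negative-entropy OMD step = multiplicative-weights update.
        p = p * np.exp(-eta * ell_hat)
        p /= p.sum()
    return total
```

The estimator $\hat{\ell}_j = \ell_j \mathbb{1}\{j \text{ observed}\}/P_j$ is unbiased whenever $P_j > 0$, which holds under the strictly positive exponential-weights iterates; obtaining small-loss bounds such as $\mathcal{\tilde{O}}(\sqrt{\kappa L_*})$ takes considerably more refined machinery, as the paper shows.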
Related papers
- On Interpolating Experts and Multi-Armed Bandits [1.9497955504539408]
We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI.
As consequences, we obtain tight minimax regret bounds for several families of feedback graphs.
arXiv Detail & Related papers (2023-07-14T10:38:30Z)
- Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we define an algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z)
- Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs [62.52390282012508]
We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds.
We develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}\big((\sum_{t=1}^{T}\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t\big)$ with high probability.
We also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs.
arXiv Detail & Related papers (2022-10-04T04:36:15Z)
- A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs [21.563733343861713]
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set.
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments.
arXiv Detail & Related papers (2022-06-01T15:14:32Z)
- Understanding Bandits with Graph Feedback [0.0]
We prove a general regret upper bound $O\left(\left(\delta^* \log |V|\right)^{\frac{1}{3}} T^{\frac{2}{3}}\right)$ and a lower bound $\Omega\left(\left(\delta^*/\alpha\right)^{\frac{1}{3}} T^{\frac{2}{3}}\right)$.
We show that for several special families of graphs, we can get rid of the $\left(\log |V|\right)^{\frac{1}{3}}$ factor.
arXiv Detail & Related papers (2021-05-29T09:35:28Z)
- Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2\sqrt{n T \log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z)
- Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log (|\mathcal{F}|T)})$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z)
- Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)