Be Greedy in Multi-Armed Bandits
- URL: http://arxiv.org/abs/2101.01086v1
- Date: Mon, 4 Jan 2021 16:47:02 GMT
- Title: Be Greedy in Multi-Armed Bandits
- Authors: Matthieu Jedor, Jonathan Louëdec, Vianney Perchet
- Abstract summary: The Greedy algorithm is the simplest heuristic in sequential decision problems; it takes the locally optimal choice at each round.
We provide a generic worst-case bound on the regret of the Greedy algorithm.
We prove that, when combined with arm subsampling, it satisfies near-optimal worst-case regret bounds in continuous, infinite and many-armed bandit problems.
- Score: 22.301793734117805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Greedy algorithm is the simplest heuristic in sequential decision problems: it carelessly takes the locally optimal choice at each round, disregarding any advantage of exploring and/or gathering information. Theoretically, it is known to sometimes have poor performance, for instance even linear regret (with respect to the time horizon) in the standard multi-armed bandit problem. On the other hand, this heuristic performs reasonably well in practice and it even admits sublinear, sometimes near-optimal, regret bounds in some very specific linear contextual and Bayesian bandit models. We build on a recent line of work and investigate bandit settings where the number of arms is relatively large and where simple greedy algorithms enjoy highly competitive performance, both in theory and in practice. We first provide a generic worst-case bound on the regret of the Greedy algorithm. When combined with some arm subsampling, we prove that it satisfies near-optimal worst-case regret bounds in continuous, infinite and many-armed bandit problems. Moreover, for shorter time spans, the theoretical relative suboptimality of Greedy is further reduced. As a consequence, we subversively claim that for many interesting problems and associated horizons, the best compromise between theoretical guarantees, practical performance and computational burden is definitely to follow the greedy heuristic. We support our claim with many numerical experiments that show significant improvements over the state of the art, even for moderately long time horizons.
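The greedy-with-subsampling recipe described in the abstract is simple enough to sketch. Below is a minimal, illustrative Python simulation (not the authors' implementation): it draws a uniform Bernoulli many-armed instance, subsamples roughly sqrt(T) of the arms, plays each subsampled arm once, and thereafter always pulls the arm with the highest empirical mean. The subsample size and the bandit instance are assumptions made only for the sake of the example.

```python
# Minimal sketch (assumption: not the paper's code) of the Greedy heuristic
# with arm subsampling on a many-armed Bernoulli bandit instance.
import math
import numpy as np

rng = np.random.default_rng(0)

K, T = 500, 10_000                        # number of arms, time horizon
means = rng.uniform(0.0, 1.0, size=K)     # unknown Bernoulli means (illustrative instance)

# Subsample ~sqrt(T) arms (an illustrative choice), then act greedily on them only.
m = min(K, math.ceil(math.sqrt(T)))
arms = rng.choice(K, size=m, replace=False)

counts = np.zeros(m)
sums = np.zeros(m)

pseudo_regret = 0.0
for t in range(T):
    if t < m:
        i = t                              # initialization: play each subsampled arm once
    else:
        i = int(np.argmax(sums / counts))  # greedy: highest empirical mean, no exploration bonus
    reward = float(rng.random() < means[arms[i]])
    counts[i] += 1
    sums[i] += reward
    pseudo_regret += means.max() - means[arms[i]]

print(f"cumulative pseudo-regret over T={T}: {pseudo_regret:.1f}")
```

Pseudo-regret is measured here against the best arm of the full instance, so the subsample size trades off the risk of missing the very best arm against the cost of estimating too many means.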
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparisons are deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Dual Instrumental Method for Confounded Kernelized Bandits [0.0]
The contextual bandit problem is a framework with wide applications in various fields.
We propose a confounded bandit problem where the noise becomes a latent confounder that affects both contexts and rewards.
We show that a dual instrumental variable regression can correctly identify the true reward function.
arXiv Detail & Related papers (2022-09-07T15:25:57Z) - Versatile Dueling Bandits: Best-of-both-World Analyses for Online
Learning from Preferences [28.79598714109439]
We study the problem of $K$-armed dueling bandits for both stochastic and adversarial environments.
We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits.
Our algorithm is also the first to achieve an optimal $O\big(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\big)$ regret bound against the Condorcet-winner benchmark.
arXiv Detail & Related papers (2022-02-14T13:37:23Z) - On Slowly-varying Non-stationary Bandits [25.305949034527202]
We consider dynamic regret in non-stationary bandits with a slowly varying property.
We establish the first instance-dependent regret upper bound for slowly varying non-stationary bandits.
We show that our algorithm is essentially minimax optimal.
arXiv Detail & Related papers (2021-10-25T12:56:19Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - On Regret with Multiple Best Arms [12.315392649501101]
We study a regret problem with the existence of multiple best/near-optimal arms in a bandit setting.
Our goal is to design algorithms that can automatically adapt to the unknown hardness of the problem.
arXiv Detail & Related papers (2020-06-26T04:01:46Z) - A Novel Confidence-Based Algorithm for Structured Bandits [129.30402124516507]
We study finite-armed bandits where the rewards of each arm might be correlated to those of other arms.
We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
arXiv Detail & Related papers (2020-05-23T19:52:44Z) - Greedy Algorithm almost Dominates in Smoothed Contextual Bandits [100.09904315064372]
Online learning algorithms must balance exploration and exploitation.
We show that a greedy approach almost matches the best possible Bayesian regret rate of any other algorithm.
arXiv Detail & Related papers (2020-05-19T18:11:40Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms [10.662105162882526]
We investigate a $k$-armed bandit problem in the many-armed regime.
Our findings suggest a new form of free exploration beneficial to greedy algorithms in the many-armed context.
arXiv Detail & Related papers (2020-02-24T08:59:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.