A Blackbox Approach to Best of Both Worlds in Bandits and Beyond
- URL: http://arxiv.org/abs/2302.09739v1
- Date: Mon, 20 Feb 2023 03:42:31 GMT
- Title: A Blackbox Approach to Best of Both Worlds in Bandits and Beyond
- Authors: Christoph Dann, Chen-Yu Wei, Julian Zimmert
- Abstract summary: Best-of-both-worlds algorithms for online learning achieve near-optimal regret in both the adversarial and the stochastic regimes.
We present a general reduction from best of both worlds to a wide family of follow-the-regularized-leader (FTRL) and online-mirror-descent (OMD) algorithms.
- Score: 33.13041034490332
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Best-of-both-worlds algorithms for online learning which achieve near-optimal
regret in both the adversarial and the stochastic regimes have received growing
attention recently. Existing techniques often require careful adaptation to
every new problem setup, including specialised potentials and careful tuning of
algorithm parameters. Yet, in domains such as linear bandits, it is still
unknown if there exists an algorithm that can simultaneously obtain
$O(\log(T))$ regret in the stochastic regime and $\tilde{O}(\sqrt{T})$ regret
in the adversarial regime. In this work, we resolve this question positively
and present a general reduction from best of both worlds to a wide family of
follow-the-regularized-leader (FTRL) and online-mirror-descent (OMD)
algorithms. We showcase the capability of this reduction by transforming
existing algorithms that are only known to achieve worst-case guarantees into
new algorithms with best-of-both-worlds guarantees in contextual bandits, graph
bandits and tabular Markov decision processes.
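To make the FTRL/OMD family concrete, here is a minimal sketch of one such base algorithm for adversarial multi-armed bandits: EXP3 viewed as FTRL with a negative-entropy regularizer and importance-weighted loss estimates. This is illustrative code, not the paper's construction, and the learning-rate choice is an assumption.
```python
import numpy as np

def exp3_ftrl(loss_fn, n_arms, T, rng=None):
    """EXP3 as FTRL: negative-entropy regularizer + importance-weighted losses.

    loss_fn(t, arm) -> loss in [0, 1], possibly chosen adversarially.
    Returns the sequence of arms played.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    eta = np.sqrt(np.log(n_arms) / (n_arms * T))  # standard tuning (assumed)
    cum_loss_est = np.zeros(n_arms)               # cumulative loss estimates
    played = []
    for t in range(T):
        # FTRL step: argmin_p <p, L_hat> + neg-entropy(p)/eta
        # has the closed form p proportional to exp(-eta * L_hat).
        logits = -eta * cum_loss_est
        logits -= logits.max()                    # numerical stability
        p = np.exp(logits)
        p /= p.sum()
        arm = int(rng.choice(n_arms, p=p))
        loss = loss_fn(t, arm)
        cum_loss_est[arm] += loss / p[arm]        # unbiased loss estimate
        played.append(arm)
    return played
```
On its own, a base learner like this only carries a worst-case $\tilde{O}(\sqrt{T})$ guarantee; the paper's reduction is what turns such FTRL/OMD learners into algorithms with best-of-both-worlds guarantees.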
Related papers
- Best-of-Both-Worlds Policy Optimization for CMDPs with Bandit Feedback [34.7178680288326]
Stradi et al. (2024) proposed the first best-of-both-worlds algorithm for constrained Markov decision processes.
In this paper, we provide the first best-of-both-worlds algorithm for CMDPs with bandit feedback.
Our algorithm is based on a policy optimization approach, which is much more efficient than occupancy-measure-based methods.
arXiv Detail & Related papers (2024-10-03T07:44:40Z) - Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits [6.7310264583128445]
Follow-The-Regularized-Leader (FTRL) algorithms often enjoy optimal regret for adversarial as well as stochastic bandit problems.
We propose a new FTPL algorithm that generates optimal policies for both adversarial and stochastic multi-armed bandits.
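As a rough illustration of the Follow-The-Perturbed-Leader (FTPL) template for bandits (not the specific algorithm of that paper), the sketch below plays the perturbed leader and uses geometric resampling to build importance-weighted loss estimates; the exponential perturbation distribution, step size, and resampling cap are assumptions.
```python
import numpy as np

def ftpl_bandit(loss_fn, n_arms, T, eta=0.1, max_resample=1000, seed=0):
    """FTPL for adversarial multi-armed bandits (illustrative sketch).

    Plays argmin_i (eta * L_hat_i - z_i) with fresh random perturbations z,
    and estimates 1 / P(arm chosen) by geometric resampling (capped).
    """
    rng = np.random.default_rng(seed)
    cum_loss_est = np.zeros(n_arms)
    played = []
    for t in range(T):
        z = rng.exponential(size=n_arms)          # perturbation (assumed exponential)
        arm = int(np.argmin(eta * cum_loss_est - z))
        loss = loss_fn(t, arm)
        # Geometric resampling: number of redraws until the same arm leads
        # again is (up to the cap) an unbiased estimate of 1 / p_t(arm).
        k = 1
        while k < max_resample:
            z2 = rng.exponential(size=n_arms)
            if int(np.argmin(eta * cum_loss_est - z2)) == arm:
                break
            k += 1
        cum_loss_est[arm] += loss * k
        played.append(arm)
    return played
```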
arXiv Detail & Related papers (2024-09-30T16:00:23Z) - Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
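For reference, the sketch below shows the classical multi-armed IMED index for Bernoulli rewards; LinIMED adapts this idea to the linear setting, which the sketch does not cover. The function names are illustrative.
```python
import numpy as np

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def imed_choose(counts, means):
    """IMED: play the arm minimizing N_a * KL(mu_a, mu_max) + log(N_a)."""
    counts = np.asarray(counts, dtype=float)
    means = np.asarray(means, dtype=float)
    if np.any(counts == 0):                 # play each arm once first
        return int(np.argmin(counts))
    mu_max = means.max()
    index = counts * kl_bernoulli(means, mu_max) + np.log(counts)
    return int(np.argmin(index))
```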
arXiv Detail & Related papers (2024-05-24T04:11:58Z) - Stability-penalty-adaptive follow-the-regularized-leader: Sparsity,
game-dependency, and best-of-both-worlds [46.30750729936261]
Follow-the-regularized-leader (FTRL) has recently emerged as one of the most promising approaches for obtaining various types of adaptivity in bandit problems.
We establish several algorithms with three types of adaptivity: sparsity, game-dependency, and best-of-both-worlds (BOBW).
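The adaptivity in this line of work rests on the generic stability-penalty decomposition of FTRL regret. A schematic form with time-varying learning rates $\eta_t$ (shown only as orientation; constants and exact terms depend on the regularizer and are not taken from the paper) is
$$\mathrm{Reg}_T \;\lesssim\; \sum_{t=1}^{T} \eta_t\, z_t \;+\; \sum_{t=1}^{T} \Big(\tfrac{1}{\eta_{t+1}} - \tfrac{1}{\eta_t}\Big)\, h_t,$$
where $z_t$ is a stability term and $h_t$ a penalty term; a stability-penalty-adaptive learning rate is chosen to balance the two sums.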
arXiv Detail & Related papers (2023-05-26T23:20:48Z) - Best-of-three-worlds Analysis for Linear Bandits with
Follow-the-regularized-leader Algorithm [11.80907528239756]
The linear bandit problem has been studied for many years in both stochastic and adversarial settings.
Lee et al. (2021) propose an algorithm that actively detects the loss type and then switches between different algorithms specially designed for specific settings.
Follow-the-regularized-leader (FTRL) is another type of popular algorithm that can adapt to different environments.
arXiv Detail & Related papers (2023-03-13T02:50:59Z) - A Framework for Adapting Offline Algorithms to Solve Combinatorial
Multi-Armed Bandit Problems with Bandit Feedback [27.192028744078282]
We provide a framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods.
The proposed framework is applied to diverse applications in submodular maximization.
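For context, $\alpha$-regret compares the learner to an $\alpha$-fraction of the best fixed action, the benchmark an offline $\alpha$-approximation algorithm can guarantee; in standard notation (not taken from the paper),
$$\mathcal{R}_\alpha(T) \;=\; \alpha \,\max_{S}\sum_{t=1}^{T} \mathbb{E}\big[f_t(S)\big] \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} f_t(S_t)\Big],$$
and sublinear $\alpha$-regret means this quantity grows as $o(T)$.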
arXiv Detail & Related papers (2023-01-30T23:18:06Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize.
Our algorithm is based on a new concentration technique, and we bound its sample complexity.
It is very practical: it improves upon the performance of all prior methods in the setting where no Byzantine machines are present.
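As a generic illustration of Byzantine-resilient gradient aggregation (coordinate-wise median, a simpler stand-in for the paper's concentration-based technique), a distributed SGD step might look like the sketch below; the function names, learning rate, and data are illustrative assumptions.
```python
import numpy as np

def robust_sgd_step(params, worker_grads, lr=0.01):
    """One SGD step using a coordinate-wise-median aggregate of worker gradients.

    worker_grads: array of shape (n_workers, dim); a minority of rows may be
    arbitrary (Byzantine). The median limits their per-coordinate influence.
    """
    grads = np.asarray(worker_grads, dtype=float)
    robust_grad = np.median(grads, axis=0)   # resilient to outlier rows
    return params - lr * robust_grad

# Example: 7 honest workers plus 2 adversarial ones sending huge gradients.
rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.1, size=(7, 5))
byzantine = np.full((2, 5), 1e6)
params = robust_sgd_step(np.zeros(5), np.vstack([honest, byzantine]))
```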
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
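A minimal sketch of the meta-UCB idea: treat each base bandit algorithm as an arm of a higher-level UCB instance and pass the observed reward both to the meta level and to the chosen base algorithm. The act()/update() interface and the UCB1 constants are assumptions for illustration, not the paper's exact procedure.
```python
import math

class MetaUCB:
    """UCB1 over N base bandit algorithms, each exposing act() / update(r)."""

    def __init__(self, base_algorithms):
        self.base = base_algorithms
        self.counts = [0] * len(base_algorithms)
        self.sums = [0.0] * len(base_algorithms)
        self.t = 0

    def act(self):
        self.t += 1
        # Pull each meta-arm (base algorithm) once, then use UCB1 scores.
        for i, c in enumerate(self.counts):
            if c == 0:
                self.chosen = i
                break
        else:
            ucb = [self.sums[i] / self.counts[i]
                   + math.sqrt(2 * math.log(self.t) / self.counts[i])
                   for i in range(len(self.base))]
            self.chosen = max(range(len(self.base)), key=lambda i: ucb[i])
        return self.base[self.chosen].act()      # base algorithm picks the real arm

    def update(self, reward):
        self.counts[self.chosen] += 1
        self.sums[self.chosen] += reward
        self.base[self.chosen].update(reward)    # only the chosen base learns
```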
arXiv Detail & Related papers (2020-12-24T05:36:29Z) - Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward.
This regret further depends on the gap between the highest reward and the other rewards.
arXiv Detail & Related papers (2020-06-16T15:33:12Z) - Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z)