Follow-the-Perturbed-Leader with Fr\'{e}chet-type Tail Distributions:
Optimality in Adversarial Bandits and Best-of-Both-Worlds
- URL: http://arxiv.org/abs/2403.05134v1
- Date: Fri, 8 Mar 2024 08:07:26 GMT
- Title: Follow-the-Perturbed-Leader with Fr\'{e}chet-type Tail Distributions:
Optimality in Adversarial Bandits and Best-of-Both-Worlds
- Authors: Jongyeong Lee and Junya Honda and Shinji Ito and Min-hwan Oh
- Abstract summary: We study the optimality of the Follow-the-Perturbed-Leader (FTPL) policy in both adversarial and stochastic $K$-armed bandits.
We establish a sufficient condition for perturbations to achieve $\mathcal{O}(\sqrt{KT})$ regrets in the adversarial setting.
- Score: 43.35179630620512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the optimality of the Follow-the-Perturbed-Leader (FTPL)
policy in both adversarial and stochastic $K$-armed bandits. Despite the
widespread use of the Follow-the-Regularized-Leader (FTRL) framework with
various choices of regularization, the FTPL framework, which relies on random
perturbations, has received much less attention despite its inherent
simplicity. In adversarial bandits, it has been conjectured that FTPL could
achieve $\mathcal{O}(\sqrt{KT})$ regrets if perturbations follow a
distribution with a Fr\'{e}chet-type tail. Recent work by Honda et al. (2023)
showed that FTPL with the Fr\'{e}chet distribution of shape $\alpha=2$ indeed
attains this bound and, notably, logarithmic regret in stochastic bandits,
establishing the Best-of-Both-Worlds (BOBW) capability of FTPL. However, this result
only partly resolves the above conjecture because their analysis heavily relies
on the specific form of the Fr\'{e}chet distribution with this shape. In this
paper, we establish a sufficient condition for perturbations to achieve
$\mathcal{O}(\sqrt{KT})$ regrets in the adversarial setting, which covers,
e.g., Fr\'{e}chet, Pareto, and Student-$t$ distributions. We also demonstrate
the BOBW achievability of FTPL with certain Fr\'{e}chet-type tail
distributions. Our results not only contribute to resolving existing
conjectures through the lens of extreme value theory but also potentially offer
insights into the effect of the regularization functions in FTRL through the
mapping from FTPL to FTRL.
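A minimal illustration of the scheme discussed above: the sketch below runs FTPL on a $K$-armed adversarial bandit with Fr\'{e}chet-distributed perturbations (shape $\alpha=2$, the case analysed by Honda et al., 2023), where a Fr\'{e}chet-type tail informally means a polynomially decaying tail, as for the Fr\'{e}chet, Pareto, and Student-$t$ distributions. Losses are estimated with plain geometric resampling; the function name, the `loss_fn` interface, the perturbation-scale schedule $\sqrt{t}$, and the resampling cap are illustrative assumptions rather than the paper's exact formulation or tuned constants.

```python
import numpy as np

def ftpl_frechet_bandit(loss_fn, K, T, alpha=2.0, seed=0):
    """Hypothetical FTPL sketch for a K-armed adversarial bandit with
    Frechet(alpha) perturbations and geometric-resampling loss estimates."""
    rng = np.random.default_rng(seed)
    L_hat = np.zeros(K)                 # cumulative loss estimates
    total_loss = 0.0

    def frechet(size):
        # Inverse-CDF sampling: F(x) = exp(-x**(-alpha)) for x > 0.
        u = rng.uniform(size=size)
        return (-np.log(u)) ** (-1.0 / alpha)

    for t in range(1, T + 1):
        beta = np.sqrt(t)               # illustrative perturbation scale

        def choose():
            # Follow the perturbed leader: pick the arm whose estimated
            # cumulative loss, minus a fresh scaled perturbation, is smallest.
            return int(np.argmin(L_hat - beta * frechet(K)))

        arm = choose()
        loss = float(loss_fn(t, arm))   # bandit feedback for the played arm only
        total_loss += loss

        # Geometric resampling: redraw perturbations until `arm` would be
        # chosen again; the draw count m estimates 1 / P(arm is chosen).
        m, cap = 1, 100 * K
        while choose() != arm and m < cap:
            m += 1
        L_hat[arm] += m * loss          # importance-weighted loss update
    return total_loss
```

For instance, `loss_fn = lambda t, arm: float(arm == t % K)` gives a simple rotating loss sequence that exercises the sketch.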
Related papers
- Note on Follow-the-Perturbed-Leader in Combinatorial Semi-Bandit Problems [10.435741631709403]
We study the optimality and complexity of the Follow-the-Perturbed-Leader (FTPL) policy in size-invariant semi-bandit problems. We extend conditional geometric resampling (CGR) to the size-invariant semi-bandit setting, which reduces the computational complexity from $O(d^2)$ of the original GR to $O\left(md\left(\log(d/m)+1\right)\right)$ without sacrificing the regret performance of FTPL.
arXiv Detail & Related papers (2025-06-14T13:06:30Z) - Follow-the-Perturbed-Leader Approaches Best-of-Both-Worlds for the m-Set Semi-Bandit Problems [20.47953854427799]
We consider a common case of the semi-bandit problem, where the learner exactly selects $m$ arms from the total $d$ arms.
We show that FTPL with a Fr\'{e}chet perturbation also enjoys the near-optimal regret bound $\mathcal{O}(\sqrt{nmd\log(d)})$ in the adversarial setting.
arXiv Detail & Related papers (2025-04-09T22:07:01Z) - Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits [49.96531901205305]
We analyze $f$-divergence-regularized offline policy learning. For reverse Kullback-Leibler (KL) divergence, we give the first $\tilde{O}(\epsilon^{-1})$ sample complexity under single-policy concentrability. We extend our analysis to dueling bandits, and we believe these results take a significant step toward a comprehensive understanding of $f$-divergence-regularized policy learning.
arXiv Detail & Related papers (2025-02-09T22:14:45Z) - The Power of Perturbation under Sampling in Solving Extensive-Form Games [56.013335390600524]
This paper investigates how perturbation does and does not improve the Follow-the-Regularized-Leader (FTRL) algorithm in imperfect-information extensive-form games.
Perturbing the expected payoffs guarantees that the FTRL dynamics reach an approximate equilibrium.
We show that, in the last-iterate sense, the sampled FTRL consistently outperforms the non-sampled FTRL.
arXiv Detail & Related papers (2025-01-28T00:29:38Z) - Distributed Online Optimization with Stochastic Agent Availability [14.801853435122904]
We investigate a variant of distributed online optimization where agents are active with a known probability $p$ at each time step.
We analyze its network regret, defined through the average of the instantaneous regret of the active agents.
arXiv Detail & Related papers (2024-11-25T15:20:01Z) - Gradual Domain Adaptation via Manifold-Constrained Distributionally Robust Optimization [0.4732176352681218]
This paper addresses the challenge of gradual domain adaptation within a class of manifold-constrained data distributions.
We propose a methodology rooted in Distributionally Robust Optimization (DRO) with an adaptive Wasserstein radius.
Our bounds rely on a newly introduced compatibility measure, which fully characterizes the error propagation dynamics along the sequence.
arXiv Detail & Related papers (2024-10-17T22:07:25Z) - Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits [6.7310264583128445]
Follow-The-Regularized-Leader (FTRL) algorithms often enjoy optimal regret for adversarial as well as stochastic bandit problems.
We propose a new FTPL algorithm that generates optimal policies for both adversarial and stochastic multi-armed bandits.
arXiv Detail & Related papers (2024-09-30T16:00:23Z) - Implicitly normalized forecaster with clipping for linear and non-linear
heavy-tailed multi-armed bandits [85.27420062094086]
Implicitly Normalized Forecaster (INF) is considered an optimal solution for adversarial multi-armed bandit (MAB) problems.
We propose a new version of INF called the Implicitly Normalized Forecaster with clipping (INFclip) for MAB problems with heavy-tailed settings.
We demonstrate that INFclip is optimal for linear heavy-tailed MAB problems and works well for non-linear ones.
arXiv Detail & Related papers (2023-05-11T12:00:43Z) - Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a
Generative Model [50.38446482252857]
Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $\varepsilon$-approximate Markov NE policy using $\widetilde{O}(\cdot)$ samples.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
arXiv Detail & Related papers (2022-08-22T17:24:55Z) - A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound of $\Omega(\sqrt{n/p})$ on the robustness of interpolating neural network parameters over arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z) - Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $\alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)