Revisiting Follow-the-Perturbed-Leader with Unbounded Perturbations in Bandit Problems
- URL: http://arxiv.org/abs/2508.18604v1
- Date: Tue, 26 Aug 2025 02:12:18 GMT
- Title: Revisiting Follow-the-Perturbed-Leader with Unbounded Perturbations in Bandit Problems
- Authors: Jongyeong Lee, Junya Honda, Shinji Ito, Min-hwan Oh,
- Abstract summary: Follow-the-Regularized-Leader (FTRL) policies have achieved Best-of-Both-Worlds (BOBW) results in various settings through hybrid regularizers.<n>We revisit classical FTRL-FTPL duality for unbounded perturbations and establish BOBW results for FTPL under a broad family of asymmetric Fr'echet-type perturbations.
- Score: 60.58442311545223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Follow-the-Regularized-Leader (FTRL) policies have achieved Best-of-Both-Worlds (BOBW) results in various settings through hybrid regularizers, whereas analogous results for Follow-the-Perturbed-Leader (FTPL) remain limited due to inherent analytical challenges. To advance the analytical foundations of FTPL, we revisit classical FTRL-FTPL duality for unbounded perturbations and establish BOBW results for FTPL under a broad family of asymmetric unbounded Fr\'echet-type perturbations, including hybrid perturbations combining Gumbel-type and Fr\'echet-type tails. These results not only extend the BOBW results of FTPL but also offer new insights into designing alternative FTPL policies competitive with hybrid regularization approaches. Motivated by earlier observations in two-armed bandits, we further investigate the connection between the $1/2$-Tsallis entropy and a Fr\'echet-type perturbation. Our numerical observations suggest that it corresponds to a symmetric Fr\'echet-type perturbation, and based on this, we establish the first BOBW guarantee for symmetric unbounded perturbations in the two-armed setting. In contrast, in general multi-armed bandits, we find an instance in which symmetric Fr\'echet-type perturbations violate the key condition for standard BOBW analysis, which is a problem not observed with asymmetric or nonnegative Fr\'echet-type perturbations. Although this example does not rule out alternative analyses achieving BOBW results, it suggests the limitations of directly applying the relationship observed in two-armed cases to the general case and thus emphasizes the need for further investigation to fully understand the behavior of FTPL in broader settings.
Related papers
- Regularized Online RLHF with Generalized Bilinear Preferences [68.44113000390544]
We consider the problem of contextual online RLHF with general preferences.<n>We adopt the Generalized Bilinear Preference Model to capture preferences via low-rank, skew-symmetric matrices.<n>We prove that the dual gap of the greedy policy is bounded by the square of the estimation error.
arXiv Detail & Related papers (2026-02-26T15:27:53Z) - Sharp Convergence Rates for Masked Diffusion Models [53.117058231393834]
We develop a total-variation based analysis for the Euler method that overcomes limitations.<n>Our results relax assumptions on score estimation, improve parameter dependencies, and establish convergence guarantees.<n>Overall, our analysis introduces a direct TV-based error decomposition along the CTMC trajectory and a decoupling-based path-wise analysis for FHS.
arXiv Detail & Related papers (2026-02-26T00:47:51Z) - Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks, where information exchange may be asymmetric.<n>We develop a unified uniform-stability framework for the Gradient Push (SGP) algorithm.<n>A key technical ingredient is an imbalance-aware generalization bound through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - Statistical-Geometric Degeneracy in UAV Search: A Physics-Aware Asymmetric Filtering Approach [23.49656058107753]
Post-disaster survivor localization using Unmanned Aerial Vehicles (UAVs) faces a fundamental physical challenge.<n>Unlike standard Gaussian noise, signal reflection from debris introduces strictly non-negative ranging biases.<n>Existing robust estimators, typically designed with symmetric loss functions, implicitly rely on the assumption of error symmetry.<n>We propose a physically-grounded solution, the AsymmetricHuberEKF, which explicitly incorporates the non-negative physical prior of NLOS biases.
arXiv Detail & Related papers (2026-02-11T08:33:56Z) - Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation [19.404286148401795]
Group Relative Advantage Estimation (GRAE) has an implicit advantage symmetry inherent in it.<n>We propose Asymmetric GRAE, which dynamically modulates exploration incentives and sample-difficulty focus.<n> Experiments across seven benchmarks demonstrate that A-GRAE consistently improves GRPO and its variants across both LLMs and MLLMs.
arXiv Detail & Related papers (2026-02-05T11:07:14Z) - Joint Asymmetric Loss for Learning with Noisy Labels [95.14298444251044]
symmetric losses usually suffer from the underfitting issue due to the overly strict constraint.<n>Within APL, symmetric losses have been successfully extended, yielding advanced robust loss functions.<n>We introduce a novel robust loss framework termed Joint Asymmetric Loss (JAL)
arXiv Detail & Related papers (2025-07-23T16:57:43Z) - On the Power of Perturbation under Sampling in Solving Extensive-Form Games [56.013335390600524]
We investigate how perturbation does and does not improve the Follow-the-Regularized-Leader (FTRL) algorithm in solving extensive-form games under sampling.<n>We present a unified framework for textitPerturbed FTRL algorithms and study two variants: PFTRL-KL and PFTRL-RKL.
arXiv Detail & Related papers (2025-01-28T00:29:38Z) - Symmetry verification for noisy quantum simulations of non-Abelian lattice gauge theories [0.0]
We discuss two techniques for error mitigation through symmetry verification, tailored for non-Abelian lattice gauge theories implemented in noisy qudit hardware.<n>Our results open new avenues for robust quantum simulation of non-Abelian gauge theories, for further development of error-mitigation techniques, and for measurement-based control methods in qudit platforms.
arXiv Detail & Related papers (2024-12-10T19:00:02Z) - Follow-the-Perturbed-Leader with Fr\'{e}chet-type Tail Distributions:
Optimality in Adversarial Bandits and Best-of-Both-Worlds [43.35179630620512]
We study the optimality of the Follow-the-Perturbed-Leader ( FTPL) policy in both adversarial and $K$-armed bandits.
We establish a sufficient condition for perturbations to achieve $mathcalO(sqrtKT)$ regrets in the adversarial setting.
arXiv Detail & Related papers (2024-03-08T08:07:26Z) - On Batch Normalisation for Approximate Bayesian Inference [102.94525205971873]
We show that batch-normalisation does not affect the optimum of the evidence lower bound (ELBO)
We also study the Monte Carlo Batch Normalisation (MCBN) algorithm, proposed as an approximate inference technique parallel to MC Dropout.
arXiv Detail & Related papers (2020-12-24T12:40:11Z) - Lower bounds in multiple testing: A framework based on derandomized
proxies [107.69746750639584]
This paper introduces an analysis strategy based on derandomization, illustrated by applications to various concrete models.
We provide numerical simulations of some of these lower bounds, and show a close relation to the actual performance of the Benjamini-Hochberg (BH) algorithm.
arXiv Detail & Related papers (2020-05-07T19:59:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.