Stability-penalty-adaptive follow-the-regularized-leader: Sparsity,
game-dependency, and best-of-both-worlds
- URL: http://arxiv.org/abs/2305.17301v2
- Date: Tue, 13 Feb 2024 09:23:48 GMT
- Title: Stability-penalty-adaptive follow-the-regularized-leader: Sparsity,
game-dependency, and best-of-both-worlds
- Authors: Taira Tsuchiya, Shinji Ito, Junya Honda
- Abstract summary: Follow-the-regularized-leader (FTRL) has recently emerged as one of the most promising approaches for obtaining various types of adaptivity in bandit problems.
We establish several algorithms with three types of adaptivity: sparsity, game-dependency, and best-of-both-worlds (BOBW)
- Score: 46.30750729936261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptivity to the difficulties of a problem is a key property in sequential
decision-making problems to broaden the applicability of algorithms.
Follow-the-regularized-leader (FTRL) has recently emerged as one of the most
promising approaches for obtaining various types of adaptivity in bandit
problems. Aiming to further generalize this adaptivity, we develop a generic
adaptive learning rate, called the stability-penalty-adaptive (SPA) learning
rate, for FTRL. This learning rate yields a regret bound that jointly depends
on the stability and penalty of the algorithm, into which the regret of FTRL is
typically decomposed. With this result, we establish several algorithms with
three types of adaptivity: sparsity, game-dependency, and best-of-both-worlds
(BOBW). Although sparsity appears frequently in real problems, existing sparse
multi-armed bandit algorithms with $k$ arms assume that the
sparsity level $s \leq k$ is known in advance, which is often not the case in
real-world scenarios. To address this issue, we first establish $s$-agnostic
algorithms with regret bounds of $\tilde{O}(\sqrt{sT})$ in the adversarial
regime for $T$ rounds, which matches the existing lower bound up to a
logarithmic factor. Meanwhile, BOBW algorithms aim to achieve a near-optimal
regret in both the stochastic and adversarial regimes. Leveraging the SPA
learning rate and the technique for $s$-agnostic algorithms combined with a new
analysis to bound the variation in FTRL output in response to changes in a
regularizer, we establish the first BOBW algorithm with a sparsity-dependent
bound. Additionally, we explore partial monitoring and demonstrate that the
proposed SPA learning rate framework allows us to achieve a game-dependent
bound and the BOBW simultaneously.
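To make the stability-penalty language concrete, the regret of FTRL with a time-varying learning rate $\eta_t$ is commonly bounded by a penalty term plus a stability term; schematically (with generic symbols $h_t$ for the penalty and $z_t$ for the stability, not the paper's exact statement),
$$ R_T \;\lesssim\; \sum_{t=1}^{T} \Big( \frac{1}{\eta_{t+1}} - \frac{1}{\eta_t} \Big) h_{t+1} \;+\; \sum_{t=1}^{T} \eta_t z_t , $$
and a stability-penalty-adaptive learning rate tunes $\eta_t$ online so that the right-hand side scales roughly as $\sqrt{\sum_t z_t h_{t+1}}$ up to logarithmic factors.
As a minimal illustration (not the paper's SPA-FTRL), the sketch below runs an Exp3-style FTRL with the Shannon-entropy regularizer on a $k$-armed adversarial bandit, shrinking the learning rate as accumulated stability proxies grow; the function name, the stability proxy, and the schedule are assumptions made for this example.
```python
import numpy as np

def ftrl_entropy_bandit(loss_matrix, seed=None):
    """Exp3-style FTRL sketch with a stability-driven adaptive learning rate.

    loss_matrix: (T, k) array of losses in [0, 1] chosen by the environment.
    Returns the arms played and the learner's cumulative loss.
    """
    rng = np.random.default_rng(seed)
    T, k = loss_matrix.shape
    cum_est_loss = np.zeros(k)   # cumulative importance-weighted loss estimates
    cum_stability = 0.0          # running sum of per-round stability proxies
    arms, total_loss = [], 0.0

    for t in range(T):
        # Illustrative adaptive rate: shrink as accumulated stability grows.
        # (The paper's SPA rate also incorporates the penalty; this one does not.)
        eta = np.sqrt(np.log(k) / (1.0 + cum_stability))

        # FTRL with the negative-entropy regularizer has the closed form
        # p_t proportional to exp(-eta * cumulative estimated loss).
        logits = -eta * cum_est_loss
        logits -= logits.max()   # for numerical stability
        p = np.exp(logits)
        p /= p.sum()

        arm = int(rng.choice(k, p=p))
        loss = float(loss_matrix[t, arm])
        arms.append(arm)
        total_loss += loss

        # Importance-weighted loss estimate for the played arm only.
        cum_est_loss[arm] += loss / p[arm]
        # Stability proxy for the entropy regularizer: sum_a p[a] * est[a]^2,
        # which equals loss^2 / p[arm] for importance-weighted estimates.
        cum_stability += loss ** 2 / p[arm]

    return np.array(arms), total_loss

# Example usage on random adversarial losses:
# arms, cum_loss = ftrl_entropy_bandit(np.random.default_rng(0).random((1000, 5)), seed=1)
```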
Related papers
- uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs [33.262918224598614]
We present a novel algorithm for the Heavy-Tailed Multi-Armed Bandits (HTMAB) problem, demonstrating robustness and adaptability.
Our algorithm uniINF enjoys the so-called Best-of-Both-Worlds (BoBW) property, performing optimally in both stochastic and adversarial environments.
To our knowledge, uniINF is the first parameter-free algorithm to achieve the BoBW property for the heavy-tailed MAB problem.
arXiv Detail & Related papers (2024-10-04T09:55:44Z)
- Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits [6.7310264583128445]
Follow-The-Regularized-Leader (FTRL) algorithms often enjoy optimal regret for adversarial as well as stochastic bandit problems.
We propose a new Follow-The-Perturbed-Leader (FTPL) algorithm that generates optimal policies for both adversarial and stochastic multi-armed bandits.
arXiv Detail & Related papers (2024-09-30T16:00:23Z)
- Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
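For context on the index that LinIMED builds on, below is a minimal sketch of the vanilla IMED decision rule specialized to Bernoulli rewards; the function names and the Bernoulli specialization are illustrative assumptions, and the LinIMED algorithms in the paper adapt this kind of index to the linear-bandit setting.
```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def imed_choose(counts, means):
    """IMED rule: play the arm minimizing N_a * KL(mu_a, mu_max) + log(N_a).

    counts: number of pulls of each arm (all > 0, e.g. after one initial
    pull per arm); means: empirical mean reward of each arm.
    """
    mu_max = means.max()
    index = counts * bernoulli_kl(means, mu_max) + np.log(counts)
    return int(np.argmin(index))
```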
arXiv Detail & Related papers (2024-05-24T04:11:58Z)
- A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits [0.0]
In areas of high volatility like healthcare or finance, a naive reward approach often does not accurately capture the complexity of the learning problem.
We propose a framework of adaptive risk-aware strategies that operate in non-stationary environments.
arXiv Detail & Related papers (2023-10-24T19:29:13Z)
- Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms [41.06668954462585]
We study the problem of designing adaptive multi-armed bandit algorithms that perform optimally in both the stochastic setting and the adversarial setting simultaneously.
We show that uniqueness of the optimal arm is unnecessary for FTRL with a broad family of regularizers and a new learning rate schedule.
arXiv Detail & Related papers (2023-02-27T06:09:10Z)
- Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K} \sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
- A Blackbox Approach to Best of Both Worlds in Bandits and Beyond [33.13041034490332]
Best-of-both-worlds algorithms for online learning achieve near-optimal regret in both the stochastic and the adversarial regimes.
We present a general reduction from best of both worlds to a wide family of follow-the-regularized-leader (FTRL) and online mirror descent (OMD) algorithms.
arXiv Detail & Related papers (2023-02-20T03:42:31Z)
- Offline Policy Optimization in RL with Variance Regularization [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms.
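The double-sampling issue mentioned above comes from the squared expectation inside the variance; a schematic version of the Fenchel-dual rewriting (generic, not the paper's exact objective over stationary distribution corrections) is
$$ (\mathbb{E}[X])^2 = \max_{\nu \in \mathbb{R}} \big( 2\nu\,\mathbb{E}[X] - \nu^2 \big), \qquad \operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \min_{\nu \in \mathbb{R}} \mathbb{E}\big[(X - \nu)^2\big], $$
so the gradient of a variance penalty can be estimated from single samples by optimizing jointly over an auxiliary scalar $\nu$, instead of multiplying two independent sample averages.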
arXiv Detail & Related papers (2022-12-29T18:25:01Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based zeroth-order algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
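To illustrate the two SUNRISE ingredients summarized above, the sketch below shows ensemble-UCB action selection and an uncertainty-based weight for Bellman targets; the exact weighting function and parameter names are assumptions for illustration, not necessarily the paper's formulas.
```python
import numpy as np

def ucb_action(q_ensemble, lam=1.0):
    """Pick the action with the highest ensemble upper-confidence bound.

    q_ensemble: (n_members, n_actions) Q-value estimates for the current state.
    """
    mean_q = q_ensemble.mean(axis=0)
    std_q = q_ensemble.std(axis=0)
    return int(np.argmax(mean_q + lam * std_q))

def bellman_weight(target_q_ensemble, temperature=10.0):
    """Down-weight Bellman targets with high ensemble disagreement.

    target_q_ensemble: (n_members, batch) target Q-values. Returns weights in
    (0.5, 1], computed as sigmoid(-temperature * std) + 0.5 (one simple
    monotone choice).
    """
    std_q = target_q_ensemble.std(axis=0)
    s = np.exp(-temperature * std_q)          # numerically safe sigmoid(-x)
    return s / (1.0 + s) + 0.5
```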