Generalized Implicit Follow-The-Regularized-Leader
- URL: http://arxiv.org/abs/2306.00201v1
- Date: Wed, 31 May 2023 21:39:52 GMT
- Title: Generalized Implicit Follow-The-Regularized-Leader
- Authors: Keyi Chen and Francesco Orabona
- Abstract summary: Generalized implicit FTRL can recover known algorithms, such as FTRL with linearized losses and implicit FTRL.
We show the flexibility of the framework by proving that some known algorithms, like the Mirror-Prox updates, are instantiations of the generalized implicit FTRL.
- Score: 15.974402990630402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new class of online learning algorithms, generalized implicit
Follow-The-Regularized-Leader (FTRL), that expands the scope of the FTRL framework.
Generalized implicit FTRL can recover known algorithms, such as FTRL with linearized
losses and implicit FTRL, and it allows the design of new update rules, such as
extensions of aProx and Mirror-Prox to FTRL. Our theory is constructive in the
sense that it provides a simple unifying framework to design updates that
directly improve the worst-case upper bound on the regret. The key idea is
substituting the linearization of the losses with a Fenchel-Young inequality.
We show the flexibility of the framework by proving that some known algorithms,
like the Mirror-Prox updates, are instantiations of the generalized implicit
FTRL. Finally, the new framework allows us to recover the temporal variation
bound of implicit OMD, with the same computational complexity.
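To make the distinction in the abstract concrete, here is a minimal sketch, assuming an unconstrained domain and a quadratic regularizer psi(x) = ||x||^2 / (2*eta). It illustrates the two special cases named above (FTRL with linearized losses and implicit FTRL) on toy quadratic losses; it is not the paper's generalized update, and all function names are illustrative.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Finite-difference gradient of f at x (good enough for a toy example)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def linearized_ftrl_iterate(grad_sum, eta):
    """FTRL with linearized losses and psi(x) = ||x||^2 / (2*eta):
    x_{t+1} = argmin_x <grad_sum, x> + ||x||^2 / (2*eta) = -eta * grad_sum."""
    return -eta * grad_sum

def implicit_ftrl_iterate(grad_sum, loss, eta, inner_steps=200, lr=0.05):
    """Implicit FTRL keeps the current loss exact instead of linearizing it:
        x_{t+1} = argmin_x loss(x) + <grad_sum, x> + ||x||^2 / (2*eta),
    approximated here by gradient descent, warm-started at the linearized solution."""
    x = -eta * grad_sum
    for _ in range(inner_steps):
        x = x - lr * (numerical_grad(loss, x) + grad_sum + x / eta)
    return x

# Toy online loop: quadratic losses with shifting centers.
rng = np.random.default_rng(0)
eta, dim = 0.5, 3
grad_sum = np.zeros(dim)
for center in rng.normal(size=(5, dim)):
    loss = lambda z, c=center: 0.5 * np.sum((z - c) ** 2)
    x_lin = linearized_ftrl_iterate(grad_sum, eta)      # linearized update
    x_imp = implicit_ftrl_iterate(grad_sum, loss, eta)  # implicit update
    print(np.round(x_lin, 3), np.round(x_imp, 3))
    grad_sum += numerical_grad(loss, x_lin)             # subgradient at the linearized iterate, for simplicity
```

The linearized iterate only needs the accumulated gradients, while the implicit iterate solves a small per-round subproblem that keeps the current loss exact; organizing this kind of extra per-round work under one regret analysis is what the generalized framework is about.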
Related papers
- The Power of Perturbation under Sampling in Solving Extensive-Form Games [56.013335390600524]
This paper investigates how perturbation does and does not improve the Follow-the-Regularized-Leader (FTRL) algorithm in imperfect-information extensive-form games.
Perturbing the expected payoffs guarantees that the FTRL dynamics reach an approximate equilibrium.
We show that in the last-iterate sense, the FTRL consistently outperforms the non-sampled FTRL.
arXiv Detail & Related papers (2025-01-28T00:29:38Z) - Computing Optimal Regularizers for Online Linear Optimization [38.72709491927979]
Follow-the-Regularized-Leader (FTRL) algorithms are a popular class of learning algorithms for online linear optimization (OLO).
We show that there exists an instantiation of FTRL which achieves regret within a constant factor of that of the best possible learning algorithm.
arXiv Detail & Related papers (2024-10-22T18:10:50Z) - Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits [6.075593833879357]
Follow-The-Regularized-Leader (FTRL) algorithms often enjoy optimal regret for adversarial as well as stochastic bandit problems.
We propose a new Follow-The-Perturbed-Leader (FTPL) algorithm that generates optimal policies for both adversarial and stochastic multi-armed bandits.
arXiv Detail & Related papers (2024-09-30T16:00:23Z) - REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with performance similar to or stronger than PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z) - How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard for sequential decision-making problems, iteratively improving the acting policy using feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z) - More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning [58.626683114119906]
We show that Distributional Reinforcement Learning (DistRL) can obtain second-order bounds in both online and offline RL.
Our results are the first second-order bounds for low-rank MDPs and for offline RL.
arXiv Detail & Related papers (2024-02-11T13:25:53Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - Offline Policy Optimization in RL with Variance Regularization [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid the double-sampling issue when computing the gradient of the variance regularizer (see the worked identity after this list).
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms.
arXiv Detail & Related papers (2022-12-29T18:25:01Z) - A Minimalist Approach to Offline Reinforcement Learning [10.904148149681932]
Offline reinforcement learning defines the task of learning from a fixed batch of data.
In this paper we aim to make a deep RL algorithm work while making minimal changes.
We find that we can match the performance of state-of-the-art offline RL algorithms by simply adding a behavior cloning term to the policy update of an online RL algorithm (a minimal sketch of this idea follows the list).
arXiv Detail & Related papers (2021-06-12T20:38:59Z)
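The variance-regularization entry above (OVAR) appeals to Fenchel duality to avoid double sampling. The identity below is a generic sketch of the standard trick for squared expectations, not necessarily the exact saddle-point formulation used in that paper.

```latex
% Generic sketch (requires amsmath); not necessarily the exact formulation in OVAR.
The variance penalty contains a squared expectation,
\[
  \operatorname{Var}[X_\theta]
    = \mathbb{E}[X_\theta^2] - \bigl(\mathbb{E}[X_\theta]\bigr)^2,
  \qquad
  \nabla_\theta \bigl(\mathbb{E}[X_\theta]\bigr)^2
    = 2\,\mathbb{E}[X_\theta]\,\nabla_\theta\mathbb{E}[X_\theta],
\]
and the product of two expectations needs two independent samples of
$X_\theta$ for an unbiased estimate (the double-sampling issue). Writing the
square through its Fenchel conjugate, $x^2 = \max_{\nu}\,(2\nu x - \nu^2)$,
\[
  \bigl(\mathbb{E}[X_\theta]\bigr)^2
    = \max_{\nu}\,\bigl(2\nu\,\mathbb{E}[X_\theta] - \nu^2\bigr),
\]
makes the objective linear in $X_\theta$ for a fixed dual variable $\nu$, so a
single sample gives an unbiased stochastic gradient in the resulting
saddle-point problem.
```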
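The final entry in the list above reports matching state-of-the-art offline RL by adding a behavior cloning term to the policy update of an online RL algorithm. Below is a minimal PyTorch-style sketch of that idea, assuming a deterministic actor and a learned critic; the `bc_weight` scaling and all names are illustrative choices, not the paper's exact recipe.

```python
import torch

def policy_loss(actor, critic, states, dataset_actions, bc_weight=2.5):
    """Actor loss = (scaled) -Q(s, pi(s)) plus a behavior-cloning MSE term
    that keeps the policy close to the actions in the offline dataset."""
    pi_actions = actor(states)
    q_values = critic(states, pi_actions)
    # Normalizing by the Q scale keeps the two terms comparable; this is one
    # common choice, not the only one.
    lam = bc_weight / q_values.abs().mean().detach()
    rl_term = -lam * q_values.mean()
    bc_term = torch.nn.functional.mse_loss(pi_actions, dataset_actions)
    return rl_term + bc_term

# Toy usage with a linear actor and a hand-rolled stand-in critic on random data.
actor = torch.nn.Linear(4, 2)
critic = lambda s, a: (torch.cat([s, a], dim=-1) ** 2).sum(dim=-1)  # stand-in Q
states = torch.randn(32, 4)
dataset_actions = torch.randn(32, 2)
loss = policy_loss(actor, critic, states, dataset_actions)
loss.backward()  # gradients flow into the actor's parameters
```

The only change relative to a standard actor loss is the added MSE term toward the dataset actions, which is what makes the approach "minimalist".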