On the Dynamic Regret of Following the Regularized Leader: Optimism with History Pruning
- URL: http://arxiv.org/abs/2505.22899v1
- Date: Wed, 28 May 2025 22:03:15 GMT
- Title: On the Dynamic Regret of Following the Regularized Leader: Optimism with History Pruning
- Authors: Naram Mhaisen, George Iosifidis
- Abstract summary: Follow the Regularized Leader (FTRL) is a framework for Online Convex Optimization (OCO). Prior work has highlighted the framework's limitations in dynamic environments due to its tendency to produce "lazy" iterates. We show that FTRL can indeed recover known dynamic regret bounds through optimistic composition of future costs and careful linearization of past costs.
- Score: 10.25772015681554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We revisit the Follow the Regularized Leader (FTRL) framework for Online Convex Optimization (OCO) over compact sets, focusing on achieving dynamic regret guarantees. Prior work has highlighted the framework's limitations in dynamic environments due to its tendency to produce "lazy" iterates. However, building on insights showing FTRL's ability to produce "agile" iterates, we show that it can indeed recover known dynamic regret bounds through optimistic composition of future costs and careful linearization of past costs, which can lead to pruning some of them. This new analysis of FTRL against dynamic comparators yields a principled way to interpolate between greedy and agile updates and offers several benefits, including refined control over regret terms, optimism without cyclic dependence, and the application of minimal recursive regularization akin to AdaFTRL. More broadly, we show that it is not the lazy projection style of FTRL that hinders (optimistic) dynamic regret, but the decoupling of the algorithm's state (linearized history) from its iterates, allowing the state to grow arbitrarily. Instead, pruning synchronizes these two when necessary.
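The abstract's mechanism can be stated in standard OCO notation. The following is a hedged reconstruction: the retained index set S_t and the regularizer schedule r_{1:t} are notation introduced here for illustration, not taken from the paper.

```latex
% Dynamic regret of iterates x_1,...,x_T against a comparator sequence u_1,...,u_T in a compact set X:
\[
  \mathrm{D\text{-}Reg}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \sum_{t=1}^{T} f_t(u_t).
\]
% Optimistic FTRL over linearized past costs, with g_s = \nabla f_s(x_s), a hint \tilde{g}_{t+1}
% for the next gradient, accumulated regularization r_{1:t}, and a retained index set S_t
% (S_t = \{1,\dots,t\} recovers standard optimistic FTRL; pruning keeps S_t \subsetneq \{1,\dots,t\}):
\[
  x_{t+1} \;\in\; \arg\min_{x \in \mathcal{X}}
  \Big\langle \sum_{s \in S_t} g_s + \tilde{g}_{t+1},\, x \Big\rangle \;+\; r_{1:t}(x).
\]
```

Below is a minimal runnable sketch of this update, assuming a Euclidean ball domain, a fixed quadratic regularizer, and a simple norm cap as a stand-in for the paper's pruning criterion; the class name, parameters, and the pruning rule are illustrative placeholders, not the paper's algorithm.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

class OptimisticPrunedFTRL:
    """Illustrative optimistic FTRL with linearized history and norm-cap pruning (not the paper's method)."""

    def __init__(self, dim, eta=0.1, radius=1.0, prune_radius=None):
        self.eta = eta
        self.radius = radius
        # The accumulated history is allowed to grow up to this norm before it is shrunk.
        self.prune_radius = prune_radius if prune_radius is not None else radius / eta
        self.history = np.zeros(dim)   # sum of past gradients (linearized costs)
        self.x = np.zeros(dim)         # current iterate

    def predict(self, hint):
        # Optimistic composition: play against the linearized history plus a guess of the next gradient.
        # With r(x) = ||x||^2 / (2*eta), the FTRL minimizer over the origin-centered ball is this projection.
        self.x = project_ball(-self.eta * (self.history + hint), self.radius)
        return self.x

    def update(self, grad):
        # Linearize the observed cost at the played point and fold it into the state.
        self.history += grad
        # Pruning (illustrative rule): cap the norm of the accumulated history so the
        # algorithm's state cannot drift arbitrarily far from its projected iterates.
        norm = np.linalg.norm(self.history)
        if norm > self.prune_radius:
            self.history *= self.prune_radius / norm

# Toy usage: quadratic costs around a slowly drifting target, with zero hints.
rng = np.random.default_rng(0)
learner = OptimisticPrunedFTRL(dim=2, eta=0.5)
target = np.array([0.8, -0.2])
for t in range(100):
    x = learner.predict(hint=np.zeros(2))
    grad = x - target                      # gradient of f_t(x) = 0.5 * ||x - target||^2
    learner.update(grad)
    target = target + 0.01 * rng.normal(size=2)   # drifting comparator
```

Setting hint to the most recent gradient gives the common optimism heuristic, while prune_radius = np.inf disables pruning and recovers plain linearized optimistic FTRL, whose accumulated state can grow without bound, as the abstract notes.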
Related papers
- The Power of Perturbation under Sampling in Solving Extensive-Form Games [56.013335390600524]
This paper investigates how perturbation does and does not improve the Follow-the-Regularized-Leader (FTRL) algorithm in imperfect-information extensive-form games. Perturbing the expected payoffs guarantees that the FTRL dynamics reach an approximate equilibrium. We show that, in the last-iterate sense, FTRL under sampling consistently outperforms non-sampled FTRL.
arXiv Detail & Related papers (2025-01-28T00:29:38Z) - Computing Optimal Regularizers for Online Linear Optimization [38.72709491927979]
Follow-the-Regularized-Leader (FTRL) algorithms are a popular class of learning algorithms for online linear optimization (OLO).
We show that there exists an instantiation of FTRL which achieves regret within a constant factor of the best possible learning algorithm.
arXiv Detail & Related papers (2024-10-22T18:10:50Z) - Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL).
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM).
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
arXiv Detail & Related papers (2024-05-27T12:12:39Z) - REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL. We find that REBEL provides a unified approach to language modeling and image generation, with performance similar to or stronger than PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z) - Generalized Implicit Follow-The-Regularized-Leader [15.974402990630402]
Generalized implicit FTRL can recover known algorithms, such as FTRL with linearized losses and implicit FTRL.
We show the flexibility of the framework by proving that some known algorithms, like the Mirror-Prox updates, are instantiations of the generalized implicit FTRL.
arXiv Detail & Related papers (2023-05-31T21:39:52Z) - Smoothed Online Convex Optimization Based on Discounted-Normal-Predictor [68.17855675511602]
We investigate an online prediction strategy named Discounted-Normal-Predictor (Kapralov and Panigrahy, 2010) for smoothed online convex optimization (SOCO).
We show that the proposed algorithm can minimize the adaptive regret with switching cost in every interval.
arXiv Detail & Related papers (2022-05-02T08:48:22Z) - Dynamic Regret for Strongly Adaptive Methods and Optimality of Online KRR [13.165557713537389]
We show that Strongly Adaptive (SA) algorithms can be viewed as a principled way of controlling dynamic regret.
We derive a new lower bound on a certain penalized regret which establishes the near minimax optimality of online Kernel Ridge Regression (KRR).
arXiv Detail & Related papers (2021-11-22T21:52:47Z) - Delay-Tolerant Constrained OCO with Application to Network Resource Allocation [44.67787270821051]
We consider online convex optimization (OCO) with multi-slot feedback delay.
An agent makes a sequence of online decisions to minimize the accumulation of time-varying convex loss functions.
We propose Delay-Tolerant Constrained-OCO, which uses a novel constraint penalty with double regularization to tackle the asynchrony between information feedback and decision updates.
arXiv Detail & Related papers (2021-05-09T19:32:33Z) - Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z) - Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly.
We demonstrate that iterative amortized policy optimization yields performance improvements over direct amortization on benchmark continuous control tasks.
arXiv Detail & Related papers (2020-10-20T23:25:42Z) - Dynamic Regret of Policy Optimization in Non-stationary Environments [120.01408308460095]
We propose two model-free policy optimization algorithms, POWER and POWER++, and establish guarantees for their dynamic regret.
We show that POWER++ improves over POWER on the second component of the dynamic regret by actively adapting to non-stationarity through prediction.
To the best of our knowledge, our work is the first dynamic regret analysis of model-free RL algorithms in non-stationary environments.
arXiv Detail & Related papers (2020-06-30T23:34:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.