Dynamic Regret via Discounted-to-Dynamic Reduction with Applications to Curved Losses and Adam Optimizer
- URL: http://arxiv.org/abs/2602.08372v1
- Date: Mon, 09 Feb 2026 08:10:53 GMT
- Title: Dynamic Regret via Discounted-to-Dynamic Reduction with Applications to Curved Losses and Adam Optimizer
- Authors: Yan-Feng Xie, Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou
- Abstract summary: We build on the discounted-to-dynamic reduction to obtain dynamic regret minimization methods. We focus on two representative curved losses: linear regression and logistic regression. Our method yields new dynamic regret guarantees for online logistic regression.
- Score: 72.0797062226335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study dynamic regret minimization in non-stationary online learning, with a primary focus on follow-the-regularized-leader (FTRL) methods. FTRL is important for curved losses and for understanding adaptive optimizers such as Adam, yet dynamic regret analyses of FTRL remain relatively unexplored. To address this, we build on the discounted-to-dynamic reduction and present a modular way to obtain dynamic regret bounds for FTRL-related problems. Specifically, we focus on two representative curved losses: linear regression and logistic regression. Our method not only simplifies existing proofs for the optimal dynamic regret of online linear regression, but also yields new dynamic regret guarantees for online logistic regression. Beyond online convex optimization, we apply the reduction to analyze the Adam optimizer, obtaining optimal convergence rates in stochastic, non-convex, and non-smooth settings. The reduction also enables a more detailed treatment of Adam with two discount parameters $(\beta_1, \beta_2)$, leading to new results for both clipped and clip-free variants of the Adam optimizer.
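Since the abstract treats Adam's $(\beta_1, \beta_2)$ as discount parameters, a minimal NumPy sketch of the textbook (clip-free) Adam update may help fix notation. This is the standard update, not the paper's modified variants; all names are illustrative.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook (clip-free) Adam update. m and v are exponentially
    discounted gradient statistics, which is the structure the
    discounted-to-dynamic reduction exploits."""
    m = beta1 * m + (1 - beta1) * grad        # discounted first moment
    v = beta2 * v + (1 - beta2) * grad**2     # discounted second moment
    m_hat = m / (1 - beta1**t)                # bias correction, t >= 1
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```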
Related papers
- Gradient Descent as a Perceptron Algorithm: Understanding Dynamics and Implicit Acceleration [67.12978375116599]
We show that the steps of gradient descent (GD) reduce to those of generalized perceptron algorithms. This helps explain the optimization dynamics and the implicit acceleration phenomenon observed in neural networks.
arXiv Detail & Related papers (2025-12-12T14:16:35Z)
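As a rough illustration of the GD-to-perceptron connection claimed above (my reading, not necessarily the paper's exact construction): a gradient step on the logistic loss is a perceptron-style update $y\,x$, soft-gated by a sigmoid mistake weight. A minimal sketch, assuming labels $y \in \{-1, +1\}$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gd_logistic_step(w, x, y, lr=1.0):
    # d/dw log(1 + exp(-y * w.x)) = -y * x * sigmoid(-y * w.x), so the GD
    # step is the perceptron update y * x, weighted by how "wrong" w is.
    margin = y * np.dot(w, x)
    return w + lr * y * x * sigmoid(-margin)

def perceptron_step(w, x, y):
    # Classic perceptron: a unit update, but only on an actual mistake.
    return w + y * x if y * np.dot(w, x) <= 0 else w
```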
- Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning [29.513882808306406]
We introduce the Adaptive Replay Buffer (ARB), a novel approach that prioritizes data sampling based on a lightweight metric we call 'on-policyness'. ARB is designed to be learning-free and simple to implement, seamlessly integrating into existing Offline-to-Online Reinforcement Learning algorithms. Our experiments on D4RL benchmarks demonstrate that ARB consistently mitigates early performance degradation and significantly improves the final performance of various O2O RL algorithms.
arXiv Detail & Related papers (2025-12-11T10:30:04Z)
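The blurb above describes sampling prioritized by an 'on-policyness' score. A minimal sketch of such a score-weighted buffer; the scoring function itself is a hypothetical stand-in, since the paper's actual metric is not reproduced here:

```python
import numpy as np

class ScoreWeightedBuffer:
    """Replay buffer that samples transitions in proportion to a
    user-supplied positive score (a stand-in for 'on-policyness')."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.data, self.scores = [], []

    def add(self, transition, score):
        if len(self.data) >= self.capacity:   # drop the oldest entry
            self.data.pop(0)
            self.scores.pop(0)
        self.data.append(transition)
        self.scores.append(score)

    def sample(self, batch_size):
        p = np.asarray(self.scores, dtype=np.float64)
        p /= p.sum()                          # normalize to a distribution
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx]
```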
- Online AUC Optimization Based on Second-order Surrogate Loss [3.9269332672496424]
The Area Under the Curve (AUC) is an important performance metric for classification tasks. We develop a novel second-order surrogate loss based on the pairwise hinge loss, together with an efficient online storage scheme. Our approach introduces a new paradigm that directly substitutes the entire aggregated pairwise loss with a loss function constructed from the first- and second-order statistics.
arXiv Detail & Related papers (2025-10-24T07:08:22Z)
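For reference, the classic first-order pairwise hinge surrogate for AUC that the paper builds on penalizes small positive-negative score gaps. A minimal sketch, assuming labels in {0, 1}:

```python
import numpy as np

def pairwise_hinge_auc_loss(scores, labels):
    """Aggregated pairwise hinge surrogate for AUC: the mean over all
    (positive, negative) pairs of max(0, 1 - (s_pos - s_neg))."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    margins = pos[:, None] - neg[None, :]   # all positive-negative score gaps
    return np.maximum(0.0, 1.0 - margins).mean()
```

Evaluating this exactly requires all past pairs, which is why an online method needs compact statistics rather than the raw buffer.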
- A Simplified Analysis of SGD for Linear Regression with Weight Averaging [64.2393952273612]
Recent work by Zou et al. (2021) provides sharp rates for SGD optimization in linear regression using a constant learning rate. We provide a simplified analysis recovering the same bias and variance bounds as Zou et al. (2021), based on simple linear algebra tools. We believe our work makes the analysis of gradient descent on linear regression very accessible and will be helpful in further analyzing mini-batching and learning rate scheduling.
arXiv Detail & Related papers (2025-06-18T15:10:38Z)
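A minimal sketch of the setting analyzed above: constant-step-size SGD on least squares with iterate averaging (the averaged iterate is what the bias/variance bounds concern). Step size and epoch count are illustrative:

```python
import numpy as np

def sgd_linreg_with_averaging(X, y, lr=0.05, n_epochs=5, seed=0):
    """Constant-step-size SGD on 0.5 * (x.w - y)^2, returning both the
    last iterate and the averaged iterate studied in the cited analyses."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_sum, count = np.zeros(d), 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            w -= lr * (X[i] @ w - y[i]) * X[i]   # stochastic gradient step
            w_sum += w
            count += 1
    return w, w_sum / count
```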
- On the Dynamic Regret of Following the Regularized Leader: Optimism with History Pruning [8.708728702853966]
Follow the Regularized Leader (FTRL) is a framework for Online Convex Optimization (OCO). Prior work has highlighted the framework's limitations in dynamic environments due to its tendency to produce "lazy" iterates. We show that FTRL can indeed recover known dynamic regret bounds through optimistic composition of future costs.
arXiv Detail & Related papers (2025-05-28T22:03:15Z)
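For concreteness, the "lazy" behavior mentioned above is visible in the simplest FTRL instance: with linearized losses and a quadratic regularizer, the iterate is just a scaled negative gradient sum, and optimism adds a guess of the next cost. A minimal sketch (not the paper's history-pruning method):

```python
import numpy as np

class QuadraticFTRL:
    """FTRL on linearized losses with regularizer ||x||^2 / (2 * eta).
    The iterate x_{t+1} = -eta * (sum of past gradients) moves only as
    fast as the gradient sum does -- the 'lazy' behavior noted above."""

    def __init__(self, dim, eta=0.1):
        self.eta = eta
        self.grad_sum = np.zeros(dim)

    def predict(self, hint=None):
        # Optimistic FTRL: also minimize over a guess of the next gradient.
        g = self.grad_sum if hint is None else self.grad_sum + hint
        return -self.eta * g

    def update(self, grad):
        self.grad_sum += grad
```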
- Accelerating RL for LLM Reasoning with Optimal Advantage Regression [52.0792918455501]
We propose a novel two-stage policy optimization framework, $A^*$-PO, that directly approximates the optimal advantage function. $A^*$-PO achieves competitive performance across a wide range of mathematical reasoning benchmarks. It reduces training time by up to 2$\times$ and peak memory usage by over 30% compared to PPO, GRPO, and REBEL.
arXiv Detail & Related papers (2025-05-27T03:58:50Z)
- Flow-GRPO: Training Flow Matching Models via Online RL [80.62659379624867]
We propose Flow-GRPO, the first method to integrate online policy reinforcement learning into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps; and (2) a Denoising Reduction strategy that reduces training denoising steps while retaining the original number of inference steps.
arXiv Detail & Related papers (2025-05-08T17:58:45Z)
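One standard instance of an ODE-to-SDE correspondence with matching marginals comes from the diffusion literature (Song et al., 2021); the paper's construction for flow matching may differ. If a forward SDE $\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w$ has marginals $p_t$, then:

```latex
% Deterministic and stochastic samplers with identical marginals p_t:
\begin{align*}
  \text{probability-flow ODE:}\quad
    & \mathrm{d}x = \Big[f(x,t) - \tfrac{1}{2}\,g(t)^2\,\nabla_x \log p_t(x)\Big]\,\mathrm{d}t,\\
  \text{reverse-time SDE:}\quad
    & \mathrm{d}x = \Big[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\Big]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{w}.
\end{align*}
```

Both processes share the marginal distribution $p_t$ at every $t$, which is exactly the matching-marginals property the blurb refers to.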
- Dynamic Regret for Strongly Adaptive Methods and Optimality of Online KRR [13.165557713537389]
We show that Strongly Adaptive (SA) algorithms can be viewed as a principled way of controlling dynamic regret.
We derive a new lower bound on a certain penalized regret which establishes the near minimax optimality of online Kernel Ridge Regression (KRR).
arXiv Detail & Related papers (2021-11-22T21:52:47Z)
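For reference, a minimal (and computationally naive) sketch of online kernel ridge regression as studied above: at each round, predict with the KRR fit to all past data. The RBF kernel and hyperparameters are illustrative:

```python
import numpy as np

def online_krr(X, y, lam=1.0, gamma=1.0):
    """At round t, predict y_t with the KRR fit to the first t-1 points
    (RBF kernel). Naively refits in O(t^3) per round; practical variants
    update the solution incrementally."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    preds = [0.0]                              # no data before round 1
    for t in range(1, len(X)):
        K = k(X[:t], X[:t])
        alpha = np.linalg.solve(K + lam * np.eye(t), y[:t])
        preds.append((k(X[t:t + 1], X[:t]) @ alpha).item())
    return np.array(preds)
```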
- LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z)
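The linearization such methods start from is a first-order Taylor expansion of the network around its pre-trained weights, which turns fine-tuning with a squared loss into a linear-quadratic problem. A minimal finite-difference sketch, assuming a flat weight vector; the paper's specific architecture, loss, and optimizer modifications are omitted:

```python
import numpy as np

def linearize(f, w0, eps=1e-6):
    """First-order Taylor expansion of f(x, w) around pre-trained weights
    w0: f_lin(x, w) = f(x, w0) + J_w(x) (w - w0), with the Jacobian terms
    formed by finite differences. Fitting f_lin under a squared loss is a
    linear least-squares problem in (w - w0)."""
    def f_lin(x, w):
        base = f(x, w0)
        out = np.array(base, dtype=float)
        for i in range(len(w0)):
            e = np.zeros_like(w0)
            e[i] = eps
            out += (f(x, w0 + e) - base) / eps * (w[i] - w0[i])
        return out
    return f_lin
```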
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.