Online mirror descent and dual averaging: keeping pace in the dynamic
case
- URL: http://arxiv.org/abs/2006.02585v3
- Date: Fri, 3 Sep 2021 23:23:53 GMT
- Title: Online mirror descent and dual averaging: keeping pace in the dynamic
case
- Authors: Huang Fang, Nicholas J. A. Harvey, Victor S. Portella, Michael P.
Friedlander
- Abstract summary: Online mirror descent (OMD) and dual averaging (DA) are fundamental algorithms for online convex optimization.
We modify the OMD algorithm through a simple technique that we call stabilization.
We show that OMD with stabilization and DA enjoy the same performance guarantees in many applications -- even under dynamic learning rates.
- Score: 11.572321455920164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online mirror descent (OMD) and dual averaging (DA) -- two fundamental
algorithms for online convex optimization -- are known to have very similar
(and sometimes identical) performance guarantees when used with a fixed
learning rate. Under dynamic learning rates, however, OMD is provably inferior
to DA and suffers a linear regret, even in common settings such as prediction
with expert advice. We modify the OMD algorithm through a simple technique that
we call stabilization. We give essentially the same abstract regret bound for
OMD with stabilization and for DA by modifying the classical OMD convergence
analysis in a careful and modular way that allows for straightforward and
flexible proofs. Simple corollaries of these bounds show that OMD with
stabilization and DA enjoy the same performance guarantees in many applications
-- even under dynamic learning rates. We also shed light on the similarities
between OMD and DA and show simple conditions under which stabilized-OMD and DA
generate the same iterates.
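As a rough illustration of the updates compared in the abstract, the sketch below runs dual averaging, vanilla OMD, and a stabilized OMD variant on a prediction-with-expert-advice instance (negative-entropy mirror map, so every update is an exponential-weights step) with a dynamic learning rate eta_t = 1/sqrt(t). The stabilization step here mixes the current iterate with the initial iterate in dual (log) space with weight eta_{t+1}/eta_t; this is one plausible reading of the paper's technique rather than its exact formulation, and the learning-rate indexing is likewise an assumption.

```python
import numpy as np

def simplex_from_logits(logits):
    """Map dual/log-space coordinates to a point on the probability simplex."""
    w = np.exp(logits - logits.max())
    return w / w.sum()

def expert_advice_regret(T, d, losses, method):
    """Regret against the best fixed expert for one of three update rules.

    Mirror map: negative entropy, so each update is an exponential-weights
    step.  Dynamic learning rate eta_t = 1/sqrt(t).  The 'stabilized_omd'
    branch uses an assumed form of the stabilization step (dual-space mix
    with the initial iterate, weight eta_{t+1}/eta_t); learning-rate indexing
    conventions vary across references.
    """
    x1 = np.ones(d) / d                  # uniform initial iterate
    x = x1.copy()
    grad_sum = np.zeros(d)
    alg_loss, cum_loss = 0.0, np.zeros(d)
    for t in range(1, T + 1):
        g = losses[t - 1]
        alg_loss += x @ g                # play x_t, then observe the loss vector
        cum_loss += g
        grad_sum += g
        eta_t, eta_next = 1.0 / np.sqrt(t), 1.0 / np.sqrt(t + 1)
        if method == "dual_averaging":
            # DA: restart from x1 and replay the whole gradient sum at the new rate.
            x = simplex_from_logits(np.log(x1) - eta_next * grad_sum)
        elif method == "omd":
            # Vanilla OMD: a single mirror step from the previous iterate.
            x = simplex_from_logits(np.log(x) - eta_next * g)
        elif method == "stabilized_omd":
            # Assumed stabilization: pull the iterate toward x1 in log space.
            gamma = eta_next / eta_t     # < 1 when the learning rate decreases
            anchored = gamma * np.log(x) + (1.0 - gamma) * np.log(x1)
            x = simplex_from_logits(anchored - eta_next * g)
        else:
            raise ValueError(method)
    return alg_loss - cum_loss.min()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 5000, 10
    losses = rng.uniform(size=(T, d))    # i.i.d. demo losses, not adversarial
    for m in ("dual_averaging", "omd", "stabilized_omd"):
        print(f"{m:15s} regret vs best expert: {expert_advice_regret(T, d, losses, m):.2f}")
```

On i.i.d. losses all three algorithms perform comparably; the linear-regret separation the abstract mentions for vanilla OMD requires a carefully constructed adversarial loss sequence, so this demo only illustrates how the three update rules differ, not the lower bound.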
Related papers
- Recursive Learning of Asymptotic Variational Objectives [49.69399307452126]
General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data.
Online sequential IWAE (OSIWAE) allows for online learning of both model parameters and a Markovian recognition model for inferring latent states.
This approach is more theoretically well-founded than recently proposed online variational SMC methods.
arXiv Detail & Related papers (2024-11-04T16:12:37Z)
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z)
- The Generalization Error of Stochastic Mirror Descent on Over-Parametrized Linear Models [37.6314945221565]
Deep networks are known to generalize well to unseen data.
Regularization properties ensure that interpolating solutions with "good" properties are found.
We present simulation results that validate the theory and introduce two data models.
arXiv Detail & Related papers (2023-02-18T22:23:42Z)
- Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent [7.00422423634143]
We prove that the discrete VRSMD estimator sequence converges to the minimum mirror interpolant in the linear regression setting.
We derive a model estimation accuracy result in the setting when the true model is sparse.
arXiv Detail & Related papers (2022-04-29T19:37:24Z)
- Second-Order Mirror Descent: Convergence in Games Beyond Averaging and Discounting [1.2183405753834562]
We show that MD2 enjoys no-regret as well as an exponential rate of convergence towards strong variationally stable states (VSS) upon a slight modification.
MD2 can also be used to derive many novel continuous-time primal-space dynamics.
arXiv Detail & Related papers (2021-11-18T23:51:38Z)
- Nesterov Accelerated ADMM for Fast Diffeomorphic Image Registration [63.15453821022452]
Recent deep learning-based approaches have achieved sub-second runtimes for diffeomorphic image registration (DiffIR).
We propose a simple iterative scheme that functionally composes intermediate non-stationary velocity fields.
We then propose a convex optimisation model that uses a regularisation term of arbitrary order to impose smoothness on these velocity fields.
arXiv Detail & Related papers (2021-09-26T19:56:45Z)
- Bagging, optimized dynamic mode decomposition (BOP-DMD) for robust, stable forecasting with spatial and temporal uncertainty-quantification [2.741266294612776]
Dynamic mode decomposition (DMD) provides a framework for learning a best-fit linear dynamics model over snapshots of temporal, or spatio-temporal, data.
The majority of DMD algorithms are prone to bias errors from noisy measurements of the dynamics, leading to poor model fits and unstable forecasting capabilities.
The optimized DMD algorithm minimizes the model bias with a variable projection optimization, thus leading to stabilized forecasting capabilities.
arXiv Detail & Related papers (2021-07-22T18:14:20Z)
- On Centralized and Distributed Mirror Descent: Exponential Convergence Analysis Using Quadratic Constraints [8.336315962271396]
Mirror descent (MD) is a powerful first-order optimization technique that subsumes several algorithms including gradient descent (GD).
We study the exact convergence rate of MD in both centralized and distributed cases for strongly convex and smooth problems.
arXiv Detail & Related papers (2021-05-29T23:05:56Z)
- Dynamic Mode Decomposition in Adaptive Mesh Refinement and Coarsening Simulations [58.720142291102135]
Dynamic Mode Decomposition (DMD) is a powerful data-driven method used to extract coherent structures.
This paper proposes a strategy to enable DMD to extract coherent structures from observations with different mesh topologies and dimensions.
arXiv Detail & Related papers (2021-04-28T22:14:25Z)
- Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification [93.2334223970488]
We propose two regularizers to prevent hypersphere collapse in deep SVDD.
The first regularizer is based on injecting random noise via the standard cross-entropy loss.
The second regularizer penalizes the minibatch variance when it becomes too small.
arXiv Detail & Related papers (2020-01-24T03:44:47Z)
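As a small, self-contained illustration of the second regularizer described in the last entry above (penalizing the minibatch variance when it becomes too small), here is a hedged NumPy sketch; the function name, the variance floor, and the exact functional form are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def variance_floor_penalty(embeddings, floor=1e-2):
    """Hypothetical penalty: positive when the average per-dimension variance
    of a minibatch of embeddings drops below `floor` (a symptom of hypersphere
    collapse in deep one-class models); the paper's exact form may differ."""
    mean_var = embeddings.var(axis=0).mean()
    return max(0.0, floor - mean_var)

rng = np.random.default_rng(1)
collapsed = rng.normal(scale=0.01, size=(64, 32))   # nearly collapsed embeddings
healthy = rng.normal(scale=1.0, size=(64, 32))      # well-spread embeddings
print(variance_floor_penalty(collapsed))            # positive penalty
print(variance_floor_penalty(healthy))              # 0.0
```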
This list is automatically generated from the titles and abstracts of the papers on this site.