Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic
Perspectives
- URL: http://arxiv.org/abs/2002.12493v2
- Date: Mon, 12 Apr 2021 07:03:56 GMT
- Title: Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic
Perspectives
- Authors: Michael Muehlebach and Michael I. Jordan
- Abstract summary: The article rigorously establishes why symplectic discretization schemes are important for momentum-based optimization algorithms.
It provides a characterization of algorithms that exhibit accelerated convergence.
- Score: 97.16266088683061
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze the convergence rate of various momentum-based optimization
algorithms from a dynamical systems point of view. Our analysis exploits
fundamental topological properties, such as the continuous dependence of
iterates on their initial conditions, to provide a simple characterization of
convergence rates. In many cases, closed-form expressions are obtained that
relate algorithm parameters to the convergence rate. The analysis encompasses
discrete time and continuous time, as well as time-invariant and time-variant
formulations, and is not limited to a convex or Euclidean setting. In addition,
the article rigorously establishes why symplectic discretization schemes are
important for momentum-based optimization algorithms, and provides a
characterization of algorithms that exhibit accelerated convergence.
Related papers
- Convergence of Expectation-Maximization Algorithm with Mixed-Integer Optimization [5.319361976450982]
This paper introduces a set of conditions that ensure the convergence of a specific class of EM algorithms.
Our results offer a new analysis technique for iterative algorithms that solve mixed-integer non-linear optimization problems.
arXiv Detail & Related papers (2024-01-31T11:42:46Z) - Generalized Gradient Flows with Provable Fixed-Time Convergence and Fast
Evasion of Non-Degenerate Saddle Points [8.452349885923507]
Gradient-based first-order convex optimization algorithms find widespread applicability in a variety of domains, including machine learning tasks.
Motivated by the recent advances in fixed-time theory of optimal time, we introduce a framework for designing accelerated optimization algorithms.
For functions that admit non-de saddle-points, we show that the time required to evade these saddle-points is uniformly bounded for all initial conditions.
arXiv Detail & Related papers (2022-12-07T16:36:23Z) - Breaking the Convergence Barrier: Optimization via Fixed-Time Convergent
Flows [4.817429789586127]
We introduce a Poly-based optimization framework for achieving acceleration, based on the notion of fixed-time stability dynamical systems.
We validate the accelerated convergence properties of the proposed schemes on a range of numerical examples against the state-of-the-art optimization algorithms.
arXiv Detail & Related papers (2021-12-02T16:04:40Z) - Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms are not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded on the complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden/layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - The Connection between Discrete- and Continuous-Time Descriptions of
Gaussian Continuous Processes [60.35125735474386]
We show that discretizations yielding consistent estimators have the property of invariance under coarse-graining'
This result explains why combining differencing schemes for derivatives reconstruction and local-in-time inference approaches does not work for time series analysis of second or higher order differential equations.
arXiv Detail & Related papers (2021-01-16T17:11:02Z) - Nonlinear Two-Time-Scale Stochastic Approximation: Convergence and
Finite-Time Performance [1.52292571922932]
We study the convergence and finite-time analysis of the nonlinear two-time-scale approximation.
In particular, we show that the method achieves a convergence in expectation at a rate $mathcalO (1/k2/3)$, where $k$ is the number of iterations.
arXiv Detail & Related papers (2020-11-03T17:43:39Z) - A Dynamical Systems Approach for Convergence of the Bayesian EM
Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM)
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z) - Distributed Value Function Approximation for Collaborative Multi-Agent
Reinforcement Learning [2.7071541526963805]
We propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning.
The proposed algorithms differ by their form, definition of eligibility traces, selection of time scales and the way of incorporating consensus iterations.
It is demonstrated how the adopted methodology can be applied to temporal-difference algorithms under weaker information structure constraints.
arXiv Detail & Related papers (2020-06-18T11:46:09Z) - Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted decision processes, and provide Markov-dependent guarantees on the $ell_infty$error under a generative model.
We establish both and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.