A Derivation of Nesterov's Accelerated Gradient Algorithm from Optimal Control Theory
- URL: http://arxiv.org/abs/2203.17226v1
- Date: Tue, 29 Mar 2022 19:26:20 GMT
- Title: A Derivation of Nesterov's Accelerated Gradient Algorithm from Optimal Control Theory
- Authors: I. M. Ross
- Abstract summary: Nesterov's accelerated gradient algorithm is derived from first principles.
An Euler discretization of the resulting differential equation produces Nesterov's algorithm.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Nesterov's accelerated gradient algorithm is derived from first principles.
The first principles are founded on the recently-developed optimal control
theory for optimization. This theory frames an optimization problem as an
optimal control problem whose trajectories generate various continuous-time
algorithms. The algorithmic trajectories satisfy the necessary conditions for
optimal control. The necessary conditions produce a controllable dynamical
system for accelerated optimization. Stabilizing this system via a quadratic
control Lyapunov function generates an ordinary differential equation. An Euler
discretization of the resulting differential equation produces Nesterov's
algorithm. In this context, this result solves the purported mystery
surrounding the algorithm.
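As a concrete illustration of the final step, the sketch below simulates the well-known accelerated-gradient ODE of Su, Boyd, and Candès, $\ddot{x} + (3/t)\dot{x} + \nabla f(x) = 0$, with a semi-implicit Euler scheme and compares it against Nesterov's discrete updates on a toy quadratic. This is a minimal sketch under stated assumptions: the specific ODE, objective, step sizes, and choice of semi-implicit (rather than forward) Euler for numerical stability are illustrative, not the paper's own derivation.

```python
import numpy as np

# Illustrative quadratic objective (an assumption, not from the paper).
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

s = 0.05                          # gradient step size, below 1/L with L = 10
x0 = np.array([1.0, 1.0])

# (1) Nesterov's accelerated gradient method.
x, x_prev = x0.copy(), x0.copy()
for k in range(1, 300):
    y = x + (k - 1.0) / (k + 2.0) * (x - x_prev)
    x_prev, x = x, y - s * grad(y)

# (2) Semi-implicit Euler discretization of the continuous-time flow
#     x'' + (3/t) x' + grad f(x) = 0, written as a first-order system,
#     with time step h = sqrt(s), the scaling under which such Euler
#     discretizations recover Nesterov-type updates.
h = np.sqrt(s)
z, v, t = x0.copy(), np.zeros(2), h
for _ in range(299):
    v = v + h * (-(3.0 / t) * v - grad(z))  # update velocity first,
    z = z + h * v                           # then position (symplectic order)
    t += h

print("Nesterov   f =", f(x))  # both decay toward the minimum f* = 0
print("ODE Euler  f =", f(z))
```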
Related papers
- Deterministic Trajectory Optimization through Probabilistic Optimal Control [3.2771631221674333]
We propose two new algorithms for discrete-time deterministic finite-horizon nonlinear optimal control problems.
Both algorithms are inspired by a novel theoretical paradigm known as probabilistic optimal control.
We show that the application of this algorithm results in a fixed-point iteration of probabilistic policies that converge to the deterministic optimal policy.
arXiv Detail & Related papers (2024-07-18T09:17:47Z)
- Analyzing and Enhancing the Backward-Pass Convergence of Unrolled Optimization [50.38518771642365]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
A central challenge in this setting is backpropagation through the solution of an optimization problem, which often lacks a closed form.
This paper provides theoretical insights into the backward pass of unrolled optimization, showing that it is equivalent to the solution of a linear system by a particular iterative method.
A system called Folded Optimization is proposed to construct more efficient backpropagation rules from unrolled solver implementations (a minimal sketch of differentiating through an unrolled solver appears after this list).
arXiv Detail & Related papers (2023-12-28T23:15:18Z)
- Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit the connection between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy (a classical Sinkhorn-style baseline for the entropy-regularized problem is sketched after this list).
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
- An Accelerated Variance-Reduced Conditional Gradient Sliding Algorithm for First-order and Zeroth-order Optimization [111.24899593052851]
The conditional gradient algorithm (also known as the Frank-Wolfe algorithm) has recently regained popularity in the machine learning community (a minimal Frank-Wolfe sketch appears after this list).
The proposed ARCS is the first zeroth-order conditional gradient sliding-type algorithm for solving convex problems in zeroth-order optimization.
In first-order optimization, the convergence results of ARCS substantially outperform those of previous algorithms in terms of the number of gradient oracle queries.
arXiv Detail & Related papers (2021-09-18T07:08:11Z)
- A Contraction Theory Approach to Optimization Algorithms from Acceleration Flows [1.90365714903665]
We use contraction theory to provide a principled methodology to design and discretize appropriate ODEs.
We propose a novel system of ODEs, namely the Accelerated-Contracting-Nesterov flow.
Remarkably, a simple explicit Euler discretization of this flow corresponds to the Nesterov acceleration method.
arXiv Detail & Related papers (2021-05-18T21:11:37Z)
- Optimizing Optimizers: Regret-optimal gradient descent algorithms [9.89901717499058]
We study the existence, uniqueness and consistency of regret-optimal algorithms.
By providing first-order optimality conditions for the control problem, we show that regret-optimal algorithms must satisfy a specific structure in their dynamics.
We present fast numerical methods for approximating them, generating optimization algorithms which directly optimize their long-term regret.
arXiv Detail & Related papers (2020-12-31T19:13:53Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Differentially Private Accelerated Optimization Algorithms [0.7874708385247353]
We present two classes of differentially private optimization algorithms.
The first class of algorithms is inspired by Polyak's heavy ball method.
The second class of algorithms is based on Nesterov's accelerated gradient method (a generic noisy heavy-ball sketch appears after this list).
arXiv Detail & Related papers (2020-08-05T08:23:01Z)
- Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
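For the unrolled-optimization entry above (Folded Optimization), here is a minimal numpy sketch of the stated equivalence on a quadratic objective of my choosing: propagating Jacobians through K unrolled gradient-descent steps yields exactly a truncated iterative solve of the linear system that implicit differentiation of the fixed point would require. This is an illustration, not the paper's system.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)          # well-conditioned SPD matrix (assumption)
theta = rng.normal(size=n)
alpha = 1.0 / np.linalg.norm(A, 2)   # step size at 1/L

# Unroll K gradient steps x_{k+1} = x_k - alpha * (A x_k - theta) for
# f(x) = 0.5 x^T A x - theta^T x, propagating J_k = dx_k/dtheta alongside
# (this is what backpropagation through the unrolled loop computes).
x, J = np.zeros(n), np.zeros((n, n))
for _ in range(200):
    x = x - alpha * (A @ x - theta)
    J = (np.eye(n) - alpha * A) @ J + alpha * np.eye(n)

# Implicit differentiation of the fixed point A x* = theta gives
# dx*/dtheta = A^{-1}; the unrolled backward pass above is a truncated
# Neumann-series (iterative) solve of that same linear system.
print(np.max(np.abs(J - np.linalg.inv(A))))  # ~0 for large K
```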
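The entropy-regularized transport problem referenced above is classically solved by Sinkhorn iterations; the sketch below shows that baseline, not the paper's extragradient method, with the cost matrix, marginals, and regularization strength chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
C = rng.random((n, n))          # illustrative cost matrix (assumption)
a = np.full(n, 1.0 / n)         # source marginal
b = np.full(n, 1.0 / n)         # target marginal
eps = 0.1                       # entropic regularization strength

# Sinkhorn: alternately rescale rows and columns of K = exp(-C/eps)
# until the coupling P = diag(u) K diag(v) matches both marginals.
K = np.exp(-C / eps)
u = np.ones(n)
for _ in range(500):
    v = b / (K.T @ u)
    u = a / (K @ v)

P = u[:, None] * K * v[None, :]
print("row sums:", P.sum(axis=1))        # ~ a
print("regularized OT cost:", (P * C).sum())
```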
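For the conditional-gradient entry above, here is a minimal classical Frank-Wolfe sketch (the textbook method, not the paper's ARCS algorithm) minimizing an illustrative quadratic over the probability simplex, where the linear minimization subproblem has a closed-form vertex solution.

```python
import numpy as np

# Illustrative problem (assumption): minimize f(x) = ||x - c||^2
# over the probability simplex {x >= 0, sum(x) = 1}.
c = np.array([0.1, 0.7, 0.2, 0.9])
grad = lambda x: 2.0 * (x - c)

n = c.size
x = np.full(n, 1.0 / n)               # start at the simplex center
for k in range(200):
    g = grad(x)
    s = np.zeros(n)
    s[np.argmin(g)] = 1.0             # linear minimization oracle: best vertex
    gamma = 2.0 / (k + 2.0)           # standard step-size schedule
    x = (1.0 - gamma) * x + gamma * s # convex combination stays in the simplex

print("x =", np.round(x, 3))          # approx. projection of c onto simplex
```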
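For the differentially private entry above, here is a generic Gaussian-mechanism-style sketch of a noisy heavy-ball update. The noise scale is illustrative and not calibrated to any $(\varepsilon, \delta)$ privacy budget, and the paper's actual algorithms and analysis differ.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.diag([1.0, 4.0])           # illustrative quadratic f(x) = 0.5 x^T A x
grad = lambda x: A @ x

alpha, beta = 0.1, 0.7            # step size and momentum (assumptions)
sigma = 0.05                      # illustrative noise scale, NOT calibrated
                                  # to a formal privacy guarantee
x = np.array([1.0, 1.0])
v = np.zeros(2)

# Heavy-ball iteration with Gaussian noise added to each gradient query,
# in the spirit of the Gaussian mechanism.
for _ in range(300):
    g = grad(x) + sigma * rng.normal(size=2)
    v = beta * v - alpha * g
    x = x + v

print("f(x) =", 0.5 * x @ A @ x)  # settles near the noise floor
```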
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.