Understanding Nesterov's Acceleration via Proximal Point Method
- URL: http://arxiv.org/abs/2005.08304v3
- Date: Thu, 2 Jun 2022 13:16:54 GMT
- Title: Understanding Nesterov's Acceleration via Proximal Point Method
- Authors: Kwangjun Ahn, Suvrit Sra
- Abstract summary: The proximal point method (PPM) is often used as a building block for designing optimization algorithms.
In this work, we use PPM to provide conceptually simple derivations, along with convergence analyses, of different versions of Nesterov's accelerated gradient method (AGM).
- Score: 52.99237600875452
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proximal point method (PPM) is a fundamental method in optimization that
is often used as a building block for designing optimization algorithms. In
this work, we use PPM to provide conceptually simple derivations
along with convergence analyses of different versions of Nesterov's accelerated
gradient method (AGM). The key observation is that AGM is a simple
approximation of PPM, which results in an elementary derivation of the update
equations and stepsizes of AGM. This view also leads to a transparent and
conceptually simple analysis of AGM's convergence by using the analysis of PPM.
The derivations also naturally extend to the strongly convex case. Ultimately,
the results presented in this paper are of both didactic and conceptual value;
they unify and explain existing variants of AGM while motivating other
accelerated methods for practically relevant settings.
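To make the key observation concrete, the following is a minimal, illustrative sketch (not the paper's exact derivation or step sizes): it runs the proximal point method with an exact prox step and a standard form of Nesterov's AGM on a convex quadratic. The quadratic test problem, the choice lam = 1.0, and the function names are assumptions made only for illustration.

```python
# Minimal sketch: proximal point method (PPM) vs. Nesterov's accelerated
# gradient method (AGM) on a convex quadratic f(x) = 0.5*x^T A x - b^T x.
# The problem instance and step sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)              # symmetric positive definite
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)       # exact minimizer of f
L = np.linalg.eigvalsh(A).max()      # Lipschitz constant of grad f

def grad(x):
    return A @ x - b

def prox(v, lam):
    # prox_{lam*f}(v) = argmin_x f(x) + ||x - v||^2 / (2*lam); closed form for a quadratic
    return np.linalg.solve(np.eye(n) + lam * A, v + lam * b)

# Proximal point method: x_{k+1} = prox_{lam*f}(x_k)
x_ppm = np.zeros(n)
for _ in range(50):
    x_ppm = prox(x_ppm, lam=1.0)

# Nesterov's AGM: gradient step taken at an extrapolated (momentum) point y_k
x_prev = x_cur = np.zeros(n)
for k in range(1, 51):
    y = x_cur + (k - 1) / (k + 2) * (x_cur - x_prev)
    x_prev, x_cur = x_cur, y - grad(y) / L

print("PPM distance to optimum:", np.linalg.norm(x_ppm - x_star))
print("AGM distance to optimum:", np.linalg.norm(x_cur - x_star))
```

The paper's point is that replacing the exact prox step above with a cheap approximation recovers AGM's update equations and step sizes; the sketch only places the two updates side by side.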
Related papers
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Min-Max Optimization Made Simple: Approximating the Proximal Point
Method via Contraction Maps [77.8999425439444]
We present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems.
Our work is based on the fact that the update rule of the Proximal Point method can be approximated to within a desired accuracy.
arXiv Detail & Related papers (2023-01-10T12:18:47Z) - A Discrete Variational Derivation of Accelerated Methods in Optimization [68.8204255655161]
We introduce variational integrators which allow us to derive different methods for optimization.
We derive two families of optimization methods in one-to-one correspondence.
The preservation of symplecticity of autonomous systems occurs here solely on the fibers.
arXiv Detail & Related papers (2021-06-04T20:21:53Z) - Acceleration Methods [57.202881673406324]
We first use quadratic optimization problems to introduce two key families of acceleration methods.
We discuss momentum methods in detail, starting with the seminal work of Nesterov.
We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates.
arXiv Detail & Related papers (2021-01-23T17:58:25Z) - Efficient Consensus Model based on Proximal Gradient Method applied to
Convolutional Sparse Problems [2.335152769484957]
We derive and detail a theoretical analysis of an efficient consensus algorithm based on the proximal gradient (PG) approach; a minimal proximal gradient sketch appears after this list.
The proposed algorithm is also applied to another particular convolutional problem for the anomaly detection task.
arXiv Detail & Related papers (2020-11-19T20:52:48Z) - Majorization Minimization Methods for Distributed Pose Graph
Optimization with Convergence Guarantees [0.76146285961466]
We show that our proposed methods are guaranteed to converge to first-order critical points under mild conditions.
Since our proposed methods rely on a proximal operator of distributed PGO, the convergence rate can be significantly accelerated.
The efficacy of this work is validated through applications on a number of 2D and 3D SLAM datasets.
arXiv Detail & Related papers (2020-03-11T15:18:33Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z) - Average-case Acceleration Through Spectral Density Estimation [35.01931431231649]
We develop a framework for the average-case analysis of random quadratic problems.
We derive algorithms that are optimal under this analysis.
We develop explicit algorithms for the uniform, Marchenko-Pastur, and exponential distributions.
arXiv Detail & Related papers (2020-02-12T01:44:26Z)
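Several of the papers above build on proximal point or proximal gradient updates. For reference, here is a minimal proximal gradient (ISTA-style) sketch for a composite problem min_x f(x) + g(x) with smooth f and g(x) = lam*||x||_1; the least-squares objective and soft-thresholding prox are illustrative assumptions, not any listed paper's exact algorithm.

```python
# Minimal proximal gradient (ISTA-style) sketch for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
# The problem instance and parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100))
b = rng.standard_normal(40)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of grad f for f(x) = 0.5*||Ax - b||^2

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(100)
for _ in range(200):
    # gradient step on the smooth part, then prox step on the l1 part
    x = soft_threshold(x - (A.T @ (A @ x - b)) / L, lam / L)

print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum())
```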