Convergence and Stability of the Stochastic Proximal Point Algorithm
with Momentum
- URL: http://arxiv.org/abs/2111.06171v5
- Date: Tue, 27 Jun 2023 02:29:46 GMT
- Title: Convergence and Stability of the Stochastic Proximal Point Algorithm
with Momentum
- Authors: Junhyung Lyle Kim, Panos Toulis, Anastasios Kyrillidis
- Abstract summary: We show that the stochastic proximal point algorithm with momentum (SPPAM) allows faster linear convergence to a neighborhood than the stochastic proximal point algorithm (SPPA), with a better contraction factor.
- Score: 14.158845925610438
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in
many optimization scenarios, including convex optimization instances and
non-convex neural network training. Yet, in the stochastic setting, momentum
interferes with gradient noise, often leading to specific step size and
momentum choices in order to guarantee convergence, let alone acceleration.
Proximal point methods, on the other hand, have gained much attention due to
their numerical stability and elasticity against imperfect tuning. Their
stochastic accelerated variants though have received limited attention: how
momentum interacts with the stability of (stochastic) proximal point methods
remains largely unstudied. To address this, we focus on the convergence and
stability of the stochastic proximal point algorithm with momentum (SPPAM), and
show that SPPAM allows a faster linear convergence to a neighborhood compared
to the stochastic proximal point algorithm (SPPA) with a better contraction
factor, under proper hyperparameter tuning. In terms of stability, we show that
SPPAM depends on problem constants more favorably than SGDM, allowing a wider
range of step size and momentum that lead to convergence.
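As an illustration of the SPPAM update described above (not the paper's code), the following sketch applies the method to stochastic least squares, where the proximal step of a single sampled component has a closed form. The function name, hyperparameters, and the interpolation setup are illustrative assumptions:

```python
import numpy as np

def sppam_least_squares(A, b, eta=1.0, beta=0.3, iters=5000, seed=0):
    """Sketch of SPPAM for f(x) = 0.5 * sum_i (a_i^T x - b_i)^2.
    Each step applies the closed-form prox of one sampled component
    f_i to the momentum-extrapolated point z = x + beta*(x - x_prev)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x_prev = x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)
        a, bi = A[i], b[i]
        z = x + beta * (x - x_prev)              # momentum extrapolation
        # prox_{eta * f_i}(z) = z - eta*(a^T z - b_i)/(1 + eta*||a||^2) * a
        step = eta * (a @ z - bi) / (1.0 + eta * (a @ a))
        x_prev, x = x, z - step * a
    return x

# Interpolation regime (A x* = b exactly), so the neighborhood of
# convergence shrinks to the exact solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
x_star = rng.standard_normal(5)
b = A @ x_star
x_hat = sppam_least_squares(A, b)
```

The closed-form proximal step is why SPPA-type methods tolerate large step sizes: the effective step is automatically damped by the factor 1/(1 + eta*||a||^2).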
Related papers
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise [17.493808856903303]
One fundamental challenge in analyzing an approximation algorithm is to establish its stability.
In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the martingale difference noise setting to the Markovian noise setting.
Central to our analysis is the diminishing rate of change of a few functions, which is implied by both a form of the strong law of large numbers and a commonly used V4 Lyapunov drift condition.
arXiv Detail & Related papers (2024-01-15T17:20:17Z) - Convergence rates for momentum stochastic gradient descent with noise of
machine learning type [1.4213973379473654]
We consider the momentum stochastic gradient descent scheme (MSGD) and its continuous-in-time counterpart.
We show almost sure exponential convergence of the objective function value for such target functions.
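For reference, the MSGD scheme that these analyses study can be sketched as follows; the function name and hyperparameters are illustrative, and a deterministic quadratic stands in for the machine-learning objective:

```python
import numpy as np

def msgd(grad, x0, lr=0.05, beta=0.9, iters=300):
    """Momentum SGD (MSGD) sketch: v accumulates a geometrically
    weighted sum of past (stochastic) gradients, and the iterate
    moves against that accumulated direction."""
    x = np.asarray(x0, float)
    v = np.zeros_like(x)
    for _ in range(iters):
        v = beta * v + grad(x)   # momentum buffer update
        x = x - lr * v           # descent step along the buffer
    return x

# Quadratic f(x) = ||x||^2 with minimiser at the origin.
x_min = msgd(lambda x: 2.0 * x, np.array([5.0, -3.0]))
```

This update is algebraically equivalent to the heavy-ball form x_{t+1} = x_t - lr*grad(x_t) + beta*(x_t - x_{t-1}), which is the formulation most convergence analyses work with.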
arXiv Detail & Related papers (2023-02-07T15:59:08Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - Losing momentum in continuous-time stochastic optimisation [62.997667081978825]
Momentum-based algorithms have become especially popular in recent years.
In this work, we propose and analyse a continuous-time model for stochastic gradient descent with momentum.
We show convergence of our system to the global minimiser when reducing momentum over time.
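A discrete-time sketch of the idea in this paper, reducing the momentum parameter over time so the dynamics eventually resemble plain gradient descent; the decay schedule below is an illustrative assumption, not the paper's:

```python
import numpy as np

def msgd_decaying_momentum(grad, x0, lr=0.05, iters=400):
    """Momentum gradient descent with a momentum parameter that is
    reduced over time ("losing momentum"); schedule is illustrative."""
    x = np.asarray(x0, float)
    v = np.zeros_like(x)
    for t in range(1, iters + 1):
        beta = 0.9 / (1.0 + 0.01 * t)   # momentum decays toward zero
        v = beta * v + grad(x)
        x = x - lr * v
    return x

# Quadratic test problem with global minimiser at the origin.
x_min = msgd_decaying_momentum(lambda x: 2.0 * x, np.array([4.0, -2.0]))
```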
arXiv Detail & Related papers (2022-09-08T10:46:05Z) - Last-iterate convergence analysis of stochastic momentum methods for
neural networks [3.57214198937538]
The momentum method is widely used to solve large-scale optimization problems in neural networks.
Existing convergence results for momentum methods are mostly obtained under artificial settings.
In our analysis, the momentum factor can be fixed to a constant, rather than the time-varying choices required in existing work.
arXiv Detail & Related papers (2022-05-30T02:17:44Z) - A Semismooth Newton Stochastic Proximal Point Algorithm with Variance Reduction [2.048226951354646]
We develop an implementable stochastic proximal point (SPP) method for a class of weakly convex, composite optimization problems.
The proposed algorithm incorporates a variance reduction mechanism and the resulting updates are solved using an inexact semismooth Newton framework.
arXiv Detail & Related papers (2022-04-01T13:08:49Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
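A minimal sketch of SEG with iterate averaging on a scalar bilinear game min_x max_y a*x*y (whose Nash equilibrium is the origin); the function name, noise model, and hyperparameters are illustrative assumptions:

```python
import numpy as np

def seg_bilinear(a=1.0, step=0.2, iters=2000, noise=0.05, seed=0):
    """Stochastic extragradient (SEG) sketch with running averages.
    Each iteration takes an extrapolation half-step, then updates
    using noisy gradients evaluated at the extrapolated point."""
    rng = np.random.default_rng(seed)
    x, y = 1.0, 1.0
    avg_x = avg_y = 0.0
    for t in range(1, iters + 1):
        gx = a * y + noise * rng.standard_normal()
        gy = a * x + noise * rng.standard_normal()
        xh, yh = x - step * gx, y + step * gy     # extrapolation step
        gxh = a * yh + noise * rng.standard_normal()
        gyh = a * xh + noise * rng.standard_normal()
        x, y = x - step * gxh, y + step * gyh     # update step
        avg_x += (x - avg_x) / t                  # running iterate average
        avg_y += (y - avg_y) / t
    return avg_x, avg_y

avg_x, avg_y = seg_bilinear()
```

With constant step size the last iterate only reaches a noise-dominated neighborhood, while the averaged iterate concentrates near the Nash equilibrium, which is the effect the averaging (and restarting) analysis formalizes.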
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Stochastic optimization with momentum: convergence, fluctuations, and
traps avoidance [0.0]
In this paper, a general stochastic optimization procedure is studied, unifying several variants of stochastic gradient descent such as, among others, the heavy ball method, the stochastic Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method.
Trap avoidance is studied by viewing the algorithm as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z) - Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic
Perspectives [97.16266088683061]
The article rigorously establishes why symplectic discretization schemes are important for momentum-based optimization algorithms.
It provides a characterization of algorithms that exhibit accelerated convergence.
arXiv Detail & Related papers (2020-02-28T00:32:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.