Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth
Non-Convex Optimization
- URL: http://arxiv.org/abs/2002.05466v2
- Date: Thu, 11 Feb 2021 14:52:42 GMT
- Title: Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth
Non-Convex Optimization
- Authors: Vien V. Mai and Mikael Johansson
- Abstract summary: This paper establishes the convergence rate of a stochastic subgradient method with momentum for a broad class of non-smooth, non-convex, and constrained problems.
For smooth problems, we show how the unconstrained case can be analyzed under weaker assumptions than the state-of-the-art.
- Score: 25.680334940504405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic gradient methods with momentum are widely used in applications and
at the core of optimization subroutines in many popular machine learning
libraries. However, their sample complexities have not been obtained for
problems beyond those that are convex or smooth. This paper establishes the
convergence rate of a stochastic subgradient method with a momentum term of
Polyak type for a broad class of non-smooth, non-convex, and constrained
optimization problems. Our key innovation is the construction of a special
Lyapunov function for which the proven complexity can be achieved without any
tuning of the momentum parameter. For smooth problems, we extend the known
complexity bound to the constrained case and demonstrate how the unconstrained
case can be analyzed under weaker assumptions than the state-of-the-art.
Numerical results confirm our theoretical developments.
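The method analyzed in the abstract is, at its core, a projected stochastic subgradient step combined with a Polyak-type (heavy-ball) momentum term. The following is a minimal NumPy sketch of that update under illustrative assumptions: the subgradient oracle `subgrad`, the Euclidean projection `project`, the toy objective, and the step-size/momentum choices are placeholders, not the authors' implementation.
```python
# Minimal sketch: projected stochastic subgradient method with heavy-ball momentum.
# All names, the toy problem, and parameter values below are illustrative assumptions.
import numpy as np

def shb_projected(subgrad, project, x0, steps=1000, lr=0.01, beta=0.9, rng=None):
    """x_{k+1} = proj( x_k - lr * g_k + beta * (x_k - x_{k-1}) )."""
    rng = np.random.default_rng() if rng is None else rng
    x_prev = np.array(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(steps):
        g = subgrad(x, rng)                                   # stochastic subgradient at x
        x_next = project(x - lr * g + beta * (x - x_prev))    # momentum step, then projection
        x_prev, x = x, x_next
    return x

# Toy example: minimize E|a^T x - b| (non-smooth, stochastic) over the unit ball.
if __name__ == "__main__":
    d = 5

    def subgrad(x, rng):
        a = rng.normal(size=d)
        b = rng.normal()
        return np.sign(a @ x - b) * a                         # a subgradient of |a^T x - b|

    def project(x):                                           # Euclidean projection onto unit ball
        n = np.linalg.norm(x)
        return x if n <= 1.0 else x / n

    print(shb_projected(subgrad, project, np.ones(d)))
```
One claimed benefit of the paper's Lyapunov analysis is that the guarantee holds without tuning the momentum parameter; in the sketch above, `beta` is simply fixed for illustration.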
Related papers
- Independently-Normalized SGD for Generalized-Smooth Nonconvex Optimization [19.000530691874516]
We show that many nonconvex machine learning problems satisfy a generalized-smoothness condition that extends beyond traditional smooth nonconvex optimization.
We propose an independently-normalized stochastic gradient descent algorithm, which leverages independent sampling and normalization.
arXiv Detail & Related papers (2024-10-17T21:52:00Z) - A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variants of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Non-Convex Stochastic Composite Optimization with Polyak Momentum [25.060477077577154]
The stochastic proximal gradient method is a powerful generalization of the widely used stochastic gradient descent (SGD) method.
However, it is notoriously known that this method fails to converge in numerous applications in machine learning.
arXiv Detail & Related papers (2024-03-05T13:43:58Z) - High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise [96.80184504268593]
Gradient clipping is one of the key algorithmic ingredients for deriving good high-probability guarantees; a minimal clipped-SGD sketch follows this list.
However, clipping can spoil the convergence of popular methods for composite and distributed optimization.
arXiv Detail & Related papers (2023-10-03T07:49:17Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms are not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z) - Nonsmooth Implicit Differentiation for Machine Learning and Optimization [0.0]
In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus.
Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled.
This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified.
arXiv Detail & Related papers (2021-06-08T13:59:47Z)
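As referenced in the gradient-clipping entry above, the clipping ingredient used in the high-probability convergence papers amounts to rescaling large stochastic gradients before the update. The sketch below is a generic clipped-SGD loop; the oracle `grad`, the clipping level, and the step size are illustrative assumptions rather than code from any of the listed papers.
```python
# Minimal sketch of clipped SGD: rescale the stochastic gradient whenever its norm
# exceeds a fixed clipping level. All names and parameters are illustrative assumptions.
import numpy as np

def clipped_sgd(grad, x0, steps=1000, lr=0.01, clip_level=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = grad(x, rng)                      # stochastic (possibly heavy-tailed) gradient
        norm = np.linalg.norm(g)
        if norm > clip_level:                 # clip: shrink the gradient to the chosen level
            g = g * (clip_level / norm)
        x = x - lr * g
    return x
```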
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.