Related papers: Nonmonotone subgradient methods based on a local descent lemma

URL: http://arxiv.org/abs/2510.19341v1
Date: Wed, 22 Oct 2025 08:06:40 GMT
Title: Nonmonotone subgradient methods based on a local descent lemma
Authors: Francisco J. Aragón-Artacho, Rubén Campoy, Pedro Pérez-Aros, David Torregrosa-Belén
Abstract summary: We extend the context of nonmonotone descent methods to the class of nonsmooth and nonconvex functions called upper-$\mathcal{C}^2$. Under this assumption, we propose a general subgradient method that performs a nonmonotone linesearch. In addition, we propose a specification of the general scheme, named Self-adaptive Nonmonotone Subgradient Method (SNSM), which automatically updates the parameters of the linesearch.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The aim of this paper is to extend the context of nonmonotone descent methods to the class of nonsmooth and nonconvex functions called upper-$\mathcal{C}^2$, which satisfy a nonsmooth and local version of the descent lemma. Under this assumption, we propose a general subgradient method that performs a nonmonotone linesearch, and we prove subsequential convergence to a stationary point of the optimization problem. Our approach allows us to cover the setting of various subgradient algorithms, including Newton and quasi-Newton methods. In addition, we propose a specification of the general scheme, named Self-adaptive Nonmonotone Subgradient Method (SNSM), which automatically updates the parameters of the linesearch. Particular attention is paid to the minimum sum-of-squares clustering problem, for which we provide a concrete implementation of SNSM. We conclude with some numerical experiments where we exhibit the advantages of SNSM in comparison with some known algorithms.
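
To make the idea of a nonmonotone linesearch concrete, here is a minimal Python sketch on the minimum sum-of-squares clustering objective mentioned in the abstract. It is not the paper's SNSM: the max-type acceptance rule over the last few objective values, the fixed parameters (`memory`, `sigma`, `shrink`), the plain negative-subgradient direction, and all function names are assumptions chosen for illustration, and nothing here self-adapts the linesearch parameters.

```python
from collections import deque


def mssc_value_and_subgradient(centers, points):
    """Return (1/n) * sum_i min_j ||c_j - a_i||^2 and one subgradient.

    The subgradient is the gradient of the smooth piece obtained by assigning
    each point to its nearest center (ties go to the lowest index).
    """
    n = len(points)
    value = 0.0
    grad = [[0.0] * len(c) for c in centers]
    for a in points:
        dists = [sum((cj - aj) ** 2 for cj, aj in zip(c, a)) for c in centers]
        j = min(range(len(centers)), key=dists.__getitem__)
        value += dists[j] / n
        for k, (cj, aj) in enumerate(zip(centers[j], a)):
            grad[j][k] += 2.0 * (cj - aj) / n
    return value, grad


def nonmonotone_subgradient(centers, points, memory=5, sigma=1e-4,
                            t0=1.0, shrink=0.5, tol=1e-8, max_iter=200):
    """Subgradient steps accepted by a max-type nonmonotone Armijo test."""
    centers = [list(c) for c in centers]
    value, grad = mssc_value_and_subgradient(centers, points)
    recent = deque([value], maxlen=memory)  # last `memory` objective values
    for _ in range(max_iter):
        sq_norm = sum(g * g for row in grad for g in row)
        if sq_norm <= tol ** 2:
            break
        # Nonmonotone reference: compare against the worst recent value,
        # so the objective may increase from one iterate to the next.
        reference = max(recent)
        t = t0
        for _ in range(60):  # bounded backtracking
            trial = [[c - t * g for c, g in zip(crow, grow)]
                     for crow, grow in zip(centers, grad)]
            trial_value, trial_grad = mssc_value_and_subgradient(trial, points)
            if trial_value <= reference - sigma * t * sq_norm:
                break
            t *= shrink
        centers, value, grad = trial, trial_value, trial_grad
        recent.append(value)
    return centers, value
```

Calling `nonmonotone_subgradient(initial_centers, points)` with centers and points given as sequences of equal-length coordinate tuples returns the final centers and objective value. Setting `memory=1` recovers the usual monotone Armijo backtracking, which is a simple way to compare the two acceptance rules.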

Related papers

Symmetric Rank-One Quasi-Newton Methods for Deep Learning Using Cubic Regularization [0.5120567378386615]
First-order descent and other first-order variants, such as Adam and AdaGrad, are commonly used in the field of deep learning. However, these methods do not exploit curvature information. Quasi-Newton methods re-use previously computed low Hessian approximations.
arXiv Detail & Related papers (2025-02-17T20:20:11Z)
Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that policies under study avoid strict saddle points / submanifolds with probability 1. This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z)
Semi-Supervised Laplace Learning on Stiefel Manifolds [48.3427853588646]
We develop the framework Sequential Subspace for graph-based, supervised samples at low-label rates. We achieves that our methods at extremely low rates, and high label rates.
arXiv Detail & Related papers (2023-07-31T20:19:36Z)
Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods [37.1630298053787]
We propose a new framework, which we call the helper framework. It provides a unified view of the variance and second-order algorithms equipped with global complexity guarantees.
arXiv Detail & Related papers (2023-02-23T12:18:28Z)
Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms. The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
SCORE: Approximating Curvature Information under Self-Concordant Regularization [0.0]
We propose a generalized Gauss-Newton with Self-Concordant Regularization (GGN-SCORE) algorithm that updates the minimization speed each time it receives a new input. The proposed algorithm exploits the structure of the second-order information in the Hessian matrix, thereby reducing computational overhead.
arXiv Detail & Related papers (2021-12-14T13:03:04Z)
Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization [1.7513645771137178]
We consider unconstrained optimization problems with no available gradient information. We propose an adaptive sampling quasi-Newton method where we estimate the gradients of a simulation function using finite differences within a common random number framework. We develop modified versions of a norm test and an inner product quasi-Newton test to control the sample sizes used in the approximations and provide global convergence results to the neighborhood of the optimal solution.
arXiv Detail & Related papers (2021-09-24T21:49:25Z)
Sinkhorn Natural Gradient for Generative Models [125.89871274202439]
We propose a novel Sinkhorn Natural Gradient (SiNG) algorithm which acts as a steepest descent method on the probability space endowed with the Sinkhorn divergence. We show that the Sinkhorn information matrix (SIM), a key component of SiNG, has an explicit expression and can be evaluated accurately in complexity that scales logarithmically. In our experiments, we quantitatively compare SiNG with state-of-the-art SGD-type solvers on generative tasks to demonstrate the efficiency and efficacy of our method.
arXiv Detail & Related papers (2020-11-09T02:51:17Z)
Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization [98.0595480384208]
We propose a generalization of the extragradient method which converges to a stationary point. The algorithm applies not only to general $p$-normed spaces, but also to general $p$-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z)
Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope. Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
On Distributed Non-convex Optimization: Projected Subgradient Method For Weakly Convex Problems in Networks [13.385373310554327]
The Moreau subgradient method converges linear sharpness problems in machine learning. A distributed implementation of the subgradient method with a theoretical guarantee is proposed.
arXiv Detail & Related papers (2020-04-28T01:01:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.