Characterizing Parametric and Convergence Stability in Nonconvex and
Nonsmooth Optimizations: A Geometric Approach
- URL: http://arxiv.org/abs/2204.01643v1
- Date: Mon, 4 Apr 2022 16:46:19 GMT
- Title: Characterizing Parametric and Convergence Stability in Nonconvex and
Nonsmooth Optimizations: A Geometric Approach
- Authors: Xiaotie Deng, Hanyu Li, Ningyuan Li
- Abstract summary: We consider stability issues in minimizing a continuous parameterized nonsmooth function pointf$f$.
We show two notions of stationary stability: parametric stability and partitioning stability.
We show that these notions have deep connections to geometry theory.
- Score: 6.605210585717247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider stability issues in minimizing a continuous (probably
parameterized, nonconvex and nonsmooth) real-valued function $f$. We call a
point stationary if all its possible directional derivatives are nonnegative.
In this work, we focus on two notions of stability on stationary points of $f$:
parametric stability and convergence stability. Parametric considerations are
widely studied in various fields, including smoothed analysis, numerical
stability, condition numbers and sensitivity analysis for linear programming.
Parametric stability asks whether minor perturbations on parameters lead to
dramatic changes in the position and $f$ value of a stationary point.
Meanwhile, convergence stability indicates a non-escapable solution: Any point
sequence iteratively produced by an optimization algorithm cannot escape from a
neighborhood of a stationary point but gets close to it in the sense that such
stationary points are stable to the precision parameter and algorithmic
numerical errors. It turns out that these notions have deep connections to
geometry theory. We show that parametric stability is linked to deformations of
graphs of functions. On the other hand, convergence stability is concerned with
area partitioning of the function domain. Utilizing these connections, we prove
quite tight conditions of these two stability notions for a wide range of
functions and optimization algorithms with small enough step sizes and
precision parameters. These conditions are subtle in the sense that a slightly
weaker function requirement goes to the opposite of primitive intuitions and
leads to wrong conclusions. We present three applications of this theory. These
applications reveal some understanding on Nash equilibrium computation,
nonconvex and nonsmooth optimization, as well as the new optimization
methodology of deep neural networks.
Related papers
- The Price of Robustness: Stable Classifiers Need Overparameterization [17.335490896384265]
We establish a generalization bound for finite function classes that improves inversely with class stability.<n>We derive as a corollary a law of robustness for classification that extends the results of Bubeck and Sellke.<n>Experiments support our theory, while traditional norm-based measures remain largely uninformative.
arXiv Detail & Related papers (2026-03-03T09:47:06Z) - On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials [33.36228161076605]
The stability of the iterates during training plays a key role in determining the minima obtained by optimization algorithms.<n>Recent work has shown that GD may stably oscillate near a linearly unstable minimum and still converge once the step size decays.<n>We show that nonlinear dynamics can diverge in expectation even if a single batch is unstable.
arXiv Detail & Related papers (2026-02-16T14:36:55Z) - ADMM for Structured Fractional Minimization [7.9047096855236125]
We consider a class of structured fractional problems, where the numerator includes a differentiable function.
We introduce sf FADMM, the first Alternating Method of Multipliers for this class of problems.
arXiv Detail & Related papers (2024-11-12T02:50:12Z) - The Implicit Bias of Minima Stability in Multivariate Shallow ReLU
Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z) - Exact Mean Square Linear Stability Analysis for SGD [28.65663421598186]
We provide an explicit condition on the step size that is both necessary and sufficient for linear stability of gradient descent (SGD)
We show that SGD's stability threshold is equivalent to that of a mixture process which takes in each a full batch gradient step w.p. $1-p$, and a single sample gradient step w.p. $p$.
arXiv Detail & Related papers (2023-06-13T15:29:23Z) - Decentralized Weakly Convex Optimization Over the Stiefel Manifold [28.427697270742947]
We focus on the Stiefel manifold in the decentralized setting, where a connected network of agents in $nMn log-1)$ are tested.
We propose an method called the decentralized subgradient method (DRSM)$ for forcing a natural station below $nMn log-1)$.
arXiv Detail & Related papers (2023-03-31T02:56:23Z) - Numerically Stable Sparse Gaussian Processes via Minimum Separation
using Cover Trees [57.67528738886731]
We study the numerical stability of scalable sparse approximations based on inducing points.
For low-dimensional tasks such as geospatial modeling, we propose an automated method for computing inducing points satisfying these conditions.
arXiv Detail & Related papers (2022-10-14T15:20:17Z) - Proximal Gradient Descent-Ascent: Variable Convergence under K{\L}
Geometry [49.65455534654459]
The finite descent-ascent parameters (GDA) has been widely applied to solve minimax optimization problems.
This paper fills such a gap by studying the convergence of the KL-Lized geometry.
arXiv Detail & Related papers (2021-02-09T05:35:53Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting.
To our best knowledge, this gives the firstever-known stability and generalization for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z) - Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart
for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate their practical convergence in rates.
In this paper, we propose an algorithm for solving nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.