Subgradient methods near active manifolds: saddle point avoidance, local convergence, and asymptotic normality
- URL: http://arxiv.org/abs/2108.11832v1
- Date: Thu, 26 Aug 2021 15:02:16 GMT
- Title: Subgradient methods near active manifolds: saddle point avoidance, local convergence, and asymptotic normality
- Authors: Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang
- Abstract summary: We show that aiming and subgradient approximation fully expose the smooth substructure of the problem.
We prove these properties hold for a wide class of problems, including cone reducible/decomposable functions and generic semialgebraic problems.
The normality results appear to be new even in the most classical setting.
- Score: 4.709588811674973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nonsmooth optimization problems arising in practice tend to exhibit
beneficial smooth substructure: their domains stratify into "active manifolds"
of smooth variation, which common proximal algorithms "identify" in finite
time. Identification then entails a transition to smooth dynamics, and
accommodates second-order acceleration techniques. While identification is
clearly useful algorithmically, empirical evidence suggests that even those
algorithms that do not identify the active manifold in finite time -- notably
the subgradient method -- are nonetheless affected by it. This work seeks to
explain this phenomenon, asking: how do active manifolds impact the subgradient
method in nonsmooth optimization?
We answer this question by introducing two algorithmically
useful properties -- aiming and subgradient approximation -- that fully expose
the smooth substructure of the problem. We show that these properties imply
that the shadow of the (stochastic) subgradient method along the active
manifold is precisely an inexact Riemannian gradient method with an implicit
retraction. We prove that these properties hold for a wide class of problems,
including cone reducible/decomposable functions and generic semialgebraic
problems. Moreover, we develop a thorough calculus, proving such properties are
preserved under smooth deformations and spectral lifts. This viewpoint then
leads to several algorithmic consequences that parallel results in smooth
optimization, despite the nonsmoothness of the problem: local rates of
convergence, asymptotic normality, and saddle point avoidance. The asymptotic
normality results appear to be new even in the most classical setting of
stochastic nonlinear programming. The results culminate in the following
observation: the perturbed subgradient method on generic, Clarke regular
semialgebraic problems converges only to local minimizers.
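To make the phenomenon described above concrete, here is a minimal numerical sketch (not the authors' algorithm): a perturbed subgradient method applied to the toy function f(x, y) = |x| + (y - 1)^2, whose active manifold is the line {x = 0}. The iterates never land exactly on the manifold, yet their shadow (the y-coordinate) behaves like an inexact gradient method along it. The test function, step-size schedule, and noise level are assumptions made for the illustration.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): a perturbed subgradient
# method on f(x, y) = |x| + (y - 1)^2, whose active manifold is M = {x = 0}.

rng = np.random.default_rng(0)

def subgradient(z):
    """Return one element of the Clarke subdifferential of f at z = (x, y)."""
    x, y = z
    gx = np.sign(x)            # equals 0.0 at the kink x = 0 (a valid choice)
    gy = 2.0 * (y - 1.0)
    return np.array([gx, gy])

z = np.array([2.0, -3.0])      # start well off the manifold
for k in range(1, 5001):
    step = 1.0 / k                          # classical 1/k step-size schedule
    noise = 0.01 * rng.standard_normal(2)   # small random perturbation
    z = z - step * (subgradient(z) + noise)

# The x-coordinate keeps oscillating around {x = 0} without being identified
# exactly, while the shadow coordinate y tracks gradient descent on (y - 1)^2
# and approaches the minimizer y = 1.
print("final iterate:", z)
```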
Related papers
- Anderson Acceleration in Nonsmooth Problems: Local Convergence via Active Manifold Identification [11.495346367359037]
We investigate a class of nonsmooth optimization algorithms characterized by the active manifold identification property.
Under the assumption that the optimization problem possesses an active manifold at a stationary point, we establish a local R-linear convergence rate for the Anderson-accelerated algorithm.
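For a concrete picture of the accelerated scheme this entry refers to, the sketch below implements textbook Anderson acceleration AA(m) in the difference (Walker-Ni) form and applies it to the proximal-gradient fixed-point map of a small LASSO problem, whose solution lies on an active manifold fixed by its sign pattern. This is a generic illustration, not the authors' algorithm; the memory depth m, the test problem, and the absence of the safeguards (regularized least squares, restarts) used in practice are assumptions.

```python
import numpy as np

def anderson_accelerate(g, x0, m=5, iters=50):
    """Textbook Anderson acceleration AA(m) for a fixed-point map x = g(x).

    Difference (Walker-Ni) form, without regularization or restarts; m and
    iters are arbitrary choices for this sketch.
    """
    x_hist, g_hist = [x0.copy()], [g(x0)]
    x = g_hist[-1].copy()                    # first step: plain fixed-point iteration
    for _ in range(iters):
        gx = g(x)
        x_hist.append(x.copy())
        g_hist.append(gx.copy())
        if len(x_hist) > m + 1:              # keep a window of m + 1 past pairs
            x_hist.pop(0)
            g_hist.pop(0)
        F = np.column_stack([gk - xk for gk, xk in zip(g_hist, x_hist)])  # residuals
        dF = F[:, 1:] - F[:, :-1]
        G = np.column_stack(g_hist)
        dG = G[:, 1:] - G[:, :-1]
        gamma, *_ = np.linalg.lstsq(dF, F[:, -1], rcond=None)
        x = gx - dG @ gamma                  # Anderson mixing step
    return x

# Example: accelerate the proximal-gradient (ISTA) map of a small LASSO problem
#   min_x 0.5 * ||A @ x - b||^2 + lam * ||x||_1.
rng = np.random.default_rng(1)
A, b, lam = rng.standard_normal((20, 10)), rng.standard_normal(20), 0.5
t = 1.0 / np.linalg.norm(A, 2) ** 2          # step size 1 / L

def prox_grad_map(x):
    y = x - t * A.T @ (A @ x - b)                              # forward step
    return np.sign(y) * np.maximum(np.abs(y) - t * lam, 0.0)   # soft threshold

x_hat = anderson_accelerate(prox_grad_map, np.zeros(10))
```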
arXiv Detail & Related papers (2024-10-12T07:55:06Z)
- Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that the policies under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check, as it shows that, almost always, the limit state of the algorithm can only be a local minimizer.
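As an illustration of the kind of Riemannian method this entry concerns (a deterministic sketch, not the paper's stochastic scheme), the snippet below runs retraction-based Riemannian gradient descent on the unit sphere for the Rayleigh quotient f(x) = x @ A @ x; the eigenvectors of A are the critical points, and all but the one for the smallest eigenvalue are strict saddles. The step size and problem size are arbitrary choices for the demo.

```python
import numpy as np

def riemannian_gd_sphere(A, x0, step=0.05, iters=500):
    """Riemannian gradient descent with a normalization retraction on the unit
    sphere, minimizing the Rayleigh quotient f(x) = x @ A @ x.

    A deterministic sketch, not the paper's stochastic scheme; step size and
    iteration count are arbitrary choices for the demo.
    """
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2.0 * A @ x                   # Euclidean gradient of x @ A @ x
        rgrad = egrad - (x @ egrad) * x       # project onto the tangent space
        y = x - step * rgrad                  # step in the tangent direction
        x = y / np.linalg.norm(y)             # retraction back onto the sphere
    return x

# From a random start, the iterates approach (up to sign) the eigenvector of the
# smallest eigenvalue; the other eigenvectors are strict saddles that are avoided.
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = (M + M.T) / 2.0                           # a random symmetric matrix
x_min = riemannian_gd_sphere(A, rng.standard_normal(8))
```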
arXiv Detail & Related papers (2023-11-04T11:12:24Z)
- Variance reduction techniques for stochastic proximal point algorithms [5.374800961359305]
In this work, we propose the first unified study of variance reduction techniques for proximal point algorithms.
We introduce a generic proximal-based algorithm that can be specified to give the proximal version of SVRG, SAGA, and some of their variants.
Our experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts.
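As one concrete representative of a variance-reduced proximal scheme, the sketch below implements the classical prox-SVRG method of Xiao and Zhang on a small LASSO problem; it is shown for orientation only and is not claimed to be the unified algorithm of the paper above. The step size, epoch lengths, and problem data are assumptions.

```python
import numpy as np

def prox_svrg(A, b, lam, eta, outer=10, inner=200, seed=0):
    """Classical prox-SVRG (Xiao-Zhang) for the LASSO problem
    min_x (1/n) * sum_i 0.5 * (A[i] @ x - b[i])**2 + lam * ||x||_1.

    Shown as a representative variance-reduced proximal scheme; it is not
    claimed to be the unified algorithm of the paper above.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape

    def soft(z, t):                                # prox of t * ||.||_1
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    x_snap = np.zeros(d)
    for _ in range(outer):
        mu = A.T @ (A @ x_snap - b) / n            # full gradient at the snapshot
        x = x_snap.copy()
        for _ in range(inner):
            i = rng.integers(n)
            g_x = (A[i] @ x - b[i]) * A[i]         # component gradient at x
            g_s = (A[i] @ x_snap - b[i]) * A[i]    # component gradient at the snapshot
            v = g_x - g_s + mu                     # variance-reduced gradient estimate
            x = soft(x - eta * v, eta * lam)       # proximal (soft-threshold) step
        x_snap = x
    return x_snap

# Example on random data; the step size is tied to the largest row norm.
rng = np.random.default_rng(1)
A, b = rng.standard_normal((200, 50)), rng.standard_normal(200)
eta = 0.1 / np.max(np.sum(A ** 2, axis=1))
x_hat = prox_svrg(A, b, lam=0.1, eta=eta)
```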
arXiv Detail & Related papers (2023-08-18T05:11:50Z)
- Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
The implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
arXiv Detail & Related papers (2023-03-31T02:10:30Z)
- Learning Globally Smooth Functions on Manifolds [94.22412028413102]
Learning smooth functions is generally challenging, except in simple cases such as learning linear or kernel models.
This work proposes to overcome these obstacles by combining techniques from semi-infinite constrained learning and manifold regularization.
We prove that, under mild conditions, this method estimates the Lipschitz constant of the solution, learning a globally smooth solution as a byproduct.
arXiv Detail & Related papers (2022-10-01T15:45:35Z)
- Gradient flows and randomised thresholding: sparse inversion and classification [0.0]
Sparse inversion and classification problems are ubiquitous in modern data science and imaging.
In classification, we consider, e.g., the sum of a data fidelity term and a non-smooth Ginzburg--Landau energy.
Standard (sub)gradient descent methods have been shown to be inefficient when approaching such problems.
arXiv Detail & Related papers (2022-03-22T09:21:14Z)
- Stability vs Implicit Bias of Gradient Methods on Separable Data and Beyond [33.593203156666746]
We focus on the generalization properties of unregularized gradient-based learning procedures applied to separable linear classification.
We give an additional unified explanation for this generalization, that we refer to as realizability and self-boundedness.
In some of these cases, our bounds significantly improve upon the existing generalization error bounds in the literature.
arXiv Detail & Related papers (2022-02-27T19:56:36Z)
- A Discrete Variational Derivation of Accelerated Methods in Optimization [68.8204255655161]
We introduce variational principles which allow us to derive different methods for optimization.
We derive two families of optimization methods in one-to-one correspondence.
The preservation of symplecticity of autonomous systems occurs here solely on the fibers.
arXiv Detail & Related papers (2021-06-04T20:21:53Z)
- Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NL).
The new definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z)
- Adaptive First- and Zeroth-order Methods for Weakly Convex Stochastic Optimization Problems [12.010310883787911]
We analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) optimization problems.
Experimental results indicate that the proposed algorithms empirically outperform zeroth-order gradient descent and its design variants.
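To fix ideas, the sketch below runs a generic AdaGrad-norm style stochastic subgradient method on a standard weakly convex, nonsmooth test problem (robust phase retrieval). It illustrates the adaptive subgradient template, not the specific algorithms proposed in the paper; the loss, step constant, and iteration budget are assumptions.

```python
import numpy as np

def adagrad_subgradient(A, b, x0, eta=1.0, iters=3000, seed=0):
    """AdaGrad-norm style stochastic subgradient method on the weakly convex,
    nonsmooth robust phase-retrieval loss f(x) = (1/n) * sum_i |(A[i] @ x)**2 - b[i]|.

    A generic adaptive-step sketch, not the specific algorithms of the paper
    above; the loss, eta, and iteration budget are arbitrary choices.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x, acc = x0.copy(), 0.0
    for _ in range(iters):
        i = rng.integers(n)
        r = (A[i] @ x) ** 2 - b[i]
        g = np.sign(r) * 2.0 * (A[i] @ x) * A[i]   # stochastic subgradient
        acc += g @ g                               # accumulated squared norms
        x = x - eta / (np.sqrt(acc) + 1e-12) * g   # adaptive step size
    return x

# Example: a planted, noiseless instance (recovery up to sign is not guaranteed
# from a random start; this only illustrates the update rule).
rng = np.random.default_rng(1)
d, n = 20, 200
x_true = rng.standard_normal(d)
A = rng.standard_normal((n, d))
b = (A @ x_true) ** 2
x_out = adagrad_subgradient(A, b, x0=rng.standard_normal(d))
```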
arXiv Detail & Related papers (2020-05-19T07:44:52Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconcave min-max problems.
Our experiments show that, in GAN training, the advantage of adaptive gradient algorithms over their non-adaptive counterparts can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)