Survey Descent: A Multipoint Generalization of Gradient Descent for
Nonsmooth Optimization
- URL: http://arxiv.org/abs/2111.15645v1
- Date: Tue, 30 Nov 2021 18:28:17 GMT
- Title: Survey Descent: A Multipoint Generalization of Gradient Descent for
Nonsmooth Optimization
- Authors: X.Y. Han and Adrian S. Lewis
- Abstract summary: We propose a generalization of the gradient descent iteration for local optimization.
We prove linear convergence when the objective is itself max-of-smooth, and experiments suggest a more general phenomenon.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: For strongly convex objectives that are smooth, the classical theory of
gradient descent ensures linear convergence relative to the number of gradient
evaluations. An analogous nonsmooth theory is challenging: even when the
objective is smooth at every iterate, the corresponding local models are
unstable, and traditional remedies need unpredictably many cutting planes. We
instead propose a multipoint generalization of the gradient descent iteration
for local optimization. While designed with general objectives in mind, we are
motivated by a "max-of-smooth" model that captures subdifferential dimension at
optimality. We prove linear convergence when the objective is itself
max-of-smooth, and experiments suggest a more general phenomenon.
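To make the setting concrete, here is a minimal Python sketch of a max-of-smooth objective together with a naive multipoint iteration: a "survey" of several points, each taking a gradient step on its own currently active smooth piece. The helper names (`grad_active_piece`, `survey_step`) and the unconstrained update are illustrative assumptions; the paper's actual Survey Descent update couples the survey points through small constrained quadratic subproblems.

```python
import numpy as np

# Toy max-of-smooth objective: f(x) = max_i (0.5 x^T A_i x + b_i^T x).
rng = np.random.default_rng(0)
d, k = 5, 3
pieces = [((i + 1) * np.eye(d), rng.normal(size=d)) for i in range(k)]

def f(x):
    return max(0.5 * x @ A @ x + b @ x for A, b in pieces)

def grad_active_piece(x):
    # Gradient of the smooth piece attaining the max at x.
    vals = [0.5 * x @ A @ x + b @ x for A, b in pieces]
    A, b = pieces[int(np.argmax(vals))]
    return A @ x + b

def survey_step(survey, lr=0.1):
    # Naive multipoint update: every survey point moves along the
    # gradient of whichever smooth piece is active at that point.
    # (Survey Descent instead solves a constrained subproblem per
    # point; this loop only illustrates iterating on several points.)
    return [x - lr * grad_active_piece(x) for x in survey]

survey = [rng.normal(size=d) for _ in range(k)]
for _ in range(200):
    survey = survey_step(survey)
print("best survey value:", min(f(x) for x in survey))
```

The point of maintaining one survey point per smooth piece is that, near the minimizer, the collection of active gradients stabilizes the otherwise unstable local models.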
Related papers
- Long-time dynamics and universality of nonconvex gradient descent [0.7614628596146601]
This paper develops a general approach to characterize the long-time behavior of nonconvex gradient descent in single-index models.
Our approach reveals that the gradient descent iterates are in general approximately independent of the data and strongly incoherent with the feature vectors.
arXiv Detail & Related papers (2025-09-14T20:36:18Z)
- Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness [50.78508362183774]
Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance.
Most convergence guarantees, however, require the Lipschitz smoothness condition, which is often not met in common machine learning settings.
arXiv Detail & Related papers (2025-07-11T15:36:48Z)
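For reference, a minimal sketch of a shuffling-type method (random reshuffling) on a finite-sum objective, assuming the loss is split into per-component gradient callables; the function name `reshuffling_sgd` is hypothetical.

```python
import numpy as np

def reshuffling_sgd(component_grads, x0, epochs=200, lr=0.05,
                    rng=np.random.default_rng(0)):
    # Shuffling-type SGD: each epoch visits every component gradient
    # exactly once, in a freshly shuffled order.
    x = x0.copy()
    n = len(component_grads)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x -= lr * component_grads[i](x)
    return x

# Example: least squares split into per-sample gradients.
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
grads = [lambda x, a=a, y=y: 2 * a * (a @ x - y) for a, y in zip(A, b)]
print(reshuffling_sgd(grads, np.zeros(2)))
```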
- Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness [51.302674884611335]
This work introduces a hybrid non-Euclidean optimization method which generalizes norm clipping by combining steepest descent and conditional gradient approaches.
We discuss how to instantiate the algorithms for deep learning and demonstrate their properties on image classification and language modeling.
arXiv Detail & Related papers (2025-06-02T17:34:29Z)
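For context, a minimal sketch of the classical Euclidean gradient-norm clipping baseline that the paper generalizes to non-Euclidean geometries; the threshold `tau` and step size here are illustrative choices.

```python
import numpy as np

def clipped_gd(grad, x0, steps=200, lr=0.5, tau=1.0):
    # Gradient descent with Euclidean norm clipping: the step
    # direction is rescaled whenever its norm exceeds tau, so the
    # effective step size adapts to (L0, L1)-type smoothness.
    x = x0.copy()
    for _ in range(steps):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm > tau:
            g = g * (tau / gnorm)
        x -= lr * g
    return x

# Example: a steep quartic f(x) = x^4, whose raw gradients explode.
print(clipped_gd(lambda x: 4 * x**3, np.array([2.0])))
```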
- Mirror Descent Under Generalized Smoothness [23.5387392871236]
We introduce a new $\ell_*$-smoothness concept that measures the Hessian in terms of a general norm and its dual.
We establish convergence for mirror-descent-type algorithms, matching the known rates under classic smoothness.
arXiv Detail & Related papers (2025-02-02T11:23:10Z)
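A minimal sketch of the classical mirror descent template, here instantiated with the negative-entropy mirror map on the probability simplex (the paper's generalized-smoothness analysis is not reproduced; the function name is illustrative).

```python
import numpy as np

def mirror_descent_simplex(grad, x0, steps=100, lr=0.1):
    # Mirror descent with the negative-entropy mirror map: the update
    # x_{t+1} proportional to x_t * exp(-lr * grad(x_t)) stays on the
    # probability simplex (the multiplicative-weights update).
    x = x0.copy()
    for _ in range(steps):
        x = x * np.exp(-lr * grad(x))
        x /= x.sum()
    return x

# Example: minimize a linear function <c, x> over the simplex.
c = np.array([3.0, 1.0, 2.0])
x = mirror_descent_simplex(lambda x: c, np.ones(3) / 3)
print(x)  # mass concentrates on the smallest coordinate of c
```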
- Independently-Normalized SGD for Generalized-Smooth Nonconvex Optimization [19.000530691874516]
We show that many nonconvex machine learning problems satisfy a generalized-smoothness condition that extends beyond the traditional smooth setting.
We propose an independently-normalized stochastic gradient descent algorithm, which leverages independent sampling and normalization.
arXiv Detail & Related papers (2024-10-17T21:52:00Z)
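A heavily hedged sketch of what an independently-normalized step might look like, assuming (this is our reading, not the paper's pseudocode) that the descent direction and the normalizing norm come from two independent stochastic gradient samples.

```python
import numpy as np

def independently_normalized_sgd(grad_sample, x0, steps=500, lr=0.05,
                                 rng=np.random.default_rng(0)):
    # Hypothetical sketch: draw two independent stochastic gradients,
    # use one for the direction and the other only to normalize the
    # step length, so the two roles are statistically decoupled.
    x = x0.copy()
    for _ in range(steps):
        g_dir = grad_sample(x, rng)   # sample used for the direction
        g_ref = grad_sample(x, rng)   # independent sample for the norm
        x -= lr * g_dir / (np.linalg.norm(g_ref) + 1e-12)
    return x

# Example: noisy gradients of f(x) = ||x||^2 / 2.
noisy_grad = lambda x, rng: x + 0.1 * rng.normal(size=x.size)
print(independently_normalized_sgd(noisy_grad, np.array([5.0, -2.0])))
```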
- Directional Smoothness and Gradient Methods: Convergence and Adaptivity [16.779513676120096]
We develop new sub-optimality bounds for gradient descent that depend on the conditioning of the objective along the path of optimization.
Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper bounds on the objective.
We prove that the Polyak step size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness.
arXiv Detail & Related papers (2024-03-06T22:24:05Z)
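One scheme analyzed above, the Polyak step size, is easy to state: it uses the optimal value $f^*$ in place of any smoothness constant. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def polyak_gd(f, grad, x0, f_star, steps=100):
    # Gradient descent with the Polyak step size
    #   eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2,
    # which needs the optimal value f* but no smoothness constant.
    x = x0.copy()
    for _ in range(steps):
        g = grad(x)
        gnorm2 = g @ g
        if gnorm2 == 0:
            break
        x = x - (f(x) - f_star) / gnorm2 * g
    return x

# Example: a poorly conditioned quadratic with minimum value 0.
A = np.diag([1.0, 100.0])
x = polyak_gd(lambda x: 0.5 * x @ A @ x, lambda x: A @ x,
              np.array([1.0, 1.0]), f_star=0.0)
print(x)
```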
- Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently performs global gradient approximation while achieving better accuracy and generalization ability in local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
- Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
Classical analysis of convex and non-convex optimization methods often requires Lipschitz continuity of the gradient, which limits the analysis to functions bounded by quadratics.
Recent work relaxes this requirement via a generalized, non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z)
- Robust Implicit Regularization via Weight Normalization [5.37610807422229]
We show that weight normalization enables a robust implicit bias that persists even when the weights are initialized at practically large scale.
Experiments suggest that both convergence speed and the robustness of the implicit bias improve dramatically with weight normalization.
arXiv Detail & Related papers (2023-05-09T13:38:55Z)
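For reference, weight normalization reparameterizes each weight vector as w = g * v/||v||, decoupling scale from direction (Salimans & Kingma, 2016). A minimal sketch of the forward map and the chain-rule gradients; the function names are illustrative.

```python
import numpy as np

def weightnorm_forward(v, g, x):
    # Weight normalization reparameterizes a weight vector as
    #   w = g * v / ||v||,
    # decoupling the scale g from the direction v / ||v||.
    w = g * v / np.linalg.norm(v)
    return w @ x

def weightnorm_grads(v, g, grad_w):
    # Chain-rule gradients of a loss with respect to (v, g), given
    # the gradient grad_w with respect to the underlying weights w.
    norm_v = np.linalg.norm(v)
    grad_g = grad_w @ v / norm_v
    grad_v = (g / norm_v) * (grad_w - (grad_g / norm_v) * v)
    return grad_v, grad_g

v, g, x = np.array([3.0, 4.0]), 2.0, np.array([1.0, 1.0])
print(weightnorm_forward(v, g, x))  # w = 2 * (0.6, 0.8), so w @ x = 2.8
```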
- Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $\epsilon$-stationary points with $O(\epsilon^{-4})$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z)
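For reference, the standard Adam iteration whose convergence the paper above revisits; a minimal sketch with the usual bias-corrected moment estimates (the paper's relaxed assumptions and variance-reduced variant are not reproduced here).

```python
import numpy as np

def adam(grad, x0, steps=1000, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    # Standard Adam update with bias-corrected first and second
    # moment estimates (Kingma & Ba, 2015).
    x, m, v = x0.copy(), np.zeros_like(x0), np.zeros_like(x0)
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

print(adam(lambda x: 2 * x, np.array([5.0, -3.0])))  # minimizes ||x||^2
```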
- The Convex Geometry of Backpropagation: Neural Network Gradient Flows Converge to Extreme Points of the Dual Convex Program [26.143558180103334]
We study non-convex subgradient flows for two-layer ReLU networks from a convex geometry and duality perspective.
We show that the limit points of non-convex subgradient flows can be identified via a primal-dual correspondence.
arXiv Detail & Related papers (2021-10-13T04:17:08Z)
- High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization over a Hilbert space using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points with high probability and best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z)
- Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NŁ).
The new definitions inspire geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in iteration complexity and in function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
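A minimal sketch of a coordinate-wise zeroth-order gradient estimator with uniform coordinate sampling; the paper's importance-sampling design would replace the uniform choice, and all names here are illustrative.

```python
import numpy as np

def zo_coordinate_grad(f, x, idx, mu=1e-4):
    # Two-point finite-difference estimate of the partial derivative
    # along coordinate idx, using only function queries.
    e = np.zeros_like(x)
    e[idx] = 1.0
    return (f(x + mu * e) - f(x - mu * e)) / (2 * mu)

def zo_descent(f, x0, steps=500, lr=0.1, rng=np.random.default_rng(0)):
    # Each step estimates one uniformly sampled coordinate of the
    # gradient; scaling by d makes the update an unbiased estimate
    # of a full-gradient step.
    x = x0.copy()
    d = x.size
    for _ in range(steps):
        i = rng.integers(d)
        x[i] -= lr * d * zo_coordinate_grad(f, x, i)
    return x

print(zo_descent(lambda x: np.sum((x - 1.0) ** 2), np.zeros(4)))
```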