Walking in the Shadow: A New Perspective on Descent Directions for
Constrained Minimization
- URL: http://arxiv.org/abs/2006.08426v4
- Date: Wed, 30 Aug 2023 17:19:36 GMT
- Title: Walking in the Shadow: A New Perspective on Descent Directions for
Constrained Minimization
- Authors: Hassan Mortagy, Swati Gupta, Sebastian Pokutta
- Abstract summary: We show that the continuous-time dynamics of moving in the shadow is equivalent to the dynamics of projected gradient descent (PGD).
We combine these insights into a novel Shadow-CG method that uses FW and shadow steps, while enjoying linear convergence.
We provide a linear bound on the number of breakpoints for simple polytopes and present scaling-invariant upper bounds for general polytopes.
- Score: 29.861939940760898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Descent directions, such as
movement towards Frank-Wolfe vertices, away-steps, in-face away-steps and
pairwise directions, have been an important design consideration in conditional
gradient descent (CGD) variants. In this work, we attempt to demystify the
impact of the movement in these directions towards attaining constrained
minimizers. The optimal local direction of descent is the directional
derivative (i.e., shadow) of the projection of the negative gradient. We show
that this direction is the best away-step possible, and the continuous-time
dynamics of moving in the shadow is equivalent to the dynamics of projected
gradient descent (PGD), although it is non-trivial to discretize. We also show
that Frank-Wolfe (FW) vertices correspond to projecting onto the polytope using
an "infinite" step in the direction of the negative gradient, thus providing a
new perspective on these steps. We combine these insights into a novel
Shadow-CG method that uses FW and shadow steps, while enjoying linear
convergence, with a rate that depends on the number of breakpoints in its
projection curve, rather than the pyramidal width. We provide a linear bound on
the number of breakpoints for simple polytopes and present scaling-invariant
upper bounds for general polytopes based on the number of facets. We demonstrate
the computational benefit of using Shadow-CG on various applications, while
raising an open question about tightening the bound on the number of
breakpoints for general polytopes.
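To make these two viewpoints concrete, here is a small numerical sketch (not code from the paper; the probability-simplex feasible set, the quadratic objective, the boundary point, and the step sizes `eps` and `big` are illustrative assumptions): the shadow is approximated as the directional derivative of the Euclidean projection of x - eps*grad f(x), and the Frank-Wolfe vertex emerges as the projection of x - M*grad f(x) for a very large step M.

```python
import numpy as np

def project_simplex(z):
    """Euclidean projection of z onto the probability simplex (sorting-based method)."""
    u = np.sort(z)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(z) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(z - theta, 0.0)

# Illustrative quadratic objective f(x) = 0.5 * ||x - b||^2 over the simplex.
b = np.array([-0.5, 0.7, 0.8])
grad = lambda x: x - b

x = np.array([0.0, 0.5, 0.5])   # a boundary point of the simplex
g = grad(x)

# Shadow direction: directional derivative of the projection of a small
# step along the negative gradient (finite-difference approximation).
eps = 1e-6
shadow = (project_simplex(x - eps * g) - x) / eps

# Frank-Wolfe vertex: an "infinite" step along the negative gradient,
# then projection; on the simplex this is the vertex minimizing <g, v>.
big = 1e8
fw_via_projection = project_simplex(x - big * g)
fw_vertex = np.eye(len(x))[np.argmin(g)]

print("negative gradient :", -g)            # [-0.5, 0.2, 0.3], points out of the simplex at x
print("shadow direction  :", shadow)        # approx [0, -0.05, 0.05], a feasible descent direction
print("FW via projection :", fw_via_projection)
print("FW vertex         :", fw_vertex)     # both give [0, 0, 1]
```

At this boundary point the raw negative gradient would leave the polytope, while its shadow zeroes out the blocked coordinate and remains a feasible descent direction; the huge-step projection lands exactly on the FW vertex argmin_v <grad f(x), v>, matching the "infinite step" interpretation above.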
Related papers
- Level Set Teleportation: An Optimization Perspective [21.84775414778289]
We study level set teleportation, an optimization sub-routine which seeks to accelerate gradient methods.
For convex functions satisfying Hessian stability, we prove that GD with level-set teleportation obtains a combined sub-linear/linear convergence rate which is strictly faster than standard GD.
This is in sharp contrast to the standard (strongly) convex setting, where we show level-set teleportation neither improves nor worsens convergence rates.
arXiv Detail & Related papers (2024-03-05T23:16:13Z) - Corridor Geometry in Gradient-Based Optimization [11.177186975058047]
We show that corridors are exactly the regions where gradient descent and the gradient flow follow the same trajectory.
Using the linear decrease of the loss along corridors, we devise a learning-rate adaptation scheme for gradient descent.
arXiv Detail & Related papers (2024-02-13T21:54:15Z) - How to guess a gradient [68.98681202222664]
We show that gradients are more structured than previously thought.
Exploiting this structure can significantly improve gradient-free optimization schemes.
We highlight new challenges in overcoming the large gap between optimizing with exact gradients and guessing the gradients.
arXiv Detail & Related papers (2023-12-07T21:40:44Z) - Neural Gradient Learning and Optimization for Oriented Point Normal
Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability in local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z) - Parameter-free projected gradient descent [0.0]
We consider the problem of minimizing a convex function over a closed convex set with Projected Gradient Descent (PGD).
We propose a fully parameter-free version of AdaGrad, which is adaptive to the distance between the initialization and the optimum, and to the sum of the square norm of the subgradients.
Our algorithm handles projection steps and does not involve restarts, reweighting along the trajectory, or additional evaluations compared to classical PGD.
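As context for this entry, the sketch below shows a classical AdaGrad-norm style projected (sub)gradient baseline that such parameter-free methods aim to improve on: the step size shrinks with the accumulated squared gradient norms but still requires a tuned scale eta. This is not the paper's algorithm; the l2-ball feasible set, the toy objective, and the constants are illustrative assumptions.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the l2 ball of the given radius."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def adagrad_norm_pgd(grad, x0, eta, steps=200, radius=1.0):
    """Projected (sub)gradient descent with AdaGrad-norm step sizes:
    eta / sqrt(sum of squared gradient norms so far).  The scale eta
    still has to be tuned, which is what a parameter-free method removes."""
    x, acc = x0.copy(), 0.0
    for _ in range(steps):
        g = grad(x)
        acc += np.dot(g, g)
        x = project_ball(x - eta / np.sqrt(acc + 1e-12) * g, radius)
    return x

# Toy problem: min 0.5*||x - b||^2 over the unit l2 ball, with b outside the ball.
b = np.array([2.0, -1.0])
x_star = adagrad_norm_pgd(lambda x: x - b, x0=np.zeros(2), eta=1.0)
print(x_star, b / np.linalg.norm(b))  # both close to the boundary minimizer b/||b||
```

Removing the need to tune eta, while keeping plain projection steps and avoiding restarts, is what the parameter-free variant described in this entry provides.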
arXiv Detail & Related papers (2023-05-31T07:22:44Z) - The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines
and Drifting Towards Wide Minima [41.961056785108845]
We consider Sharpness-Aware Minimization, a gradient-based optimization method for deep networks.
We show that when SAM is applied with a convex quadratic objective, it converges to a cycle that oscillates between either side of the minimum in the direction with the largest curvature.
In the non-quadratic case, we show that such oscillations effectively perform gradient descent, with a smaller step-size, on the spectral norm of the Hessian.
arXiv Detail & Related papers (2022-10-04T10:34:37Z) - Shortest Paths in Graphs with Matrix-Valued Edges: Concepts, Algorithm
and Application to 3D Multi-Shape Analysis [69.08838724594584]
Finding shortest paths in a graph is relevant for numerous problems in computer vision and graphics.
We introduce the novel graph-theoretic concept of a shortest path in a graph with matrix-valued edges.
arXiv Detail & Related papers (2021-12-08T08:23:37Z) - On Training Implicit Models [75.20173180996501]
We propose a novel gradient estimate for implicit models, named phantom gradient, that forgoes the costly computation of the exact gradient.
Experiments on large-scale tasks demonstrate that these lightweight phantom gradients significantly accelerate the backward passes in training implicit models by roughly 1.7 times.
arXiv Detail & Related papers (2021-11-09T14:40:24Z) - Deep Point Set Resampling via Gradient Fields [11.5128379063303]
3D point clouds acquired by scanning real-world objects or scenes have found a wide range of applications.
They are often perturbed by noise or suffer from low density, which obstructs downstream tasks such as surface reconstruction and understanding.
We propose a novel paradigm of point set resampling for restoration, which learns continuous gradient fields of point clouds.
arXiv Detail & Related papers (2021-11-03T07:20:35Z) - Projective Manifold Gradient Layer for Deep Rotation Regression [49.85464297105456]
Regressing rotations on SO(3) manifold using deep neural networks is an important yet unsolved problem.
We propose a manifold-aware gradient that directly backpropagates into deep network weights.
arXiv Detail & Related papers (2021-10-22T08:34:15Z) - Accelerating Smooth Games by Manipulating Spectral Shapes [51.366219027745174]
We use matrix iteration theory to characterize acceleration in smooth games.
We describe gradient-based methods, such as extragradient, as transformations on the spectral shape.
We propose an optimal algorithm for bilinear games.
arXiv Detail & Related papers (2020-01-02T19:21:48Z)