Walking in the Shadow: A New Perspective on Descent Directions for
Constrained Minimization
- URL: http://arxiv.org/abs/2006.08426v4
- Date: Wed, 30 Aug 2023 17:19:36 GMT
- Title: Walking in the Shadow: A New Perspective on Descent Directions for
Constrained Minimization
- Authors: Hassan Mortagy, Swati Gupta, Sebastian Pokutta
- Abstract summary: We show that the continuous-time dynamics of moving in the shadow is equivalent to the dynamics of projected gradient descent (PGD).
We combine these insights into a novel Shadow-CG method that uses FW and shadow steps, while enjoying linear convergence.
We provide a linear bound on the number of breakpoints for simple polytopes and present scaling-invariant upper bounds for general polytopes.
- Score: 29.861939940760898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Descent directions, such as
movement towards Frank-Wolfe vertices, away-steps, in-face away-steps and
pairwise directions, have been an important design consideration in conditional
gradient descent (CGD) variants. In this work, we attempt to demystify the
impact of the movement in these directions towards attaining constrained
minimizers. The optimal local direction of descent is the directional
derivative (i.e., shadow) of the projection of the negative gradient. We show
that this direction is the best away-step possible, and the continuous-time
dynamics of moving in the shadow is equivalent to the dynamics of projected
gradient descent (PGD), although it is non-trivial to discretize. We also show
that Frank-Wolfe (FW) vertices correspond to projecting onto the polytope using
an "infinite" step in the direction of the negative gradient, thus providing a
new perspective on these steps. We combine these insights into a novel
Shadow-CG method that uses FW and shadow steps, while enjoying linear
convergence, with a rate that depends on the number of breakpoints in its
projection curve, rather than the pyramidal width. We provide a linear bound on
the number of breakpoints for simple polytopes and present scaling-invariant
upper bounds for general polytopes based on the number of facets. We demonstrate
the computational benefit of using Shadow-CG on various applications, while
raising an open question about tightening the bound on the number of
breakpoints for general polytopes.
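To make these two viewpoints concrete, here is a small numerical sketch (not code from the paper; the probability-simplex feasible set, the quadratic objective, the boundary point, and the step sizes `eps` and `big` are illustrative assumptions): the shadow is approximated as the directional derivative of the Euclidean projection of x - eps*grad f(x), and the Frank-Wolfe vertex emerges as the projection of x - M*grad f(x) for a very large step M.

```python
import numpy as np

def project_simplex(z):
    """Euclidean projection of z onto the probability simplex (sorting-based method)."""
    u = np.sort(z)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(z) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(z - theta, 0.0)

# Illustrative quadratic objective f(x) = 0.5 * ||x - b||^2 over the simplex.
b = np.array([-0.5, 0.7, 0.8])
grad = lambda x: x - b

x = np.array([0.0, 0.5, 0.5])   # a boundary point of the simplex
g = grad(x)

# Shadow direction: directional derivative of the projection of a small
# step along the negative gradient (finite-difference approximation).
eps = 1e-6
shadow = (project_simplex(x - eps * g) - x) / eps

# Frank-Wolfe vertex: an "infinite" step along the negative gradient,
# then projection; on the simplex this is the vertex minimizing <g, v>.
big = 1e8
fw_via_projection = project_simplex(x - big * g)
fw_vertex = np.eye(len(x))[np.argmin(g)]

print("negative gradient :", -g)            # [-0.5, 0.2, 0.3], points out of the simplex at x
print("shadow direction  :", shadow)        # approx [0, -0.05, 0.05], a feasible descent direction
print("FW via projection :", fw_via_projection)
print("FW vertex         :", fw_vertex)     # both give [0, 0, 1]
```

At this boundary point the raw negative gradient would leave the polytope, while its shadow zeroes out the blocked coordinate and remains a feasible descent direction; the huge-step projection lands exactly on the FW vertex argmin_v <grad f(x), v>, matching the "infinite step" interpretation above.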
Related papers
- Level Set Teleportation: An Optimization Perspective [21.84775414778289]
We study level set teleportation, an optimization sub-routine which seeks to accelerate gradient methods.
For convex functions satisfying Hessian stability, we prove that GD with level-set teleportation obtains a combined sub-linear/linear convergence rate which is strictly faster than standard GD.
This is in sharp contrast to the standard (strongly) convex setting, where we show level-set teleportation neither improves nor worsens convergence rates.
arXiv Detail & Related papers (2024-03-05T23:16:13Z) - Corridor Geometry in Gradient-Based Optimization [11.177186975058047]
We show that corridors are exactly the regions where gradient descent and the gradient flow follow the same trajectory.
Using the linear decrease of the loss along corridors, we devise a learning-rate adaptation scheme for gradient descent.
arXiv Detail & Related papers (2024-02-13T21:54:15Z) - How to guess a gradient [68.98681202222664]
We show that gradients are more structured than previously thought.
Exploiting this structure can significantly improve gradient-free optimization schemes.
We highlight new challenges in overcoming the large gap between optimizing with exact gradients and guessing the gradients.
arXiv Detail & Related papers (2023-12-07T21:40:44Z) - Neural Gradient Learning and Optimization for Oriented Point Normal
Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability in local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z) - Parameter-free projected gradient descent [0.0]
We consider the problem of minimizing a convex function over a closed convex set with Projected Gradient Descent (PGD).
We propose a fully parameter-free version of AdaGrad, which is adaptive to the distance between the initialization and the optimum, and to the sum of the square norm of the subgradients.
Our algorithm handles projection steps and does not involve restarts, reweighting along the trajectory, or additional evaluations compared to classical PGD.
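As context for this entry, the sketch below shows a classical AdaGrad-norm style projected (sub)gradient baseline that such parameter-free methods aim to improve on: the step size shrinks with the accumulated squared gradient norms but still requires a tuned scale eta. This is not the paper's algorithm; the l2-ball feasible set, the toy objective, and the constants are illustrative assumptions.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the l2 ball of the given radius."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def adagrad_norm_pgd(grad, x0, eta, steps=200, radius=1.0):
    """Projected (sub)gradient descent with AdaGrad-norm step sizes:
    eta / sqrt(sum of squared gradient norms so far).  The scale eta
    still has to be tuned, which is what a parameter-free method removes."""
    x, acc = x0.copy(), 0.0
    for _ in range(steps):
        g = grad(x)
        acc += np.dot(g, g)
        x = project_ball(x - eta / np.sqrt(acc + 1e-12) * g, radius)
    return x

# Toy problem: min 0.5*||x - b||^2 over the unit l2 ball, with b outside the ball.
b = np.array([2.0, -1.0])
x_star = adagrad_norm_pgd(lambda x: x - b, x0=np.zeros(2), eta=1.0)
print(x_star, b / np.linalg.norm(b))  # both close to the boundary minimizer b/||b||
```

Removing the need to tune eta, while keeping plain projection steps and avoiding restarts, is what the parameter-free variant described in this entry provides.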
arXiv Detail & Related papers (2023-05-31T07:22:44Z) - The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines
and Drifting Towards Wide Minima [41.961056785108845]
We consider Sharpness-Aware Minimization, a gradient-based optimization method for deep networks.
We show that when SAM is applied with a convex quadratic objective, it converges to a cycle that oscillates between either side of the minimum in the direction with the largest curvature.
In the non-quadratic case, we show that such oscillations effectively perform gradient descent, with a smaller step-size, on the spectral norm of the Hessian.
arXiv Detail & Related papers (2022-10-04T10:34:37Z) - Shortest Paths in Graphs with Matrix-Valued Edges: Concepts, Algorithm
and Application to 3D Multi-Shape Analysis [69.08838724594584]
Finding shortest paths in a graph is relevant for numerous problems in computer vision and graphics.
We introduce the novel graph-theoretic concept of a shortest path in a graph with matrix-valued edges.
arXiv Detail & Related papers (2021-12-08T08:23:37Z) - On Training Implicit Models [75.20173180996501]
We propose a novel gradient estimate for implicit models, named phantom gradient, that forgoes the costly computation of the exact gradient.
Experiments on large-scale tasks demonstrate that these lightweight phantom gradients significantly accelerate the backward passes in training implicit models by roughly 1.7 times.
arXiv Detail & Related papers (2021-11-09T14:40:24Z) - Deep Point Set Resampling via Gradient Fields [11.5128379063303]
3D point clouds acquired by scanning real-world objects or scenes have found a wide range of applications.
They are often perturbed by noise or suffer from low density, which obstructs downstream tasks such as surface reconstruction and understanding.
We propose a novel paradigm of point set resampling for restoration, which learns continuous gradient fields of point clouds.
arXiv Detail & Related papers (2021-11-03T07:20:35Z) - Projective Manifold Gradient Layer for Deep Rotation Regression [49.85464297105456]
Regressing rotations on SO(3) manifold using deep neural networks is an important yet unsolved problem.
We propose a manifold-aware gradient that directly backpropagates into deep network weights.
arXiv Detail & Related papers (2021-10-22T08:34:15Z) - Accelerating Smooth Games by Manipulating Spectral Shapes [51.366219027745174]
We use matrix iteration theory to characterize acceleration in smooth games.
We describe gradient-based methods, such as extragradient, as transformations on the spectral shape.
We propose an optimal algorithm for bilinear games.
arXiv Detail & Related papers (2020-01-02T19:21:48Z)