Level Set Teleportation: An Optimization Perspective
- URL: http://arxiv.org/abs/2403.03362v2
- Date: Tue, 18 Mar 2025 17:48:12 GMT
- Title: Level Set Teleportation: An Optimization Perspective
- Authors: Aaron Mishkin, Alberto Bietti, Robert M. Gower
- Abstract summary: We study level set teleportation, an optimization routine which tries to accelerate gradient descent (GD). While teleportation intuitively speeds up GD via bigger steps, current work lacks convergence theory for convex functions. This is in sharp contrast to the standard (strongly) convex setting, where teleportation neither improves nor worsens convergence.
- Score: 21.84775414778289
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study level set teleportation, an optimization routine which tries to accelerate gradient descent (GD) by maximizing the gradient norm over a level set of the objective. While teleportation intuitively speeds up GD via bigger steps, current work lacks convergence theory for convex functions, guarantees for solving the teleportation operator, and even clear empirical evidence showing this acceleration. We resolve these open questions. For convex functions satisfying Hessian stability, we prove that GD with teleportation obtains a combined sub-linear/linear convergence rate which is strictly faster than GD when the optimality gap is small. This is in sharp contrast to the standard (strongly) convex setting, where teleportation neither improves nor worsens convergence. To evaluate teleportation in practice, we develop a projected-gradient method requiring only Hessian-vector products. We use this to show that gradient methods with access to a teleportation oracle outperform their standard versions on a variety of problems. We also find that GD with teleportation is faster than truncated Newton methods, particularly for non-convex optimization.
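The core computational step is the teleportation subproblem: maximize the gradient norm over the current level set using only Hessian-vector products. The sketch below illustrates the idea on a convex quadratic (where a Hessian-vector product is just a matrix-vector multiply); the ascent step, first-order pull-back, step sizes, and teleportation schedule are illustrative assumptions, not the paper's exact projected-gradient solver.

```python
# Hedged sketch of GD with level set teleportation on a convex quadratic
# f(x) = 0.5 * x^T A x + b^T x; not the paper's exact projected-gradient solver.
import numpy as np

rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)                      # symmetric positive definite Hessian
b = rng.standard_normal(n)
L = np.linalg.eigvalsh(A).max()              # smoothness constant

f = lambda x: 0.5 * x @ A @ x + b @ x
grad = lambda x: A @ x + b
hvp = lambda x, v: A @ v                     # Hessian-vector product (constant Hessian)

def teleport(x, ascent_steps=20, eta=0.5 / L):
    """Approximately maximize ||grad f(y)||^2 subject to f(y) = f(x)."""
    c, y = f(x), x.copy()
    for _ in range(ascent_steps):
        g = grad(y)
        y = y + eta * hvp(y, g)              # ascent step on 0.5 * ||grad f(y)||^2
        g = grad(y)
        y = y - ((f(y) - c) / (g @ g)) * g   # first-order pull-back onto the level set
    return y

x = rng.standard_normal(n)
for k in range(50):
    if k % 10 == 0:                          # illustrative teleportation schedule
        x = teleport(x)
    x = x - (1.0 / L) * grad(x)              # standard GD step

print("final objective value:", f(x))
```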
Related papers
- Teleportation With Null Space Gradient Projection for Optimization Acceleration [31.641252776379957]
We introduce an algorithm that projects the gradient of the teleportation objective function onto the input null space.
Our approach is readily generalizable from CNNs to transformers, and potentially other advanced architectures.
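As a generic illustration of the projection idea (not the paper's exact construction for CNNs or transformers), the sketch below projects a weight gradient onto the null space of a matrix of layer inputs so that, to first order, the update does not change the layer's outputs on those inputs; the variable names and shapes are assumptions.

```python
# Hedged sketch: project a parameter gradient onto the null space of an "input"
# matrix X (rows = layer inputs), so the projected update leaves the layer's
# outputs on those inputs unchanged. The paper's construction may differ.
import numpy as np

def project_to_null_space(grad_W, X):
    """Project grad_W (d_out x d_in) onto {G : G @ X.T = 0}."""
    # Row-space projector of X: P = X^T (X X^T)^+ X; the null-space projector is I - P.
    P_row = X.T @ np.linalg.pinv(X @ X.T) @ X
    return grad_W @ (np.eye(X.shape[1]) - P_row)

X = np.random.default_rng(1).standard_normal((8, 16))   # 8 inputs of dimension 16
G = np.random.default_rng(2).standard_normal((4, 16))   # gradient of a 4x16 weight
G_null = project_to_null_space(G, X)
print(np.abs(G_null @ X.T).max())   # ~0: projected update does not affect these inputs
```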
arXiv Detail & Related papers (2025-02-17T02:27:16Z) - Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with Smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
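A minimal sketch of the two update rules named above, on a toy quadratic where the optimal value is known; the $(L_0,L_1)$-smooth step-size constants and the smoothed clipping rule from the paper are not reproduced.

```python
# Minimal sketch of clipped GD and GD with the Polyak stepsize on a convex
# quadratic f(x) = 0.5 * ||x||^2 (so f* = 0 is known). The paper's (L0, L1)
# constants and smoothed clipping rule are not reproduced here.
import numpy as np

f = lambda x: 0.5 * np.dot(x, x)
grad = lambda x: x
f_star = 0.0

def clipped_gd_step(x, lr=0.5, clip=1.0):
    g = grad(x)
    # classical gradient clipping: shrink the step when the gradient is large
    scale = min(lr, clip / (np.linalg.norm(g) + 1e-12))
    return x - scale * g

def polyak_gd_step(x):
    g = grad(x)
    # Polyak stepsize: (f(x) - f*) / ||g||^2, requires knowing f*
    step = (f(x) - f_star) / (np.dot(g, g) + 1e-12)
    return x - step * g

x_clip = 10.0 * np.ones(5)
x_polyak = 10.0 * np.ones(5)
for _ in range(30):
    x_clip = clipped_gd_step(x_clip)
    x_polyak = polyak_gd_step(x_polyak)
print(f(x_clip), f(x_polyak))   # both decrease toward f* = 0
```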
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - Directional Smoothness and Gradient Methods: Convergence and Adaptivity [16.779513676120096]
We develop new sub-optimality bounds for gradient descent that depend on the conditioning of the objective along the path of optimization.
Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective.
We prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness.
arXiv Detail & Related papers (2024-03-06T22:24:05Z) - ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential-rate), ab initio (hyperparameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ based on situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Vanishing Point Estimation in Uncalibrated Images with Prior Gravity
Direction [82.72686460985297]
We tackle the problem of estimating a Manhattan frame.
We derive two new 2-line solvers, one of which does not suffer from singularities affecting existing solvers.
We also design a new non-minimal method, running on an arbitrary number of lines, to boost the performance in local optimization.
arXiv Detail & Related papers (2023-08-21T13:03:25Z) - Parameter-free projected gradient descent [0.0]
We consider the problem of minimizing a convex function over a closed convex set, with Projected Gradient Descent (PGD)
We propose a fully parameter-free version of AdaGrad, which is adaptive to the distance between the initialization and the optimum, and to the sum of the squared norms of the subgradients.
Our algorithm handles projection steps and does not involve restarts, reweighting along the trajectory, or additional evaluations compared to the classical PGD.
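A minimal sketch of projected gradient descent with AdaGrad-norm step sizes is shown below; the paper's fully parameter-free method additionally adapts to the unknown distance to the optimum, which the sketch replaces with a fixed guess D (an assumption), and the projection is onto a Euclidean ball for concreteness.

```python
# Sketch of projected gradient descent with AdaGrad-norm step sizes
# eta_t = D / sqrt(sum_s ||g_s||^2), projecting onto a Euclidean ball.
# The paper's method also adapts to the unknown distance to the optimum;
# here D is a fixed guess, which is an assumption of this sketch.
import numpy as np

def project_ball(x, radius):
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else (radius / nrm) * x

def projected_adagrad(grad, x0, radius, D=1.0, iters=200):
    x, sum_sq = x0, 0.0
    for _ in range(iters):
        g = grad(x)
        sum_sq += np.dot(g, g)
        eta = D / (np.sqrt(sum_sq) + 1e-12)
        x = project_ball(x - eta * g, radius)
    return x

# toy usage: minimize ||x - c||^2 over the unit ball, with c outside the ball
c = np.array([2.0, 0.0])
grad = lambda x: 2.0 * (x - c)
x = projected_adagrad(grad, np.zeros(2), radius=1.0)
print(x)   # approaches the boundary point [1, 0]
```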
arXiv Detail & Related papers (2023-05-31T07:22:44Z) - Improving Convergence and Generalization Using Parameter Symmetries [34.863101622189944]
We show that teleporting to minima with different curvatures improves generalization, which suggests a connection between the curvature of the minimum and generalization ability.
Our results showcase the versatility of teleportation and demonstrate the potential of incorporating symmetry in optimization.
arXiv Detail & Related papers (2023-05-22T18:35:42Z) - Realistic continuous-variable quantum teleportation using a displaced
Fock state channel [3.42658286826597]
We investigate the ideal and non-ideal continuous-variable quantum teleportation protocols realized by using an entangled displaced Fock state resource.
It is found that for such single-mode input fields, the average fidelity remains at the classical threshold, suggesting that the displaced Fock states are not advantageous for teleportation.
arXiv Detail & Related papers (2023-04-11T09:30:49Z) - Minimum Latency Deep Online Video Stabilization [77.68990069996939]
We present a novel camera path optimization framework for the task of online video stabilization.
In this work, we adopt recent off-the-shelf high-quality deep motion models for motion estimation to recover the camera trajectory.
Our approach significantly outperforms state-of-the-art online methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-12-05T07:37:32Z) - Symmetry Teleportation for Accelerated Optimization [21.989906418276906]
We study a different approach, symmetry teleportation, that allows the parameters to travel a large distance on the loss level set.
We derive the loss-invariant group actions for test functions and multi-layer neural networks, and prove a necessary condition of when teleportation improves convergence rate.
Experimentally, we show that teleportation improves the convergence speed of gradient descent and AdaGrad for several optimization problems including test functions, multi-layer regressions, and MNIST classification.
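For a two-layer linear model, one simple loss-invariant group action rescales consecutive layers in opposite directions: the product of the weights, and hence the loss, is unchanged, while the gradient norm is not. The sketch below demonstrates this; the scalar action and random data are illustrative assumptions, not the paper's experiments.

```python
# Loss-invariant group action on a two-layer linear model:
# L(W1, W2) = ||W2 @ W1 @ X - Y||_F^2 is unchanged under (W1, W2) -> (s*W1, W2/s)
# for s != 0, but the gradient norm changes, which is what teleportation exploits.
# The scalar action and random data are illustrative, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10))
Y = rng.standard_normal((2, 10))
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))

def loss_and_grad_norm(W1, W2):
    R = W2 @ W1 @ X - Y                    # residual
    loss = np.sum(R ** 2)
    gW1 = 2 * W2.T @ R @ X.T               # dL/dW1
    gW2 = 2 * R @ (W1 @ X).T               # dL/dW2
    grad_norm = np.sqrt(np.sum(gW1 ** 2) + np.sum(gW2 ** 2))
    return loss, grad_norm

for s in [1.0, 2.0, 5.0]:
    loss, gnorm = loss_and_grad_norm(s * W1, W2 / s)
    print(f"s={s}: loss={loss:.6f}, grad norm={gnorm:.3f}")
# the loss is identical for all s; the gradient norm is not
```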
arXiv Detail & Related papers (2022-05-21T16:39:21Z) - DiffSkill: Skill Abstraction from Differentiable Physics for Deformable
Object Manipulations with Tools [96.38972082580294]
DiffSkill is a novel framework that uses a differentiable physics simulator for skill abstraction to solve deformable object manipulation tasks.
In particular, we first obtain short-horizon skills using individual tools from a gradient-based simulator.
We then learn a neural skill abstractor from the demonstration trajectories which takes RGBD images as input.
arXiv Detail & Related papers (2022-03-31T17:59:38Z) - Adapting Stepsizes by Momentumized Gradients Improves Optimization and
Generalization [89.66571637204012]
AdaMomentum performs well on vision tasks and achieves state-of-the-art results consistently on other tasks including language processing.
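One reading of the title is that the second-moment estimate in an Adam-style update is built from the momentumized gradient rather than the raw gradient. The sketch below implements that variant on a toy problem; the bias correction and epsilon placement follow standard Adam conventions and are assumptions, not details taken from the paper.

```python
# Hedged sketch of an Adam-style optimizer whose second-moment estimate is built
# from the momentumized gradient m_t rather than the raw gradient g_t, which is
# one reading of "adapting stepsizes by momentumized gradients". Bias correction
# and epsilon placement are standard Adam conventions, not taken from the paper.
import numpy as np

def ada_momentum_step(x, g, m, v, t, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g          # momentumized gradient
    v = beta2 * v + (1 - beta2) * m * m      # second moment of the momentum
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# toy usage: minimize f(x) = ||x - 1||^2
x = np.zeros(5)
m = np.zeros(5)
v = np.zeros(5)
for t in range(1, 501):
    g = 2 * (x - 1.0)
    x, m, v = ada_momentum_step(x, g, m, v, t)
print(x)   # approaches the all-ones vector
```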
arXiv Detail & Related papers (2021-06-22T03:13:23Z) - Decreasing scaling transition from adaptive gradient descent to
stochastic gradient descent [1.7874193862154875]
We propose DSTAda, a method based on a decreasing scaling transition from adaptive gradient descent to stochastic gradient descent.
Our experimental results show that DSTAda has a faster speed, higher accuracy, and better stability and robustness.
arXiv Detail & Related papers (2021-06-12T11:28:58Z) - Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NŁ).
These new definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
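For reference, a hedged statement of the two conditions as they are commonly written (the paper's exact parameterization may differ):

```latex
% Hedged statement; the paper's parameterization may differ.
% Non-uniform Smoothness (NS): the smoothness constant may vary with the point,
\[ \|\nabla^2 f(x)\| \le \beta(x) \quad \text{for all } x. \]
% Non-uniform Lojasiewicz inequality (NL): a pointwise gradient-domination bound,
\[ \|\nabla f(x)\| \ge C(x)\,|f(x) - f^*|^{\xi} \quad \text{for all } x, \]
% which recovers the Polyak-Lojasiewicz condition when C(x) \equiv \sqrt{2\mu} and \xi = 1/2.
```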
arXiv Detail & Related papers (2021-05-13T04:23:07Z) - Parameter-free Locally Accelerated Conditional Gradients [91.19349793915615]
We introduce a novel Parameter-Free Locally accelerated Conditional Gradients (PF-LaCG) algorithm, for which we provide rigorous convergence guarantees.
Our theoretical results are complemented by numerical experiments, which demonstrate local acceleration and showcase the practical improvements of PF-LaCG over non-accelerated algorithms.
arXiv Detail & Related papers (2021-02-12T22:50:01Z) - Optimal tests for continuous-variable quantum teleportation and
photodetectors [4.3748379918040845]
We prove that the optimal state for testing CV teleportation is an entangled superposition of twin Fock states.
These results are relevant for experiments that make use of CV teleportation and photodetectors.
arXiv Detail & Related papers (2020-12-04T18:18:09Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Walking in the Shadow: A New Perspective on Descent Directions for
Constrained Minimization [29.861939940760898]
We show that the continuous-time dynamics of moving in the shadow is equivalent to the dynamics of projected gradient descent (PGD)
We combine these insights into a novel Shadow-CG method that uses FW and shadow steps, while enjoying linear convergence.
We provide a linear bound on the number of breakpoints for simple polytopes and present scaling-invariant upper bounds for general polytopes.
arXiv Detail & Related papers (2020-06-15T14:26:56Z) - IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method [64.15649345392822]
We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex.
Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method.
When coupled with accelerated gradient descent, our framework yields a novel primal algorithm whose convergence rate is optimal and matched by recently derived lower bounds.
arXiv Detail & Related papers (2020-06-11T18:49:06Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove convergence guarantees for adaptive gradient methods in nonconvex optimization in terms of the expected gradient norm.
Our analyses shed light on designing better adaptive gradient methods for nonconvex optimization.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.