Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal
Attention, and Optimal Transport
- URL: http://arxiv.org/abs/2205.14173v1
- Date: Fri, 27 May 2022 18:01:45 GMT
- Title: Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal
Attention, and Optimal Transport
- Authors: Lingkai Kong, Yuqing Wang, Molei Tao
- Abstract summary: A new approach is proposed, based for the first time on an
interplay between thoughtfully designed continuous and discrete dynamics.
The method exactly preserves the manifold structure but does not require the
commonly used projection or retraction.
Its generalization to adaptive learning rates is also demonstrated.
- Score: 18.717832661972896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of optimization on the Stiefel manifold, i.e.,
minimizing functions of (not necessarily square) matrices that satisfy
orthogonality constraints, has been extensively studied, partly due to rich
machine learning applications. Yet a new approach is proposed, based for the
first time on an interplay between thoughtfully designed continuous and
discrete dynamics. It leads to a gradient-based optimizer with intrinsically
added momentum. This method exactly preserves the manifold structure without
requiring the commonly used projection or retraction, and thus has lower
computational cost than existing algorithms. Its generalization to adaptive
learning rates is also demonstrated. Strong performance is observed in various
practical tasks. For instance, we discover that placing orthogonality
constraints on the attention heads of a trained-from-scratch Vision
Transformer [Dosovitskiy et al. 2022] can remarkably improve its performance
when our optimizer is used, and that it is better to make each head orthogonal
within itself than orthogonal to the other heads. This optimizer also makes
the useful notion of Projection Robust Wasserstein Distance [Paty & Cuturi
2019][Lin et al. 2020] for high-dimensional optimal transport even more
effective.
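For context, the constraint set and the transport distance above have standard
definitions; the notation below is a summary in our own symbols, not quoted
from the paper. The Stiefel manifold of n-by-p matrices with orthonormal
columns is $\mathrm{St}(n,p) = \{ X \in \mathbb{R}^{n \times p} : X^\top X = I_p \}$,
and the problem class is $\min_{X \in \mathrm{St}(n,p)} f(X)$. The Projection
Robust Wasserstein distance takes the worst case over $k$-dimensional
projections, $\mathcal{P}_k(\mu,\nu) = \sup_{E \in \mathrm{St}(d,k)} W\big((\pi_E)_{\#}\mu,\,(\pi_E)_{\#}\nu\big)$
with $\pi_E(x) = E^\top x$, so evaluating it is itself a Stiefel-constrained
optimization problem, which is where such an optimizer applies.
To illustrate the per-head finding, the sketch below encodes "each head
orthogonal within itself but not to other heads" as a soft penalty; this is
our own illustration under an assumed weight layout, not the paper's method
(the paper enforces the constraint exactly via its optimizer):

import torch

def per_head_orthogonality_penalty(W: torch.Tensor, num_heads: int) -> torch.Tensor:
    """Encourage each head's projection block to have orthonormal columns,
    i.e. each (d_model x d_head) block to lie near St(d_model, d_head).
    Heads are not constrained against one another. Assumes W has shape
    (d_model, d_model) with heads in contiguous column blocks (hypothetical)."""
    d_model = W.shape[0]
    d_head = d_model // num_heads
    eye = torch.eye(d_head, dtype=W.dtype, device=W.device)
    penalty = W.new_zeros(())
    for h in range(num_heads):
        block = W[:, h * d_head:(h + 1) * d_head]  # one head's projection
        gram = block.transpose(0, 1) @ block       # (d_head, d_head) Gram matrix
        penalty = penalty + ((gram - eye) ** 2).sum()
    return penalty

# Example usage (hypothetical W_q from a ViT attention layer):
# loss = task_loss + 1e-3 * per_head_orthogonality_penalty(W_q, num_heads=8)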
Related papers
- Understanding Optimization in Deep Learning with Central Flows [53.66160508990508]
We show that RMSProp's implicit behavior can be explicitly captured by a "central flow": a differential equation.
We show that these flows can empirically predict long-term optimization trajectories of generic neural networks.
arXiv Detail & Related papers (2024-10-31T17:58:13Z)
- Operator SVD with Neural Networks via Nested Low-Rank Approximation [19.562492156734653]
This paper proposes a new optimization framework based on the low-rank approximation characterization of a truncated singular value decomposition.
New techniques, collectively called "nesting", learn the top-$L$ singular values and singular functions in the correct order.
We demonstrate the effectiveness of the proposed framework for use cases in computational physics and machine learning.
arXiv Detail & Related papers (2024-02-06T03:06:06Z)
- Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints [9.301728976515255]
This article provides new practical and theoretical developments for the landing algorithm.
First, the method is extended to the Stiefel manifold.
We also consider variance reduction algorithms when the cost function is an average of many functions.
arXiv Detail & Related papers (2023-03-29T07:36:54Z)
- Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit the connection between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
- Learning to Optimize Quasi-Newton Methods [22.504971951262004]
This paper introduces a novel machine-learning optimizer called LODO, which meta-learns the best preconditioner online during optimization.
Unlike other L2O methods, LODO does not require any meta-training on a training task distribution.
We show that the learned optimizer approximates the inverse Hessian in noisy loss landscapes and is capable of representing a wide range of inverse Hessians.
arXiv Detail & Related papers (2022-10-11T03:47:14Z)
- Geometry-aware Bayesian Optimization in Robotics using Riemannian Matérn Kernels [64.62221198500467]
We show how to implement geometry-aware kernels for Bayesian optimization.
This technique can be used for control parameter tuning, parametric policy adaptation, and structure design in robotics.
arXiv Detail & Related papers (2021-11-02T09:47:22Z)
- Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (thus having no limit cycles) and to find an appropriate solution within a polynomial number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models [15.541264326378366]
In recent years, implicit deep learning has emerged as a method to increase the depth of deep neural networks.
The training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix.
We propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer.
arXiv Detail & Related papers (2021-06-01T15:07:34Z)
- Efficient Optimal Transport Algorithm by Accelerated Gradient descent [20.614477547939845]
We propose a novel algorithm to further improve the efficiency and accuracy based on Nesterov's smoothing technique.
The proposed method achieves faster convergence and better accuracy with the same smoothing parameter.
arXiv Detail & Related papers (2021-04-12T20:23:29Z)
- A Primer on Zeroth-Order Optimization in Signal Processing and Machine Learning [95.85269649177336]
ZO optimization iteratively performs three major steps: gradient estimation, descent direction computation, and solution update (a minimal sketch of the gradient-estimation step appears after this list).
We demonstrate promising applications of ZO optimization, such as evaluating and generating explanations from black-box deep learning models, and efficient online sensor management.
arXiv Detail & Related papers (2020-06-11T06:50:35Z)
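The primer's three-step loop is concrete enough to sketch. Below is a minimal
example of the first step, gradient estimation, using a generic two-point
Gaussian-smoothing estimator common in the ZO literature; function names,
sample counts, and step sizes are our illustrative choices, not the primer's.

import numpy as np

def zo_gradient(f, x, mu=1e-4, num_samples=20, rng=None):
    """Two-point zeroth-order gradient estimate: average the finite
    difference (f(x + mu*u) - f(x - mu*u)) / (2*mu) along random
    Gaussian directions u. Only function evaluations are needed."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        grad += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return grad / num_samples

# Descent direction and solution update (the primer's remaining two steps):
# x = x - step_size * zo_gradient(f, x)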