Simplifying Momentum-based Positive-definite Submanifold Optimization
with Applications to Deep Learning
- URL: http://arxiv.org/abs/2302.09738v9
- Date: Sat, 16 Dec 2023 00:32:20 GMT
- Title: Simplifying Momentum-based Positive-definite Submanifold Optimization
with Applications to Deep Learning
- Authors: Wu Lin, Valentin Duruisseaux, Melvin Leok, Frank Nielsen, Mohammad
Emtiyaz Khan, Mark Schmidt
- Abstract summary: Riemannian submanifold optimization with momentum requires solving difficult differential equations to keep iterates on the submanifold.
We do so by proposing a generalized version of the Riemannian normal coordinates.
We use our approach to simplify existing approaches for structured covariances and develop matrix-inverse-free $2^\text{nd}$-order optimizers for deep learning with low precision by using only matrix multiplications.
- Score: 24.97120654216651
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Riemannian submanifold optimization with momentum is computationally
challenging because, to ensure that the iterates remain on the submanifold, we
often need to solve difficult differential equations. Here, we simplify such
difficulties for a class of sparse or structured symmetric positive-definite
matrices with the affine-invariant metric. We do so by proposing a generalized
version of the Riemannian normal coordinates that dynamically orthonormalizes
the metric and locally converts the problem into an unconstrained problem in
the Euclidean space. We use our approach to simplify existing approaches for
structured covariances and develop matrix-inverse-free $2^\text{nd}$-order
optimizers for deep learning with low precision by using only matrix
multiplications. Code: https://github.com/yorkerlin/StructuredNGD-DL
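As a loose illustration of the kind of update the abstract describes (a NumPy sketch under stated assumptions, not the authors' implementation; see the linked repository for that), an SPD iterate S = B B^T can be moved by updating its factor with a truncated matrix exponential, so the step uses only matrix multiplications:

    import numpy as np

    def inverse_free_spd_step(B, M):
        # Truncated Taylor series of expm(M / 2); keeping only matmuls makes
        # the step inversion-free and friendly to low-precision arithmetic.
        E = np.eye(B.shape[0]) + 0.5 * M + 0.125 * (M @ M)
        return B @ E  # new iterate (B E)(B E)^T is SPD while E is invertible

    B = np.linalg.cholesky(np.array([[2.0, 0.3], [0.3, 1.0]]))  # factor of an SPD matrix
    M = np.array([[0.0, 0.1], [0.1, -0.2]])                     # small symmetric step direction
    B_new = inverse_free_spd_step(B, M)
    print(np.linalg.eigvalsh(B_new @ B_new.T))                  # eigenvalues remain positive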
Related papers
- Structured Regularization for Constrained Optimization on the SPD Manifold [1.1126342180866644]
We introduce a class of structured regularizers, based on symmetric gauge functions, which allow for solving constrained optimization on the SPD manifold with faster unconstrained methods.
We show that our structured regularizers can be chosen to preserve or induce desirable structure, in particular convexity and "difference of convex" structure.
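As a toy illustration only (the helper name and the choice of gauge function are assumptions, not the paper's construction), a symmetric gauge function such as the l1 norm can be applied to the log-eigenvalues of an SPD iterate to penalize ill-conditioning:

    import numpy as np

    def gauge_regularizer(X, phi=lambda s: np.sum(np.abs(s))):
        # phi is a symmetric gauge function (permutation- and sign-invariant
        # norm); applied to log-eigenvalues it vanishes only at the identity.
        return phi(np.log(np.linalg.eigvalsh(X)))

    print(gauge_regularizer(np.eye(3)))                  # 0.0: perfectly conditioned
    print(gauge_regularizer(np.diag([1e-3, 1.0, 1e3])))  # large: ill-conditioned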
arXiv Detail & Related papers (2024-10-12T22:11:22Z)
- Low-complexity subspace-descent over symmetric positive definite manifold [9.346050098365648]
We develop low-complexity algorithms for the minimization of functions over the symmetric positive definite (SPD) manifold.
The proposed approach utilizes carefully chosen subspaces that allow the update to be written as a product of the Cholesky factor of the iterate and a sparse matrix.
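A minimal sketch of that update (a single-entry sparse factor chosen for illustration, not the paper's subspace construction):

    import numpy as np

    def sparse_factor_update(L, i, j, t):
        # Right-multiply the Cholesky factor L by the sparse matrix
        # S = I + t * e_i e_j^T (i != j): only column j of L changes, an O(n)
        # update, and the new iterate (L S)(L S)^T stays SPD since det(S) = 1.
        L = L.copy()
        L[:, j] += t * L[:, i]
        return L

    L = np.linalg.cholesky(np.array([[4.0, 1.0], [1.0, 3.0]]))
    L_new = sparse_factor_update(L, 0, 1, 0.1)
    print(np.linalg.eigvalsh(L_new @ L_new.T))  # still positive definite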
arXiv Detail & Related papers (2023-05-03T11:11:46Z)
- Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling the differentiable and non-differentiable parts separately, linearizing only the smooth parts.
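A classical instance of linearizing only the smooth part is the conditional-gradient (Frank-Wolfe) step; the sketch below is a generic textbook version over the probability simplex, not the paper's algorithm:

    import numpy as np

    def frank_wolfe_step(x, grad_f, k):
        # Minimize the linearization <grad_f(x), s> over the simplex: the
        # solution is a vertex, so the subproblem is trivial despite the
        # constraint set.
        s = np.zeros_like(x)
        s[np.argmin(grad_f(x))] = 1.0
        gamma = 2.0 / (k + 2.0)  # standard step-size schedule
        return (1.0 - gamma) * x + gamma * s

    c = np.array([0.1, 0.7, 0.2])  # minimize f(x) = ||x - c||^2 over the simplex
    x = np.ones(3) / 3.0
    for k in range(200):
        x = frank_wolfe_step(x, lambda y: 2.0 * (y - c), k)
    print(x)                       # approaches c, which lies in the simplex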
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - Splitting numerical integration for matrix completion [0.0]
We propose a new algorithm for low rank matrix approximation.
The algorithm is an adaptation of classical gradient descent within a splitting numerical-integration framework.
Experimental results show that our approach has good scalability for large-scale problems.
arXiv Detail & Related papers (2022-02-14T04:45:20Z)
- Learning a Compressive Sensing Matrix with Structural Constraints via Maximum Mean Discrepancy Optimization [17.104994036477308]
We introduce a learning-based algorithm to obtain a measurement matrix for compressive-sensing-related recovery problems.
The recent success of such metrics (here, the maximum mean discrepancy) in neural-network-related topics motivates a machine-learning-based solution to the problem.
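For concreteness, a hedged sketch of the metric in question, the (biased) squared maximum mean discrepancy with a Gaussian kernel; names and parameters are illustrative, and the learned measurement matrix itself is not shown:

    import numpy as np

    def mmd2(X, Y, sigma=1.0):
        # Biased estimate of MMD^2 between sample sets X (n, d) and Y (m, d);
        # differentiable, so it can serve as a training loss.
        def gram(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
            return np.exp(-d2 / (2.0 * sigma ** 2))
        return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()

    rng = np.random.default_rng(0)
    samples_a = rng.normal(size=(100, 5))
    samples_b = rng.normal(size=(100, 5))
    print(mmd2(samples_a, samples_b))  # near zero for matching distributions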
arXiv Detail & Related papers (2021-10-14T08:35:54Z)
- Nonlinear matrix recovery using optimization on the Grassmann manifold [18.655422834567577]
We investigate the problem of recovering a partially observed high-rank clustering matrix whose columns obey a nonlinear structure such as a union of subspaces.
We show that the alternating minimization iterates converge to a unique limit point using the Kurdyka-Łojasiewicz property.
arXiv Detail & Related papers (2021-09-13T16:13:13Z)
- Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.
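A loose Euclidean illustration of the underlying picture (a generic discretization of dissipative Hamiltonian dynamics q' = p, p' = -grad f(q) - gamma p, not the paper's constrained scheme):

    import numpy as np

    def dissipative_step(q, p, grad_f, h=0.1, gamma=1.0):
        # Semi-implicit Euler with friction: the momentum update uses the
        # current position, and the position update uses the fresh momentum.
        p = p - h * (grad_f(q) + gamma * p)
        q = q + h * p
        return q, p

    q, p = np.array([2.0, -1.5]), np.zeros(2)
    for _ in range(200):
        q, p = dissipative_step(q, p, lambda x: 2.0 * x)  # f(x) = ||x||^2
    print(q)                                              # decays toward the minimizer 0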
arXiv Detail & Related papers (2021-07-23T13:43:34Z)
- Solving weakly supervised regression problem using low-rank manifold regularization [77.34726150561087]
We solve a weakly supervised regression problem.
Under "weakly" we understand that for some training points the labels are known, for some unknown, and for others uncertain due to the presence of random noise or other reasons such as lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z)
- Automatic differentiation for Riemannian optimization on low-rank matrix and tensor-train manifolds [71.94111815357064]
In scientific computing and machine learning applications, matrices and more general multidimensional arrays (tensors) can often be approximated with the help of low-rank decompositions.
One of the popular tools for finding low-rank approximations is Riemannian optimization.
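A minimal stand-in for that workflow (a projected-gradient toy with a truncated-SVD retraction onto the rank-r matrices, not the paper's automatic-differentiation machinery):

    import numpy as np

    def low_rank_step(X, grad, r, lr=0.1):
        # Euclidean gradient step followed by a retraction to the rank-r
        # manifold, implemented as the best rank-r approximation via SVD.
        U, s, Vt = np.linalg.svd(X - lr * grad, full_matrices=False)
        return (U[:, :r] * s[:r]) @ Vt[:r, :]

    rng = np.random.default_rng(0)
    target = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 20))  # a rank-2 matrix
    X = np.zeros((20, 20))
    for _ in range(100):
        X = low_rank_step(X, X - target, r=2)  # gradient of 0.5 * ||X - target||_F^2
    print(np.linalg.norm(X - target))          # residual shrinks toward zero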
arXiv Detail & Related papers (2021-03-27T19:56:00Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
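A hedged sketch-and-solve baseline for this setting with a Gaussian embedding (dimensions and names are illustrative; the paper's adaptive, effective-dimension-based choice of sketch size is not shown):

    import numpy as np

    def sketched_ridge(A, b, lam, m, rng):
        # Compress the rows of A with a Gaussian sketch S, then solve the
        # small L2-regularized normal equations of the sketched problem.
        S = rng.normal(size=(m, A.shape[0])) / np.sqrt(m)
        SA, Sb = S @ A, S @ b
        return np.linalg.solve(SA.T @ SA + lam * np.eye(A.shape[1]), SA.T @ Sb)

    rng = np.random.default_rng(0)
    A = rng.normal(size=(5000, 50))
    x_true = rng.normal(size=50)
    b = A @ x_true + 0.01 * rng.normal(size=5000)
    x_hat = sketched_ridge(A, b, lam=1e-2, m=400, rng=rng)
    print(np.linalg.norm(x_hat - x_true))  # small error from a much smaller solve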
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.