FORML: A Riemannian Hessian-free Method for Meta-learning on Stiefel Manifolds
- URL: http://arxiv.org/abs/2402.18605v2
- Date: Fri, 31 May 2024 21:34:33 GMT
- Title: FORML: A Riemannian Hessian-free Method for Meta-learning on Stiefel Manifolds
- Authors: Hadi Tabealhojeh, Soumava Kumar Roy, Peyman Adibi, Hossein Karshenas
- Abstract summary: This paper introduces a Hessian-free approach that uses a first-order approximation of derivatives on the Stiefel manifold.
Our method significantly reduces the computational load and memory footprint.
- Score: 4.757859522106933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The meta-learning problem is usually formulated as a bi-level optimization in which the task-specific parameters and the meta-parameters are updated in the inner and outer loops of optimization, respectively. However, performing this optimization in Riemannian spaces, where the parameters and meta-parameters lie on Riemannian manifolds, is computationally intensive. Unlike Euclidean methods, Riemannian backpropagation requires computing second-order derivatives, which involve backward passes through Riemannian operators such as retraction and orthogonal projection. This paper introduces a Hessian-free approach that uses a first-order approximation of derivatives on the Stiefel manifold. Our method significantly reduces the computational load and memory footprint. We show how using a Stiefel fully-connected layer, which enforces an orthogonality constraint on the parameters of the last classification layer serving as the head of the backbone network, strengthens the representation reuse of gradient-based meta-learning methods. Our experimental results across various few-shot learning datasets demonstrate the superiority of the proposed method over state-of-the-art methods, especially MAML, its Euclidean counterpart.
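To make the first-order mechanism concrete, below is a minimal NumPy sketch of a FOMAML-style meta-update on the Stiefel manifold St(n, p) = {X : X^T X = I}. This is an illustrative sketch under stated assumptions, not the paper's implementation: `grad_fn`, the task set, and all step sizes are hypothetical placeholders.

```python
import numpy as np

def sym(M):
    return 0.5 * (M + M.T)

def proj_tangent(X, G):
    """Orthogonally project a Euclidean gradient G onto the tangent
    space of the Stiefel manifold at X (where X^T X = I)."""
    return G - X @ sym(X.T @ G)

def retract_qr(X, xi):
    """QR retraction: map the tangent step xi back onto the manifold."""
    Q, R = np.linalg.qr(X + xi)
    return Q * np.sign(np.diag(R))  # fix the sign ambiguity of QR

def first_order_meta_step(X_meta, tasks, grad_fn,
                          inner_lr=0.1, outer_lr=0.01, inner_steps=5):
    """One Hessian-free (FOMAML-style) meta-update on the Stiefel manifold.

    grad_fn(X, task) returns the Euclidean gradient of the task loss.
    The meta-gradient is the first-order gradient at the adapted point,
    so retraction and projection are never differentiated through.
    """
    meta_grad = np.zeros_like(X_meta)
    for task in tasks:
        X = X_meta.copy()
        for _ in range(inner_steps):  # inner loop: task-specific adaptation
            g = proj_tangent(X, grad_fn(X, task))
            X = retract_qr(X, -inner_lr * g)
        # Outer-loop contribution: first-order approximation only.
        meta_grad += proj_tangent(X_meta, grad_fn(X, task))
    return retract_qr(X_meta, -outer_lr * meta_grad / len(tasks))
```

Because `proj_tangent` and `retract_qr` appear only in the forward direction and are never differentiated through, no second-order terms arise, which is where the claimed savings in computation and memory come from.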
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth probability distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- Riemannian Bilevel Optimization [35.42472057648458]
We focus in particular on batch and gradient-based methods, with the explicit goal of avoiding second-order information.
We propose and analyze $\mathrm{RF^2SA}$, a method that leverages first-order gradient information.
We provide explicit convergence rates for reaching $\epsilon$-stationary points under various setups.
arXiv Detail & Related papers (2024-05-22T20:49:01Z)
- Streamlining in the Riemannian Realm: Efficient Riemannian Optimization with Loopless Variance Reduction [4.578425862931332]
This study focuses on the crucial variance reduction mechanism used in both Euclidean and Riemannian settings.
Motivated by loopless Euclidean methods, we introduce Riemannian loopless variants (R-LSVRG and R-PAGE) that replace the outer loop with computation triggered by a coin flip.
Using R-PAGE as a framework, we demonstrate its applicability to various important settings.
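For intuition, here is a minimal Euclidean sketch of the coin-flip mechanism described above (a loopless-SVRG-style loop; the Riemannian versions combine the same idea with tangent-space projections and retractions). All function names and rates are illustrative assumptions, not the paper's code.

```python
import numpy as np

def loopless_svrg(w, grad_i, full_grad, n, lr=0.01, p=0.05, iters=1000, seed=0):
    """Loopless SVRG sketch: the periodic full-gradient outer loop is
    replaced by a coin flip that refreshes the reference point with
    probability p at each step.

    grad_i(w, i): gradient of the i-th component function at w.
    full_grad(w): full gradient at w (average over all n components).
    """
    rng = np.random.default_rng(seed)
    w_ref = w.copy()
    mu = full_grad(w_ref)                          # snapshot gradient
    for _ in range(iters):
        i = rng.integers(n)                        # sample one component
        g = grad_i(w, i) - grad_i(w_ref, i) + mu   # variance-reduced estimate
        w = w - lr * g
        if rng.random() < p:                       # coin flip replaces the outer loop
            w_ref = w.copy()
            mu = full_grad(w_ref)
    return w
```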
arXiv Detail & Related papers (2024-03-11T12:49:37Z)
- Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold [59.73080197971106]
This paper presents a first-order conjugate gradient method that converges faster than the steepest descent method.
It aims to achieve global convergence over the Stiefel manifold.
arXiv Detail & Related papers (2023-08-21T08:02:16Z)
- Decentralized Riemannian natural gradient methods with Kronecker-product approximations [11.263837420265594]
We present an efficient decentralized natural gradient descent (DRNGD) method for solving decentralized manifold optimization problems.
By performing the communications over the Kronecker factors, a high-quality approximation of the Riemannian Fisher information matrix (RFIM) can be obtained at low cost.
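A short sketch of why Kronecker factorization helps, using the standard identity vec(B X A) = (A^T ⊗ B) vec(X): if the RFIM is approximated as F ≈ A ⊗ B with small symmetric factors, then F^{-1} vec(G) reduces to two factor-wise linear solves, so the full matrix is never formed or communicated. Names and the damping value below are illustrative assumptions, not the paper's DRNGD code.

```python
import numpy as np

def kron_natural_gradient(G, A, B, damping=1e-3):
    """Compute (A kron B)^{-1} vec(G) as the matrix B^{-1} G A^{-1},
    for symmetric positive semi-definite Kronecker factors A and B,
    without ever materializing the full Fisher matrix."""
    A_d = A + damping * np.eye(A.shape[0])  # damping keeps the solves stable
    B_d = B + damping * np.eye(B.shape[0])
    GA = np.linalg.solve(A_d, G.T).T        # right-solve: G A^{-1} (A_d symmetric)
    return np.linalg.solve(B_d, GA)         # left-solve: B^{-1} G A^{-1}
```

For factors of sizes m x m and k x k this costs O(m^3 + k^3) instead of O((mk)^3) for the full matrix, and only the small factors need to be exchanged between nodes.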
arXiv Detail & Related papers (2023-03-16T19:36:31Z)
- Gaussian Processes and Statistical Decision-making in Non-Euclidean Spaces [96.53463532832939]
We develop techniques for broadening the applicability of Gaussian processes.
We introduce a wide class of efficient approximations built from this viewpoint.
We develop a collection of Gaussian process models over non-Euclidean spaces.
arXiv Detail & Related papers (2022-02-22T01:42:57Z)
- Automatic differentiation for Riemannian optimization on low-rank matrix and tensor-train manifolds [71.94111815357064]
In scientific computing and machine learning applications, matrices and more general multidimensional arrays (tensors) can often be approximated with the help of low-rank decompositions.
One popular tool for finding such low-rank approximations is Riemannian optimization.
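As a toy illustration of the kind of update such methods take, here is a sketch of one Riemannian gradient step on the manifold of rank-r matrices, using tangent-space projection and a truncated-SVD retraction. This is a deliberately naive sketch: practical implementations work with factored representations and automatic differentiation instead of explicit SVDs, and all names here are assumptions.

```python
import numpy as np

def truncate_svd(X, r):
    """Best rank-r approximation of X via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r]

def fixed_rank_step(X, G, r, lr=0.1):
    """One Riemannian gradient step on the manifold of rank-r matrices:
    project the Euclidean gradient G onto the tangent space at X, step,
    then retract back to rank r by truncated SVD."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    PU = U[:, :r] @ U[:, :r].T               # projector onto the column space
    PV = Vt[:r].T @ Vt[:r]                   # projector onto the row space
    G_tan = PU @ G + G @ PV - PU @ G @ PV    # tangent-space projection
    return truncate_svd(X - lr * G_tan, r)
```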
arXiv Detail & Related papers (2021-03-27T19:56:00Z)
- Bayesian Quadrature on Riemannian Data Manifolds [79.71142807798284]
Riemannian manifolds provide a principled way to model nonlinear geometric structure inherent in data.
However, the geometric operations involved are typically computationally demanding.
In particular, we focus on Bayesian quadrature (BQ) to numerically compute integrals over normal laws.
We show that by leveraging both prior knowledge and an active exploration scheme, BQ significantly reduces the number of required evaluations.
arXiv Detail & Related papers (2021-02-12T17:38:04Z)
- Matérn Gaussian processes on Riemannian manifolds [81.15349473870816]
We show how to generalize the widely-used Matérn class of Gaussian processes.
We also extend the generalization from the Matérn to the widely-used squared exponential process.
arXiv Detail & Related papers (2020-06-17T21:05:42Z)
- Riemannian Stochastic Proximal Gradient Methods for Nonsmooth Optimization over the Stiefel Manifold [7.257751371276488]
R-ProxSGD and R-ProxSPB are generalizations of proximal SGD and proximal SpiderBoost.
The R-ProxSPB algorithm finds an $\epsilon$-stationary point with $O(\epsilon^{-3})$ IFOs in the online case, and $O(n+\sqrt{n}\epsilon^{-3})$ IFOs in the finite-sum case.
arXiv Detail & Related papers (2020-05-03T23:41:35Z)