Tractable structured natural gradient descent using local
parameterizations
- URL: http://arxiv.org/abs/2102.07405v1
- Date: Mon, 15 Feb 2021 09:09:20 GMT
- Title: Tractable structured natural gradient descent using local
parameterizations
- Authors: Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt
- Abstract summary: Natural-gradient descent on structured parameter spaces is computationally challenging due to complicated inverse Fisher-matrix computations.
We address this issue by using local-parameter coordinates.
We show results on a range of applications on deep learning, variational inference, and evolution strategies.
- Score: 43.51581051770027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural-gradient descent on structured parameter spaces (e.g., low-rank
covariances) is computationally challenging due to complicated inverse
Fisher-matrix computations. We address this issue for optimization, inference,
and search problems by using \emph{local-parameter coordinates}. Our method
generalizes an existing evolutionary-strategy method, recovers Newton and
Riemannian-gradient methods as special cases, and also yields new tractable
natural-gradient algorithms for learning flexible covariance structures of
Gaussian and Wishart-based distributions. We show results on a range of
applications on deep learning, variational inference, and evolution strategies.
Our work opens a new direction for scalable structured geometric methods via
local parameterizations.
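For orientation, the sketch below shows ordinary natural-gradient variational inference in JAX for a diagonal Gaussian, where the Fisher matrix in (mean, log standard deviation) coordinates is diagonal and known in closed form. This is a hedged baseline illustration only, not the paper's local-parameterization construction, which targets the harder case of structured (e.g., low-rank) covariances where the inverse-Fisher computation is the bottleneck.

```python
# Baseline sketch (not the paper's method): natural-gradient variational
# inference with a diagonal Gaussian q(z) = N(mu, diag(sigma^2)).
# In (mu, log sigma) coordinates the Fisher matrix of q is diag(1/sigma^2, 2)
# per dimension, so the "inverse Fisher" step is trivial in this special case.
import jax
import jax.numpy as jnp

def log_target(z):
    # toy unnormalized target: a correlated Gaussian
    prec = jnp.array([[2.0, 0.9], [0.9, 1.5]])
    return -0.5 * z @ prec @ z

def neg_elbo(params, key, n_samples=64):
    mu, log_sigma = params
    eps = jax.random.normal(key, (n_samples, mu.shape[0]))
    z = mu + jnp.exp(log_sigma) * eps              # reparameterization trick
    entropy = jnp.sum(log_sigma)                   # Gaussian entropy up to a constant
    return -(jnp.mean(jax.vmap(log_target)(z)) + entropy)

@jax.jit
def natgrad_step(params, key, lr=0.1):
    mu, log_sigma = params
    g_mu, g_ls = jax.grad(neg_elbo)(params, key)
    nat_g_mu = jnp.exp(2.0 * log_sigma) * g_mu     # inverse-Fisher block for mu is sigma^2
    nat_g_ls = 0.5 * g_ls                          # inverse-Fisher block for log sigma is 1/2
    return (mu - lr * nat_g_mu, log_sigma - lr * nat_g_ls)

params = (jnp.zeros(2), jnp.zeros(2))
key = jax.random.PRNGKey(0)
for _ in range(300):
    key, sub = jax.random.split(key)
    params = natgrad_step(params, sub)
```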
Related papers
- Relative Representations: Topological and Geometric Perspectives [53.88896255693922]
Relative representations are an established approach to zero-shot model stitching.
We introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations.
We also propose to deploy topological densification when fine-tuning relative representations: a topological regularization loss that encourages clustering within classes.
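For background, here is a minimal sketch of the basic relative-representation transform (cosine similarities to a set of anchor embeddings); the paper's specific normalization procedure and topological densification loss are not reproduced here.

```python
# Minimal sketch of the basic relative-representation transform:
# re-express each embedding by its cosine similarity to a set of anchors.
# The normalization and topological densification proposed in the cited
# paper are not reproduced here.
import jax
import jax.numpy as jnp

def relative_representation(embeddings, anchors, eps=1e-8):
    # embeddings: (n, d), anchors: (k, d)  ->  (n, k) relative coordinates
    e = embeddings / (jnp.linalg.norm(embeddings, axis=1, keepdims=True) + eps)
    a = anchors / (jnp.linalg.norm(anchors, axis=1, keepdims=True) + eps)
    return e @ a.T                                 # cosine similarity to each anchor

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (32, 128))              # hypothetical encoder outputs
A = X[:10]                                         # anchors chosen from the data
R = relative_representation(X, A)                  # shape (32, 10)
```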
arXiv Detail & Related papers (2024-09-17T08:09:22Z) - Information Geometry and Beta Link for Optimizing Sparse Variational Student-t Processes [6.37512592611305]
Sparse variational Student-t processes have been proposed to enhance computational efficiency and flexibility for real-world datasets using gradient descent.
Traditional gradient descent methods like Adam may not fully exploit the parameter space geometry, potentially leading to slower convergence and suboptimal performance.
We adopt natural gradient methods from information geometry for variational parameter optimization of Student-t Processes.
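A generic sketch of the natural-gradient update this line of work relies on, with the Fisher matrix of the variational distribution estimated by Monte Carlo and inverted explicitly; the Student-t process model of the paper is not reproduced here, and this explicit solve is exactly the cost that structured methods try to avoid.

```python
# Generic natural-gradient step: precondition the Euclidean gradient with
# the inverse Fisher matrix of the variational distribution q_theta,
# estimated by Monte Carlo from score outer products. The variational
# family here is a toy diagonal Gaussian, not a Student-t process.
import jax
import jax.numpy as jnp

def log_q(theta, z):
    mu, log_sigma = theta[:2], theta[2:]
    return jnp.sum(-0.5 * ((z - mu) / jnp.exp(log_sigma)) ** 2 - log_sigma)

def mc_fisher(theta, key, n=512):
    mu, log_sigma = theta[:2], theta[2:]
    z = mu + jnp.exp(log_sigma) * jax.random.normal(key, (n, 2))
    scores = jax.vmap(lambda zi: jax.grad(log_q)(theta, zi))(z)
    return scores.T @ scores / n                   # E[score score^T], approximately

def natural_gradient(loss, theta, key, damping=1e-3):
    g = jax.grad(loss)(theta)
    F = mc_fisher(theta, key)
    return jnp.linalg.solve(F + damping * jnp.eye(theta.shape[0]), g)

def loss(theta):
    # KL(q || N(0, I)) up to a constant, as a toy variational objective
    mu, log_sigma = theta[:2], theta[2:]
    return 0.5 * jnp.sum(mu ** 2 + jnp.exp(2 * log_sigma)) - jnp.sum(log_sigma)

theta = jnp.array([1.0, -1.0, 0.5, 0.5])
key = jax.random.PRNGKey(0)
for _ in range(100):
    key, sub = jax.random.split(key)
    theta = theta - 0.2 * natural_gradient(loss, theta, sub)
```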
arXiv Detail & Related papers (2024-08-13T07:53:39Z) - Mixed Newton Method for Optimization in Complex Spaces [32.73124984242397]
We show that arbitrary regularizations preserve the favorable local convergence properties of the Mixed Newton Method.
We compare several variants of the method applied to training neural networks with real and complex parameters.
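As a generic point of reference, a damped (regularized) Newton step over real parameters is sketched below; the paper's mixed-derivative construction for complex parameters is not reproduced.

```python
# Generic damped (Tikhonov-regularized) Newton step in JAX -- a plain
# real-parameter illustration, not the complex-space mixed Newton method
# of the cited paper. The regularization term is the kind of modification
# whose effect on local convergence the paper analyzes.
import jax
import jax.numpy as jnp

def newton_step(f, x, damping=1e-3):
    g = jax.grad(f)(x)
    H = jax.hessian(f)(x)
    return x - jnp.linalg.solve(H + damping * jnp.eye(x.shape[0]), g)

# smooth convex toy objective: ill-conditioned quadratic plus a quartic term
D = jnp.diag(jnp.array([1.0, 100.0]))
f = lambda x: 0.5 * x @ D @ x + 0.25 * jnp.sum(x ** 2) ** 2
x = jnp.array([3.0, -2.0])
for _ in range(20):
    x = newton_step(f, x)
```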
arXiv Detail & Related papers (2024-07-29T18:31:42Z) - Unnatural Algorithms in Machine Learning [0.0]
We show that optimization algorithms with this property can be viewed as discrete approximations of natural gradient descent.
We present a simple method of introducing this naturality more generally and examine a number of popular machine learning training algorithms.
arXiv Detail & Related papers (2023-12-07T22:43:37Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
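A minimal sketch of the underlying dual formulation: gradient descent on the representer coefficients of kernel ridge / GP regression. The paper's stochastic dual descent adds specific stochastic estimators, momentum, and iterate averaging that are omitted here.

```python
# Minimal sketch: gradient descent on the dual (representer-coefficient)
# objective of GP / kernel ridge regression, i.e. driving
# (K + lam*I) alpha = y to its solution with first-order updates only.
import jax
import jax.numpy as jnp

def rbf_kernel(X, Y, lengthscale=1.0):
    d2 = jnp.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * d2 / lengthscale ** 2)

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
X = jax.random.normal(k1, (200, 3))
y = jnp.sin(X[:, 0]) + 0.1 * jax.random.normal(k2, (200,))

lam = 1e-2
K = rbf_kernel(X, X)

def dual_objective(alpha):
    return 0.5 * alpha @ (K + lam * jnp.eye(200)) @ alpha - alpha @ y

lr = 1.0 / (jnp.linalg.eigvalsh(K)[-1] + lam)      # step size from the top eigenvalue
grad_fn = jax.jit(jax.grad(dual_objective))
alpha = jnp.zeros(200)
for _ in range(1000):
    alpha = alpha - lr * grad_fn(alpha)

# posterior-mean style prediction at new inputs X_new: rbf_kernel(X_new, X) @ alpha
```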
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - The Dynamics of Riemannian Robbins-Monro Algorithms [101.29301565229265]
We propose a family of Riemannian algorithms generalizing and extending the seminal stochastic approximation framework of Robbins and Monro.
Compared to their Euclidean counterparts, Riemannian algorithms are much less understood due to the lack of a global linear structure on the manifold.
We provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes.
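A toy instance of the kind of scheme such a framework covers: a Riemannian stochastic-gradient step on the unit sphere with tangent-space projection, a normalization retraction, and Robbins-Monro style decaying step sizes. This is an illustrative special case, not the paper's general template.

```python
# Minimal Riemannian SGD step on the unit sphere S^{d-1}: project the
# noisy Euclidean gradient onto the tangent space at x, take a step, and
# retract back to the manifold by renormalizing.
import jax
import jax.numpy as jnp

def sphere_sgd_step(x, euclid_grad, lr):
    g_tan = euclid_grad - jnp.dot(euclid_grad, x) * x   # tangent-space projection
    x_new = x - lr * g_tan
    return x_new / jnp.linalg.norm(x_new)               # retraction to the sphere

# toy problem: leading eigenvector of A via noisy gradients of -0.5 * x^T A x
key = jax.random.PRNGKey(0)
M = jax.random.normal(key, (6, 6))
A = M @ M.T
x = jnp.ones(6) / jnp.sqrt(6.0)
for t in range(500):
    key, sub = jax.random.split(key)
    noise = 0.1 * jax.random.normal(sub, (6,))
    x = sphere_sgd_step(x, -(A @ x) + noise, lr=0.5 / (1.0 + t))   # decaying steps
```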
arXiv Detail & Related papers (2022-06-14T12:30:11Z) - Structured second-order methods via natural gradient descent [43.51581051770027]
Natural-gradient descent is an attractive approach for designing algorithms in many settings.
We propose new structured second-order methods and structured adaptive-gradient methods.
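As a simple illustration of imposing structure on a second-order preconditioner, the sketch below uses only the diagonal of the Hessian; the specific structures studied in the paper are not reproduced.

```python
# Sketch of a "structured" second-order step: precondition the gradient
# with only the diagonal of the Hessian (a simple structure choice),
# rather than the full inverse Hessian or Fisher matrix.
import jax
import jax.numpy as jnp

def diag_newton_step(f, x, lr=1.0, eps=1e-6):
    g = jax.grad(f)(x)
    # Hessian diagonal via one Hessian-vector product per coordinate
    # (fine for small d; cheaper estimators exist for large models).
    hvp = lambda v: jax.jvp(jax.grad(f), (x,), (v,))[1]
    H_diag = jnp.stack([hvp(e)[i] for i, e in enumerate(jnp.eye(x.shape[0]))])
    return x - lr * g / (jnp.abs(H_diag) + eps)

# ill-conditioned quadratic: diagonal preconditioning fixes the conditioning
D = jnp.diag(jnp.array([1.0, 10.0, 100.0, 1000.0]))
f = lambda x: 0.5 * x @ D @ x
x = jnp.ones(4)
x = diag_newton_step(f, x)        # lands (approximately) at the minimum in one step
```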
arXiv Detail & Related papers (2021-07-22T19:03:53Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the complexity of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods [7.486132958737807]
We present an adaptive variance-reduced method with an implicit approach to adaptivity.
We provide convergence guarantees for finite-sum minimization problems and show that faster convergence than SARAH can be achieved when the local geometry permits it.
The algorithm implicitly computes the step size and efficiently estimates the local Lipschitz smoothness of the objective.
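For reference, the core SARAH recursive gradient estimator is sketched below; AI-SARAH's implicit, adaptive step-size computation is not reproduced.

```python
# Core SARAH recursive gradient estimator for finite-sum minimization.
# AI-SARAH adds an implicit, adaptive step-size choice on top of this
# recursion; that part is not shown here.
import jax
import jax.numpy as jnp

def sarah_epoch(loss_i, w, data, key, lr=0.02, inner_steps=50):
    n = data[0].shape[0]
    full_loss = lambda w_: jnp.mean(jax.vmap(lambda a, y: loss_i(w_, a, y))(*data))
    v = jax.grad(full_loss)(w)                     # full gradient at the snapshot
    w_prev, w = w, w - lr * v
    for _ in range(inner_steps):
        key, sub = jax.random.split(key)
        i = jax.random.randint(sub, (), 0, n)
        gi = lambda w_: jax.grad(loss_i)(w_, data[0][i], data[1][i])
        v = gi(w) - gi(w_prev) + v                 # recursive variance-reduced estimate
        w_prev, w = w, w - lr * v
    return w, key

# toy least-squares example
key = jax.random.PRNGKey(0)
A = jax.random.normal(key, (256, 5))
b = A @ jnp.arange(1.0, 6.0)
loss_i = lambda w, a, y: 0.5 * (a @ w - y) ** 2
w = jnp.zeros(5)
for _ in range(5):
    w, key = sarah_epoch(loss_i, w, (A, b), key)
```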
arXiv Detail & Related papers (2021-02-19T01:17:15Z) - Stochastic Flows and Geometric Optimization on the Orthogonal Group [52.50121190744979]
We present a new class of geometrically-driven optimization algorithms on the orthogonal group $O(d)$.
We show that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, flows and metric learning.
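A standard way to keep iterates exactly orthogonal, sketched below via the Cayley transform of a skew-symmetric parameter; this is a common reparameterization trick, not the stochastic-flow construction of the paper.

```python
# Sketch: optimize over orthogonal matrices by parameterizing W in SO(d)
# with the Cayley transform of a skew-symmetric matrix, so every iterate
# is exactly orthogonal by construction.
import jax
import jax.numpy as jnp

d = 5

def cayley(theta):
    A = theta - theta.T                            # skew-symmetric part
    I = jnp.eye(d)
    return jnp.linalg.solve(I - A, I + A)          # lies in SO(d) by construction

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
W_true = cayley(0.3 * jax.random.normal(k1, (d, d)))   # reachable target rotation
X = jax.random.normal(k2, (d, 64))
Y = W_true @ X

def loss(theta):
    return jnp.mean((cayley(theta) @ X - Y) ** 2)

theta = jnp.zeros((d, d))
grad_fn = jax.jit(jax.grad(loss))
for _ in range(500):
    theta = theta - 0.1 * grad_fn(theta)

W = cayley(theta)                                  # W.T @ W is numerically the identity
```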
arXiv Detail & Related papers (2020-03-30T15:37:50Z)