Invariance Properties of the Natural Gradient in Overparametrised
Systems
- URL: http://arxiv.org/abs/2206.15273v1
- Date: Thu, 30 Jun 2022 13:23:14 GMT
- Title: Invariance Properties of the Natural Gradient in Overparametrised
Systems
- Authors: Jesse van Oostrum, Johannes Müller, Nihat Ay
- Abstract summary: The natural gradient field represents the direction of steepest ascent of an objective function on a model equipped with a distinguished metric.
We study when the pushforward of the natural parameter gradient is equal to the natural gradient.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The natural gradient field is a vector field that lives on a model equipped
with a distinguished Riemannian metric, e.g. the Fisher-Rao metric, and
represents the direction of steepest ascent of an objective function on the
model with respect to this metric. In practice, one tries to obtain the
corresponding direction on the parameter space by multiplying the ordinary
gradient by the inverse of the Gram matrix associated with the metric. We refer
to this vector on the parameter space as the natural parameter gradient. In
this paper we study when the pushforward of the natural parameter gradient is
equal to the natural gradient. Furthermore we investigate the invariance
properties of the natural parameter gradient. Both questions are addressed in
an overparametrised setting.
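To make the abstract's setup concrete, below is a minimal numerical sketch (not the authors' code) of the natural parameter gradient in an overparametrised toy model. The model map, the toy objective, and the choice of a Euclidean metric on the model are illustrative assumptions; the paper's distinguished metric would be, e.g., the Fisher-Rao metric. Because the Gram matrix is singular in the overparametrised case, the Moore-Penrose pseudoinverse stands in for the inverse.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the authors' code): an
# overparametrised model map, a toy objective, and the natural parameter
# gradient obtained from the pseudoinverse of the Gram matrix.

def model(theta):
    # Hypothetical overparametrised map R^3 -> R^2; theta[0] and theta[1]
    # enter only through their sum, so the parametrisation is redundant.
    return np.array([theta[0] + theta[1], theta[2] ** 2])

def jacobian(theta, eps=1e-6):
    # Central-difference Jacobian of the model map (its differential / pushforward).
    m = len(theta)
    cols = []
    for j in range(m):
        d = np.zeros(m)
        d[j] = eps
        cols.append((model(theta + d) - model(theta - d)) / (2 * eps))
    return np.stack(cols, axis=1)

def model_gradient(p):
    # Gradient on the model of the toy objective f(p) = -||p - target||^2,
    # taking the Euclidean metric on the model for simplicity.
    target = np.array([1.0, 2.0])
    return -2.0 * (p - target)

theta = np.array([0.3, 0.2, 0.8])
J = jacobian(theta)
grad_on_model = model_gradient(model(theta))

G = J.T @ J                                # Gram matrix of the pulled-back metric
ordinary_param_grad = J.T @ grad_on_model  # ordinary gradient in parameter space

# Natural parameter gradient: G is singular in the overparametrised case,
# so the Moore-Penrose pseudoinverse replaces the inverse.
natural_param_grad = np.linalg.pinv(G) @ ordinary_param_grad

# Compare the pushforward of the natural parameter gradient with the
# natural gradient on the model (here simply grad_on_model).
print(J @ natural_param_grad)
print(grad_on_model)
```

In this example the differential of the parametrisation has full rank onto the model, so the two printed vectors coincide up to finite-difference error; the paper analyses precisely when such equality holds and which invariance properties the natural parameter gradient retains.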
Related papers
- Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning [1.4050802766699084]
We consider the scenario of supervised learning in Deep Learning (DL) networks.
We choose the gradient flow with respect to the Euclidean metric in the output layer of the DL network.
arXiv Detail & Related papers (2023-11-27T02:12:02Z) - Intrinsic Bayesian Cramér-Rao Bound with an Application to Covariance Matrix Estimation [49.67011673289242]
This paper presents a new performance bound for estimation problems where the parameter to estimate lies in a smooth manifold.
It induces a geometry for the parameter manifold, as well as an intrinsic notion of the estimation error measure.
arXiv Detail & Related papers (2023-11-08T15:17:13Z) - Neural Gradient Learning and Optimization for Oriented Point Normal
Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability for local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Implicit Balancing and Regularization: Generalization and Convergence
Guarantees for Overparameterized Asymmetric Matrix Sensing [28.77440901439686]
A series of recent papers have begun to generalize this role to non-random Positive Semi-Definite (PSD) matrix sensing problems.
In this paper, we show that the trajectory of gradient descent from a small random initialization moves towards solutions that are both globally optimal and generalize well.
arXiv Detail & Related papers (2023-03-24T19:05:52Z) - Implicit Bias of Gradient Descent on Reparametrized Models: On
Equivalence to Mirror Descent [64.26008239544085]
Gradient flow with any commuting parametrization is equivalent to continuous mirror descent with a related Legendre function.
Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization.
arXiv Detail & Related papers (2022-07-08T17:47:11Z) - Efficient Natural Gradient Descent Methods for Large-Scale Optimization
Problems [1.2891210250935146]
We propose an efficient method for computing natural gradient descent directions with respect to a generic metric in the state space.
Our technique relies on representing the natural gradient direction as a solution to a standard least-squares problem.
We can reliably compute several natural gradient descent directions, including the Wasserstein natural gradient, in large-scale parameter spaces (a minimal least-squares sketch appears after this list).
arXiv Detail & Related papers (2022-02-13T07:32:17Z) - Generalized Tangent Kernel: A Unified Geometric Foundation for Natural Gradient and Standard Gradient [9.932574972162845]
We provide a geometric perspective and mathematical framework for studying both the natural gradient and the standard gradient. The key tool that unifies them is a generalized form of the Neural Tangent Kernel (NTK).
arXiv Detail & Related papers (2022-02-13T07:04:44Z) - Depth Without the Magic: Inductive Bias of Natural Gradient Descent [1.020554144865699]
In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories.
We characterize the behaviour of natural gradient flow in deep linear networks for separable classification under logistic loss and deep matrix factorization.
We demonstrate that there exist learning problems where natural gradient descent fails to generalize, while gradient descent with the right architecture performs well.
arXiv Detail & Related papers (2021-11-22T21:20:10Z) - Natural Gradient Optimization for Optical Quantum Circuits [4.645254587634926]
We implement Natural Gradient descent in the optical quantum circuit setting.
In particular, we adapt the Natural Gradient approach to a complex-valued parameter space.
We observe that the NG approach converges faster.
arXiv Detail & Related papers (2021-06-25T14:25:52Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Semiparametric Nonlinear Bipartite Graph Representation Learning with
Provable Guarantees [106.91654068632882]
We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.
We show that the proposed objective is strongly convex in a neighborhood around the ground truth, so that a gradient descent-based method achieves linear convergence rate.
Our estimator is robust to any model misspecification within the exponential family, which is validated in extensive experiments.
arXiv Detail & Related papers (2020-03-02T16:40:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.