Understanding Natural Gradient in Sobolev Spaces
- URL: http://arxiv.org/abs/2202.06232v1
- Date: Sun, 13 Feb 2022 07:04:44 GMT
- Title: Understanding Natural Gradient in Sobolev Spaces
- Authors: Qinxun Bai, Steven Rosenberg, Wei Xu
- Abstract summary: We study the natural gradient induced by Sobolev metrics and develop several rigorous results.
Preliminary experimental results reveal the potential of this new natural gradient variant.
- Score: 15.33151811602988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While natural gradients have been widely studied from both theoretical and
empirical perspectives, we argue that a fundamental theoretical issue regarding
the existence of gradients in infinite dimensional function spaces remains
underexplored. We therefore study the natural gradient induced by
Sobolev metrics and develop several rigorous results. Our results also establish
new connections between natural gradients and RKHS theory, and specifically to
the Neural Tangent Kernel (NTK). We develop computational techniques for the
efficient approximation of the proposed Sobolev Natural Gradient. Preliminary
experimental results reveal the potential of this new natural gradient variant.
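For orientation, a natural gradient step with respect to a metric preconditions the Euclidean gradient by the (pseudo-)inverse of a Gram matrix built from that metric. The NumPy sketch below only illustrates this generic mechanism with a damped Jacobian-based metric, in the spirit of the NTK connection mentioned above; it is not the Sobolev metric or the approximation scheme proposed in the paper, and all names are hypothetical.

```python
import numpy as np

def natural_gradient_step(params, grad_loss, jacobian, lr=1e-1, damping=1e-3):
    """One generic natural-gradient update: solve G d = grad_loss against a
    damped metric G assembled from a model Jacobian, then step along -d.

    params    : (p,) current parameter vector
    grad_loss : (p,) Euclidean gradient of the loss at `params`
    jacobian  : (n, p) Jacobian of model outputs w.r.t. the parameters;
                jacobian @ jacobian.T is the empirical NTK Gram matrix, while
                jacobian.T @ jacobian serves here as a parameter-space metric.
    """
    G = jacobian.T @ jacobian + damping * np.eye(params.size)
    direction = np.linalg.solve(G, grad_loss)  # G^{-1} grad_loss
    return params - lr * direction
```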
Related papers
- Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion [83.90492831583997]
We show that a batch-normalized network can keep the optimal signal propagation properties while avoiding exploding gradients at any depth.
We use a Multi-Layer Perceptron (MLP) with linear activations and batch normalization that provably has bounded gradients at any depth.
We also design an activation shaping scheme that empirically achieves the same properties for certain non-linear activations.
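For reference, the kind of architecture this entry refers to, a deep MLP with linear (identity) activations interleaved with batch normalization, can be sketched in a few lines of PyTorch; the width and depth below are placeholders, and this does not reproduce the paper's exact construction or its activation shaping scheme.

```python
import torch.nn as nn

def linear_bn_mlp(width: int = 128, depth: int = 64) -> nn.Sequential:
    # Alternating Linear -> BatchNorm1d blocks with no nonlinearity in between;
    # width and depth are arbitrary placeholders used only for illustration.
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width, bias=False), nn.BatchNorm1d(width)]
    return nn.Sequential(*layers)
```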
arXiv Detail & Related papers (2023-10-03T12:35:02Z)
- Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
- Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z)
- Achieving High Accuracy with PINNs via Energy Natural Gradients [0.0]
We show that the update direction in function space resulting from the energy natural gradient corresponds to the Newton direction modulo an orthogonal projection onto the model's tangent space.
We demonstrate experimentally that energy natural gradient descent yields highly accurate solutions with errors several orders of magnitude smaller than what is obtained when training PINNs with standard gradient descent or Adam.
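In schematic notation (not taken from the paper), a natural-gradient step preconditions the parameter gradient by the pseudo-inverse of the Gram matrix of the chosen metric, and the statement above says that the induced function-space update agrees with a Newton step on the energy up to a tangent-space projection:

```latex
% Schematic only: G is the Gram matrix of the (energy) metric, E the energy
% functional, and P_{T_\theta} the orthogonal projection onto the tangent
% space of the model class at the current parameters.
\theta_{k+1} = \theta_k - \eta\, G(\theta_k)^{+}\, \nabla_\theta L(\theta_k),
\qquad
\Delta u = -\, P_{T_\theta}\!\left( \nabla^2 E(u)^{-1} \nabla E(u) \right).
```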
arXiv Detail & Related papers (2023-02-25T21:17:19Z)
- On the Overlooked Structure of Stochastic Gradients [34.650998241703626]
We show that dimension-wise gradients usually exhibit power-law heavy tails, while iteration-wise gradients and gradient noise caused by minibatch training usually do not exhibit power-law heavy tails.
Our work challenges the existing belief and provides novel insights on the structure of gradients in deep learning.
arXiv Detail & Related papers (2022-12-05T07:55:22Z)
- Sampling in Constrained Domains with Orthogonal-Space Variational Gradient Descent [13.724361914659438]
We propose a new variational framework with a designed orthogonal-space gradient flow (O-Gradient) for sampling on a manifold.
We prove that O-Gradient converges to the target constrained distribution with rate $\widetilde{O}(1/\text{the number of iterations})$ under mild conditions.
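To illustrate the orthogonal-space idea, the sketch below removes from a gradient its component along a constraint gradient, leaving a direction tangent to the level set $g(x)=0$; this is a generic projection for intuition only, not the O-Gradient flow itself, and the function names are hypothetical.

```python
import numpy as np

def orthogonal_space_direction(grad_f, grad_g, eps=1e-12):
    # Project grad_f onto the space orthogonal to grad_g, i.e. onto the tangent
    # space of the level set g(x) = 0. The actual O-Gradient flow adds further
    # terms that also drive iterates toward the constraint set.
    n = grad_g / (np.linalg.norm(grad_g) + eps)  # unit normal of the level set
    return grad_f - np.dot(grad_f, n) * n        # drop the normal component
```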
arXiv Detail & Related papers (2022-10-12T17:51:13Z)
- Efficient Natural Gradient Descent Methods for Large-Scale Optimization Problems [1.2891210250935146]
We propose an efficient method for computing natural gradient descent directions with respect to a generic metric in the state space.
Our technique relies on representing the natural gradient direction as a solution to a standard least-squares problem.
We can reliably compute several natural gradient descent directions, including the Wasserstein natural gradient, even for large-scale state spaces.
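A minimal NumPy rendering of this least-squares viewpoint: the parameter direction d is recovered as the least-squares solution of J d ≈ v, where J maps parameter perturbations into the state space and v is a chosen descent direction there. The names and the optional damping are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def natural_gradient_via_lstsq(jacobian, state_direction, damping=0.0):
    # Solve min_d || J d - v ||_2 (optionally with Tikhonov damping) so that the
    # parameter update best reproduces the desired state-space direction v.
    J, v = jacobian, state_direction
    if damping > 0.0:
        p = J.shape[1]
        J = np.vstack([J, np.sqrt(damping) * np.eye(p)])
        v = np.concatenate([v, np.zeros(p)])
    d, *_ = np.linalg.lstsq(J, v, rcond=None)
    return d
```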
arXiv Detail & Related papers (2022-02-13T07:32:17Z)
- Depth Without the Magic: Inductive Bias of Natural Gradient Descent [1.020554144865699]
In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories.
We characterize the behaviour of natural gradient flow in deep linear networks for separable classification under logistic loss and deep matrix factorization.
We demonstrate that there exist learning problems where natural gradient descent fails to generalize, while gradient descent with the right architecture performs well.
arXiv Detail & Related papers (2021-11-22T21:20:10Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs [71.26657499537366]
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models.
We compare it with the reverse dynamic method to train neural ODEs on classification, density estimation, and inference approximation tasks.
arXiv Detail & Related papers (2020-03-11T13:15:57Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)