Understanding Natural Gradient in Sobolev Spaces
- URL: http://arxiv.org/abs/2202.06232v1
- Date: Sun, 13 Feb 2022 07:04:44 GMT
- Title: Understanding Natural Gradient in Sobolev Spaces
- Authors: Qinxun Bai, Steven Rosenberg, Wei Xu
- Abstract summary: We study the natural gradient induced by Sobolev metrics and develop several rigorous results.
Preliminary experimental results reveal the potential of this new natural gradient variant.
- Score: 15.33151811602988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While natural gradients have been widely studied from both theoretical and
empirical perspectives, we argue that a fundamental theoretical issue regarding
the existence of gradients in infinite dimensional function spaces remains
underexplored. We therefore study the natural gradient induced by
Sobolev metrics and develop several rigorous results. Our results also establish
new connections between natural gradients and RKHS theory, and specifically to
the Neural Tangent Kernel (NTK). We develop computational techniques for the
efficient approximation of the proposed Sobolev Natural Gradient. Preliminary
experimental results reveal the potential of this new natural gradient variant.
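As a rough illustration of the function-space idea the abstract refers to, the sketch below performs one Gram-matrix-preconditioned gradient step using an empirical NTK built from the model Jacobian. This is a generic, hedged sketch and not the authors' Sobolev Natural Gradient; `apply_fn`, `natural_gradient_step`, the toy model, and the damping term are illustrative assumptions.

```python
# Hedged, minimal sketch (not the paper's code): one Gram-matrix
# preconditioned gradient step on a batch, illustrating the NTK-flavoured
# view of a function-space natural gradient.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree


def apply_fn(params, x):
    """Toy scalar model f_theta(x); stands in for any differentiable network."""
    w1, b1, w2 = params
    return jnp.tanh(x @ w1 + b1) @ w2


def natural_gradient_step(params, x_batch, y_batch, lr=1e-1, damping=1e-3):
    flat, unravel = ravel_pytree(params)

    def f(theta_flat):
        # Batch of model outputs as a function of the flat parameter vector.
        return jax.vmap(lambda x: apply_fn(unravel(theta_flat), x))(x_batch)

    preds = f(flat)
    g_f = preds - y_batch                 # functional gradient of 0.5 * MSE
    J = jax.jacfwd(f)(flat)               # per-sample Jacobian, shape (n, p)

    # K = J J^T is the empirical NTK Gram matrix, acting as the metric on the
    # batch of function values; a Sobolev metric would modify K (e.g. with
    # derivative terms), which is omitted in this sketch.
    K = J @ J.T + damping * jnp.eye(J.shape[0])
    direction = J.T @ jnp.linalg.solve(K, g_f)
    return unravel(flat - lr * direction)


# Usage on toy data.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = (jax.random.normal(k1, (2, 8)), jnp.zeros(8), jax.random.normal(k2, (8,)))
x = jax.random.normal(k3, (16, 2))
y = jnp.sin(x[:, 0])
params = natural_gradient_step(params, x, y)
```

Here the Gram matrix $K = JJ^\top$ plays the role of the metric on the batch of function values; replacing or augmenting $K$ with a Sobolev metric is the kind of construction the paper develops and approximates efficiently.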
Related papers
- Optimization Guarantees for Square-Root Natural-Gradient Variational Inference [16.89312441692349]
This paper establishes convergence guarantees for variational-Gaussian inference and its continuous-time gradient flow.
Experiments demonstrate the effectiveness of natural gradient methods and highlight their advantages over algorithms that use Euclidean or Wasserstein geometries.
arXiv Detail & Related papers (2025-07-10T15:33:28Z) - Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness [51.302674884611335]
This work introduces a hybrid non-Euclidean optimization method which generalizes norm clipping by combining steepest descent and conditional gradient approaches.
We discuss how to instantiate the algorithms for deep learning and demonstrate their properties on image classification and language modeling.
arXiv Detail & Related papers (2025-06-02T17:34:29Z) - Mathematical analysis of the gradients in deep learning [3.3123773366516645]
We show that the generalized gradient function must coincide with the standard gradient of the cost functional on every open set on which the cost functional is continuously differentiable.
arXiv Detail & Related papers (2025-01-26T19:11:57Z) - A New Formulation of Lipschitz Constrained With Functional Gradient Learning for GANs [52.55025869932486]
This paper introduces a promising alternative method for training Generative Adversarial Networks (GANs) on large-scale datasets with clear theoretical guarantees.
We propose a novel Lipschitz-constrained Functional Gradient GANs learning (Li-CFG) method to stabilize the training of GAN.
We demonstrate that the neighborhood size of the latent vector can be reduced by increasing the norm of the discriminator gradient.
arXiv Detail & Related papers (2025-01-20T02:48:07Z) - Towards Training Without Depth Limits: Batch Normalization Without
Gradient Explosion [83.90492831583997]
We show that a batch-normalized network can keep the optimal signal propagation properties, but avoid exploding gradients in depth.
We use a Multi-Layer Perceptron (MLP) with linear activations and batch-normalization that provably has bounded gradients at any depth.
We also design an activation shaping scheme that empirically achieves the same properties for certain non-linear activations.
arXiv Detail & Related papers (2023-10-03T12:35:02Z) - Neural Gradient Learning and Optimization for Oriented Point Normal
Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability in local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Achieving High Accuracy with PINNs via Energy Natural Gradients [0.0]
We show that the update direction in function space resulting from the energy natural gradient corresponds to the Newton direction modulo an orthogonal projection onto the model's tangent space.
We demonstrate experimentally that energy natural gradient descent yields highly accurate solutions with errors several orders of magnitude smaller than what is obtained when training PINNs with standard gradient descent or Adam.
arXiv Detail & Related papers (2023-02-25T21:17:19Z) - On the Overlooked Structure of Stochastic Gradients [34.650998241703626]
We show that dimension-wise gradients usually exhibit power-law heavy tails, while iteration-wise gradients and gradient noise caused by minibatch training usually do not exhibit power-law heavy tails.
Our work challenges the existing belief and provides novel insights on the structure of gradients in deep learning.
arXiv Detail & Related papers (2022-12-05T07:55:22Z) - Sampling in Constrained Domains with Orthogonal-Space Variational
Gradient Descent [13.724361914659438]
We propose a new variational framework with a designed orthogonal-space gradient flow (O-Gradient) for sampling on a manifold.
We prove that O-Gradient converges to the target constrained distribution with rate $\widetilde{O}(1/k)$, where $k$ is the number of iterations, under mild conditions.
arXiv Detail & Related papers (2022-10-12T17:51:13Z) - Invariance Properties of the Natural Gradient in Overparametrised
Systems [0.0]
The natural gradient field represents the direction of steepest ascent of an objective function on a model equipped with a distinguished metric.
We study when the pushforward of the natural parameter gradient is equal to the natural gradient.
arXiv Detail & Related papers (2022-06-30T13:23:14Z) - Efficient Natural Gradient Descent Methods for Large-Scale Optimization
Problems [1.2891210250935146]
We propose an efficient method for computing natural gradient descent directions with respect to a generic metric in the state space.
Our technique relies on representing the natural gradient direction as a solution to a standard least-squares problem (a sketch of this reformulation appears after this list).
We can reliably compute several natural gradients, including the Wasserstein natural gradient, for large-scale parameter spaces.
arXiv Detail & Related papers (2022-02-13T07:32:17Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Depth Without the Magic: Inductive Bias of Natural Gradient Descent [1.020554144865699]
In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories.
We characterize the behaviour of natural gradient flow in deep linear networks for separable classification under logistic loss and deep matrix factorization.
We demonstrate that there exist learning problems where natural gradient descent fails to generalize, while gradient descent with the right architecture performs well.
arXiv Detail & Related papers (2021-11-22T21:20:10Z) - Natural Gradient Optimization for Optical Quantum Circuits [4.645254587634926]
We implement Natural Gradient descent in the optical quantum circuit setting.
In particular, we adapt the Natural Gradient approach to a complex-valued parameter space.
We observe that the NG approach converges faster.
arXiv Detail & Related papers (2021-06-25T14:25:52Z) - Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NL).
New definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Sinkhorn Natural Gradient for Generative Models [125.89871274202439]
We propose a novel Sinkhorn Natural Gradient (SiNG) algorithm which acts as a steepest descent method on the probability space endowed with the Sinkhorn divergence.
We show that the Sinkhorn information matrix (SIM), a key component of SiNG, has an explicit expression and can be evaluated accurately in complexity that scales logarithmically with respect to the desired accuracy.
In our experiments, we quantitatively compare SiNG with state-of-the-art SGD-type solvers on generative tasks to demonstrate the efficiency and efficacy of our method.
arXiv Detail & Related papers (2020-11-09T02:51:17Z) - Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs [71.26657499537366]
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models.
We compare it with the reverse dynamic method to train neural ODEs on classification, density estimation, and inference approximation tasks.
arXiv Detail & Related papers (2020-03-11T13:15:57Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of an optimistic adaptive gradient algorithm (OAdagrad) for nonconvex-nonconcave minimax problems.
Our experiments show that adaptive gradient algorithms can outperform their non-adaptive counterparts in GAN training.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
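The "Efficient Natural Gradient Descent Methods" entry above notes that a natural gradient direction can be posed as a standard least-squares problem; the minimal sketch below illustrates this reformulation for the special case of a Jacobian-pullback metric $G = J^\top J$ (an illustrative assumption, not the paper's general state-space metric).

```python
# Hedged sketch: a natural-gradient direction obtained as a least-squares
# solution, assuming the metric G = J^T J (a Gauss-Newton / Fisher-style
# special case).
import jax.numpy as jnp
from jax import random

key = random.PRNGKey(0)
k1, k2 = random.split(key)
n, p = 50, 10                        # batch size, number of parameters
J = random.normal(k1, (n, p))        # Jacobian of model outputs w.r.t. parameters
r = random.normal(k2, (n,))          # residuals (functional gradient on the batch)

euclidean_grad = J.T @ r             # ordinary gradient of 0.5 * ||r||^2

# The natural gradient w.r.t. G = J^T J solves G d = euclidean_grad, which is
# the normal-equation form of the least-squares problem min_d ||J d - r||^2.
d_natural, *_ = jnp.linalg.lstsq(J, r, rcond=None)

# Sanity check: G d should recover the Euclidean gradient (up to float error).
print(jnp.max(jnp.abs(J.T @ (J @ d_natural) - euclidean_grad)))
```

Solving the least-squares problem directly (or iteratively) avoids forming and inverting $G$, which is what makes such formulations attractive for large parameter spaces.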
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.