ViViT: Curvature access through the generalized Gauss-Newton's low-rank
structure
- URL: http://arxiv.org/abs/2106.02624v1
- Date: Fri, 4 Jun 2021 17:37:47 GMT
- Title: ViViT: Curvature access through the generalized Gauss-Newton's low-rank
structure
- Authors: Felix Dangel, Lukas Tatzel, Philipp Hennig
- Abstract summary: Curvature in form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model for the loss to train, compress, or explain deep networks.
We present ViViT, a curvature model that leverages the GGN's low-rank structure without further approximations.
- Score: 26.24282086797512
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Curvature in form of the Hessian or its generalized Gauss-Newton (GGN)
approximation is valuable for algorithms that rely on a local model for the
loss to train, compress, or explain deep networks. Existing methods based on
implicit multiplication via automatic differentiation or Kronecker-factored
block diagonal approximations do not consider noise in the mini-batch. We
present ViViT, a curvature model that leverages the GGN's low-rank structure
without further approximations. It allows for efficient computation of
eigenvalues, eigenvectors, as well as per-sample first- and second-order
directional derivatives. The representation is computed in parallel with
gradients in one backward pass and offers a fine-grained cost-accuracy
trade-off, which allows it to scale. As examples for ViViT's usefulness, we
investigate the directional gradients and curvatures during training, and how
noise information can be used to improve the stability of second-order methods.
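To make the low-rank structure concrete, here is a minimal PyTorch sketch (our illustration, not the authors' BackPACK-based implementation): for a mean-squared-error loss and a toy linear model, the GGN equals V V^T with one column of V per sample-output pair, so its nonzero spectrum can be read off the small Gram matrix V^T V instead of the full parameter-space matrix.

```python
# Minimal sketch, not the paper's implementation.  For a linear model and the
# loss 0.5/N * sum_n ||f(x_n) - y_n||^2, the Hessian of the loss w.r.t. each
# output is the identity, so the GGN is exactly V @ V.T with one column of V
# per (sample, output) pair; its spectrum follows from the small Gram matrix.
import torch

torch.manual_seed(0)
N, D, C = 8, 5, 3                                  # batch size, inputs, outputs
X = torch.randn(N, D)
W = torch.randn(C, D, requires_grad=True)          # toy parameters (C * D of them)

outputs = X @ W.T                                  # predictions, shape (N, C)
columns = []
for n in range(N):
    for c in range(C):
        (g,) = torch.autograd.grad(outputs[n, c], W, retain_graph=True)
        columns.append(g.flatten() / N ** 0.5)     # column J_n^T e_c / sqrt(N)
V = torch.stack(columns, dim=1)                    # shape (C * D, N * C)

gram = V.T @ V                                     # (N*C, N*C), cheap to eigendecompose
eigvals = torch.linalg.eigvalsh(gram)              # GGN spectrum (up to trailing zeros)
print(eigvals)
```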
Related papers
- Epistemic Uncertainty and Observation Noise with the Neural Tangent Kernel [12.464924018243988]
Recent work has shown that training wide neural networks with gradient descent is formally equivalent to computing the mean of the posterior distribution in a Gaussian Process.
We show how to deal with non-zero aleatoric noise and derive an estimator for the posterior covariance.
arXiv Detail & Related papers (2024-09-06T00:34:44Z)
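For the NTK/GP entry above, a minimal NumPy sketch of Gaussian-process regression with non-zero observation noise (our toy illustration, not the paper's estimator; the RBF kernel below merely stands in for a neural tangent kernel):

```python
# Sketch of GP posterior mean and covariance with observation-noise variance
# sigma2, for an arbitrary kernel function k (here a toy RBF kernel).
import numpy as np

def gp_posterior(k, X, y, X_star, sigma2=0.1):
    K = k(X, X) + sigma2 * np.eye(len(X))            # noisy train-train kernel
    K_s = k(X_star, X)                               # test-train kernel
    mean = K_s @ np.linalg.solve(K, y)
    cov = k(X_star, X_star) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov

rbf = lambda A, B: np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
X, y = np.random.randn(20, 3), np.random.randn(20)
mu, Sigma = gp_posterior(rbf, X, y, np.random.randn(5, 3))
```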
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when *done right* -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a *stochastic dual descent* algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
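As a rough illustration of the dual view behind the entry above (a full-batch sketch under simplifying assumptions, not the paper's stochastic dual descent): plain gradient descent on the quadratic objective 0.5 a^T (K + s2 I) a - a^T y converges to the kernel-regression weights a* = (K + s2 I)^{-1} y that define the GP posterior mean.

```python
# Full-batch sketch, not the paper's stochastic dual descent: gradient descent
# on 0.5 * a^T (K + s2*I) a - a^T y converges to a* = (K + s2*I)^{-1} y.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.sin(X[:, 0])
K = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1))   # RBF kernel matrix
s2 = 0.5                                                      # noise variance
A = K + s2 * np.eye(len(y))

lr = 1.0 / np.linalg.eigvalsh(A).max()     # safe step size from the top eigenvalue
a = np.zeros(len(y))
for _ in range(2000):
    a -= lr * (A @ a - y)                  # gradient of the quadratic objective

print(np.abs(A @ a - y).max())             # residual; shrinks toward zero with more steps
```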
- Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently performs global gradient approximation while achieving better accuracy and generalization ability in local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
- Machine learning algorithms for three-dimensional mean-curvature computation in the level-set method [0.0]
We propose a data-driven mean-curvature solver for the level-set method.
Our proposed system can yield more accurate mean-curvature estimations than modern particle-based interface reconstruction.
arXiv Detail & Related papers (2022-08-18T20:19:22Z)
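For reference, the quantity targeted by the level-set entries above and below is the interface curvature of a level-set function, conventionally evaluated with finite differences; the learned solvers replace or correct this numerical evaluation (normalization conventions for the mean curvature vary):

```latex
\kappa \;=\; \nabla \cdot \left( \frac{\nabla \varphi}{\lVert \nabla \varphi \rVert} \right)
```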
- Error-Correcting Neural Networks for Two-Dimensional Curvature Computation in the Level-Set Method [0.0]
We present an error-neural-modeling-based strategy for approximating two-dimensional curvature in the level-set method.
Our main contribution is a redesigned hybrid solver that relies on numerical schemes to enable machine-learning operations on demand.
arXiv Detail & Related papers (2022-01-22T05:14:40Z)
- Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z)
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
arXiv Detail & Related papers (2020-08-19T12:35:55Z)
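The local linearization referred to above is the first-order Taylor expansion of the network around the fitted parameters; a sketch in our notation (not necessarily the paper's), with predictions then made by pushing the posterior over the parameters through the linearized model rather than the original network:

```latex
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_\ast) + J_{\theta_\ast}(x)\,(\theta - \theta_\ast),
\qquad
J_{\theta_\ast}(x) = \left.\frac{\partial f(x;\theta)}{\partial \theta}\right|_{\theta=\theta_\ast}
```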
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate more stable and better-performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds that apply to both the approximated kernel and the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
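To illustrate quadrature Fourier features in their simplest form (a one-dimensional toy with an RBF kernel, not the paper's derivative-aware construction), Gauss-Hermite nodes and weights give a deterministic feature map whose inner products approximate the kernel:

```python
# Toy 1-D illustration of quadrature Fourier features: Gauss-Hermite
# quadrature of the RBF kernel's spectral density gives a deterministic
# feature map phi with phi(x) . phi(y) ~= exp(-(x - y)^2 / 2).
import numpy as np

def qff(x, m=32):
    t, w = np.polynomial.hermite.hermgauss(m)    # nodes/weights for weight exp(-t^2)
    omega = np.sqrt(2.0) * t                     # quadrature frequencies
    scale = np.sqrt(w / np.sqrt(np.pi))          # per-frequency feature scaling
    z = np.outer(x, omega)
    return np.concatenate([scale * np.cos(z), scale * np.sin(z)], axis=1)

x = np.linspace(-2.0, 2.0, 7)
Phi = qff(x)
K_approx = Phi @ Phi.T
K_exact = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
print(np.abs(K_approx - K_exact).max())          # small deterministic approximation error
```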
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.