On the validity of kernel approximations for orthogonally-initialized
neural networks
- URL: http://arxiv.org/abs/2104.05878v1
- Date: Tue, 13 Apr 2021 00:57:39 GMT
- Title: On the validity of kernel approximations for orthogonally-initialized
neural networks
- Authors: James Martens
- Abstract summary: We extend kernel function approximation results for neural networks with Gaussian-distributed weights to single-layer networks initialized with Haar-distributed random orthogonal matrices (with possible rescaling).
- Score: 14.23089477635398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this note we extend kernel function approximation results for neural
networks with Gaussian-distributed weights to single-layer networks initialized
using Haar-distributed random orthogonal matrices (with possible rescaling).
This is accomplished using recent results from random matrix theory.
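A minimal numerical sketch of the claim, assuming a single wide ReLU layer: the empirical kernel obtained with rescaled Haar-orthogonal weights (built here by one common construction, QR of a Gaussian matrix with sign correction, cropped and rescaled) tracks the one obtained with Gaussian weights. The widths, input dimension, and nonlinearity are illustrative choices, and the note's actual argument is analytic, via random matrix theory, not a Monte-Carlo check.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_weights(m, d):
    return rng.normal(size=(m, d)) / np.sqrt(d)

def orthogonal_weights(m, d):
    # Haar-distributed orthogonal matrix via QR of a Gaussian matrix (with the
    # standard sign correction), cropped to m x d and rescaled so that the
    # pre-activation variance matches the Gaussian case.
    n = max(m, d)
    A = rng.normal(size=(n, n))
    Q, R = np.linalg.qr(A)
    Q = Q * np.sign(np.diag(R))
    return Q[:m, :d] * np.sqrt(n / d)

def empirical_kernel(x1, x2, sampler, width=2048, trials=20):
    # Monte-Carlo estimate of E[ relu(W x1) . relu(W x2) ] / width
    vals = []
    for _ in range(trials):
        W = sampler(width, x1.shape[0])
        vals.append(np.maximum(W @ x1, 0) @ np.maximum(W @ x2, 0) / width)
    return np.mean(vals)

x1, x2 = rng.normal(size=16), rng.normal(size=16)
print("Gaussian init  :", empirical_kernel(x1, x2, gaussian_weights))
print("orthogonal init:", empirical_kernel(x1, x2, orthogonal_weights))
```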
Related papers
- Genus expansion for non-linear random matrix ensembles with applications to neural networks [3.801509221714223]
We present a unified approach to studying certain non-linear random matrix ensembles and associated random neural networks. We use a novel series expansion for neural networks which generalizes Faà di Bruno's formula to an arbitrary number of compositions. As an application, we prove several results about neural networks with random weights.
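For context, a sketch of the kind of non-linear ensemble such expansions describe: the empirical spectrum (here summarized by its first moments, which is what genus/moment expansions compute) of the feature Gram matrix of a one-layer random network. The sizes and the tanh nonlinearity are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Empirical spectrum of the conjugate-kernel matrix M = (1/n1) f(WX)^T f(WX)
# for a one-layer random network with Gaussian weights W and Gaussian data X.
rng = np.random.default_rng(0)
n0, n1, m = 800, 800, 1200           # input dim, width, number of samples
X = rng.normal(size=(n0, m))
W = rng.normal(size=(n1, n0)) / np.sqrt(n0)
Y = np.tanh(W @ X)                   # post-activation features
M = Y.T @ Y / n1                     # empirical feature Gram matrix
eigs = np.linalg.eigvalsh(M)
print("first spectral moments:", [round(np.mean(eigs**k), 4) for k in (1, 2, 3)])
```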
arXiv Detail & Related papers (2024-07-11T12:58:07Z)
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural networks and their training, applicable to neural networks of arbitrary width, depth and topology.
We also present an exact, novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
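As background for the representor viewpoint, a sketch of generic kernel machinery (not the paper's LeNK construction): a predictor written as a kernel expansion over the training points, with the expansion coefficients fitted by unregularized gradient descent on the squared loss. The RBF kernel, data sizes, and step-size rule are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.sin(X[:, 0])

K = rbf_kernel(X, X)
alpha = np.zeros(50)
lr = 1.0 / np.linalg.eigvalsh(K)[-1] ** 2     # safe step size for the quadratic loss
for _ in range(500):
    alpha -= lr * (K @ (K @ alpha - y))       # unregularized gradient descent
f_train = K @ alpha                           # predictions are kernel expansions
print("train MSE:", np.mean((f_train - y) ** 2))
```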
arXiv Detail & Related papers (2024-05-24T06:30:36Z)
- An Exact Kernel Equivalence for Finite Classification Models [1.4777718769290527]
We compare our exact representation to the well-known Neural Tangent Kernel (NTK) and discuss approximation error relative to the NTK.
We use this exact kernel to show that our theoretical contribution can provide useful insights into the predictions made by neural networks.
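The NTK referred to here is, at finite width, the inner product of parameter gradients at two inputs. A minimal sketch for a one-hidden-layer ReLU network, with hand-written gradients; the width, input dimension, and activation are assumptions made only for illustration, not the paper's setup.

```python
import numpy as np

# Empirical (finite-width) NTK of f(x) = v . relu(W x) / sqrt(m).
rng = np.random.default_rng(0)
d, m = 5, 2000
W = rng.normal(size=(m, d))
v = rng.normal(size=m)

def grads(x):
    pre = W @ x
    act = np.maximum(pre, 0.0)
    dv = act / np.sqrt(m)                                     # gradient w.r.t. v
    dW = ((pre > 0) * v)[:, None] * x[None, :] / np.sqrt(m)   # gradient w.r.t. W
    return dv, dW

def ntk(x1, x2):
    dv1, dW1 = grads(x1)
    dv2, dW2 = grads(x2)
    return dv1 @ dv2 + np.sum(dW1 * dW2)

x, xp = rng.normal(size=d), rng.normal(size=d)
print("empirical NTK(x, x'):", ntk(x, xp))
```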
arXiv Detail & Related papers (2023-08-01T20:22:53Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
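A minimal sketch of the linearized-Laplace idea, assuming a toy regression model with cubic features (linear in its parameters, so the linearization is exact here): the Hessian is approximated by a Gauss-Newton term plus the prior precision, and the predictive variance of the linearized model is J(x) H^{-1} J(x)^T. The model, noise_prec, and prior_prec values are stand-ins, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x):                        # toy model: f(x, theta) = features(x) @ theta
    return np.stack([x, x**2, x**3], -1)

theta_map = np.array([1.0, -0.5, 0.1])                 # assumed MAP estimate
x_train = rng.uniform(-2, 2, size=40)
J_train = features(x_train)                            # parameter Jacobian at MAP
noise_prec, prior_prec = 25.0, 1.0
H = noise_prec * J_train.T @ J_train + prior_prec * np.eye(3)

x_test = np.linspace(-3, 3, 5)
J_test = features(x_test)
pred_var = np.einsum('ip,pq,iq->i', J_test, np.linalg.inv(H), J_test)
print("predictive std:", np.sqrt(pred_var))            # grows outside the data range
```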
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Simple initialization and parametrization of sinusoidal networks via their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
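A sketch of the bandwidth effect with a single SIREN-style sinusoidal layer: larger frequency scale omega0 makes the empirical feature kernel decay faster in input space, i.e., the induced kernel passes higher frequencies. The initialization constants and the lag-5 diagnostic are assumptions for illustration, not the paper's scheme.

```python
import numpy as np

def sinusoidal_features(x, width=256, omega0=30.0, in_dim=1,
                        rng=np.random.default_rng(0)):
    # Features sin(omega0 * (W x + b)); omega0 acts as a bandwidth knob.
    W = rng.uniform(-1.0, 1.0, size=(width, in_dim)) / in_dim
    b = rng.uniform(-np.pi, np.pi, size=width)
    return np.sin(omega0 * (x @ W.T + b))

x = np.linspace(-1, 1, 200)[:, None]
for omega0 in (5.0, 30.0):
    F = sinusoidal_features(x, omega0=omega0)
    K = F @ F.T / F.shape[1]                    # empirical feature kernel
    print(f"omega0={omega0}: normalized kernel at lag 5 grid points =",
          round(K[100, 105] / K[100, 100], 3))  # smaller value = faster decay
```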
arXiv Detail & Related papers (2022-11-26T07:41:48Z)
- Coordinate descent on the orthogonal group for recurrent neural network training [9.886326127330337]
We show that the algorithm rotates two columns of the recurrent matrix, an operation that can be efficiently implemented as a multiplication by a Givens matrix.
Experiments on a benchmark recurrent neural network training problem are presented to demonstrate the effectiveness of the proposed algorithm.
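A sketch of the elementary update described above: a Givens rotation applied on the right mixes only two columns of the recurrent matrix, so each coordinate-descent step costs O(n) and preserves orthogonality. The angle selection rule of the paper is not reproduced; theta below is an arbitrary stand-in.

```python
import numpy as np

def apply_givens_right(W, i, j, theta):
    # Multiply W by a Givens rotation acting on columns i and j.
    c, s = np.cos(theta), np.sin(theta)
    Wi, Wj = W[:, i].copy(), W[:, j].copy()
    W[:, i] = c * Wi - s * Wj
    W[:, j] = s * Wi + c * Wj
    return W

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))      # start from an orthogonal matrix
Q = apply_givens_right(Q, 1, 4, theta=0.3)
print("still orthogonal:", np.allclose(Q.T @ Q, np.eye(6)))
```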
arXiv Detail & Related papers (2021-07-30T19:27:11Z)
- Double-descent curves in neural networks: a new perspective using Gaussian processes [9.153116600213641]
Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with increasing parameters, then grows after reaching an optimal number of parameters.
We use techniques from random matrix theory to characterize the spectral distribution of the empirical feature covariance matrix as a width-dependent perturbation of the spectrum of the neural network Gaussian process kernel.
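A small numerical sketch of the width dependence: the eigenvalue spectrum of the empirical feature kernel of a random ReLU layer drifts away from a very wide "proxy" for the infinite-width kernel as the width shrinks. The sizes, the ReLU nonlinearity, and the drift metric are illustrative assumptions, not the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 200                                 # input dim, number of samples
X = rng.normal(size=(n, d)) / np.sqrt(d)       # roughly unit-norm inputs

def feature_gram(width):
    W = rng.normal(size=(d, width))
    F = np.maximum(X @ W, 0.0)
    return F @ F.T / width                     # n x n empirical kernel matrix

K_wide = feature_gram(20000)                   # proxy for the infinite-width kernel
for width in (50, 200, 1000):
    K = feature_gram(width)
    drift = np.linalg.norm(np.linalg.eigvalsh(K) - np.linalg.eigvalsh(K_wide))
    print(f"width={width}: eigenvalue drift from wide proxy = {drift:.3f}")
```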
arXiv Detail & Related papers (2021-02-14T20:31:49Z)
- Generalized Leverage Score Sampling for Neural Networks [82.95180314408205]
Leverage score sampling is a powerful technique that originates from theoretical computer science.
In this work, we generalize the results in [Avron, Kapralov, Musco, Musco, Velingker and Zandieh 17] to a broader class of kernels.
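For reference, the basic primitive such results build on: statistical leverage scores of a tall matrix, i.e., the diagonal of the projection onto its column space, and row sampling proportional to those scores. The matrix here is arbitrary random data, purely for illustration.

```python
import numpy as np

def leverage_scores(A):
    # Diagonal of A (A^T A)^{-1} A^T, computed via a thin QR factorization.
    Q, _ = np.linalg.qr(A, mode='reduced')
    return np.sum(Q ** 2, axis=1)            # squared row norms of Q

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 20))
scores = leverage_scores(A)
probs = scores / scores.sum()
idx = rng.choice(len(A), size=100, replace=True, p=probs)   # leverage-score sampling
print("scores sum to the rank:", round(scores.sum(), 2))    # equals 20 here
```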
arXiv Detail & Related papers (2020-09-21T14:46:01Z)
- RicciNets: Curvature-guided Pruning of High-performance Neural Networks Using Ricci Flow [0.0]
We use the definition of Ricci curvature to remove edges of low importance before mapping the computational graph to a neural network.
We show a reduction of almost 35% in the number of floating-point operations (FLOPs) per pass, with no degradation in performance.
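A sketch of curvature-guided edge pruning on a toy graph. As a simple combinatorial stand-in for the paper's Ricci curvature, it uses the basic Forman curvature of an unweighted edge (u, v), 4 - deg(u) - deg(v); the graph and the median threshold are arbitrary choices for illustration.

```python
import numpy as np

def forman_curvature(edges, n_nodes):
    # Basic Forman curvature per edge; edges between high-degree nodes score lowest.
    deg = np.zeros(n_nodes, dtype=int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return {(u, v): 4 - deg[u] - deg[v] for u, v in edges}

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
curv = forman_curvature(edges, n_nodes=5)
threshold = np.median(list(curv.values()))
kept = [e for e in edges if curv[e] >= threshold]   # prune low-curvature edges
print("kept edges:", kept)
```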
arXiv Detail & Related papers (2020-07-08T15:56:02Z)
- Tractable Approximate Gaussian Inference for Bayesian Neural Networks [1.933681537640272]
We propose an analytical method for performing tractable approximate Gaussian inference (TAGI) in Bayesian neural networks.
The method has a computational complexity of $\mathcal{O}(n)$ with respect to the number of parameters $n$, and tests performed on regression and classification benchmarks confirm that, for the same network architecture, it matches the performance of existing methods relying on gradient backpropagation.
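A sketch of the kind of closed-form moment update that analytical Gaussian inference builds on, for a single linear unit with independent Gaussian weights and inputs. This single-unit illustration is an assumption-laden simplification, not the paper's full TAGI scheme.

```python
import numpy as np

def linear_unit_moments(mu_w, var_w, mu_x, var_x, mu_b=0.0, var_b=0.0):
    # For independent Gaussians w and x, the product wx has
    #   mean mu_w * mu_x,  var var_w*var_x + var_w*mu_x**2 + var_x*mu_w**2;
    # sums of independent terms add their means and variances.
    mu_z = np.sum(mu_w * mu_x) + mu_b
    var_z = np.sum(var_w * var_x + var_w * mu_x**2 + var_x * mu_w**2) + var_b
    return mu_z, var_z

mu_w, var_w = np.array([0.5, -0.2]), np.array([0.1, 0.1])
mu_x, var_x = np.array([1.0, 2.0]), np.array([0.05, 0.05])
print(linear_unit_moments(mu_w, var_w, mu_x, var_x))
```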
arXiv Detail & Related papers (2020-04-20T13:37:08Z)
- Controllable Orthogonalization in Training DNNs [96.1365404059924]
Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1.
This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI).
We show that our method improves the performance of image classification networks by effectively controlling the orthogonality to provide an optimal tradeoff between optimization benefits and representational capacity reduction.
We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of a network, similar to spectral normalization.
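A sketch of orthogonalization by Newton's iteration (the Newton-Schulz scheme): after scaling the matrix so its singular values lie in the convergence region, the iteration Y <- Y (3I - Y^T Y) / 2 drives all singular values toward 1, and truncating it early gives a controllable degree of orthogonality. The Frobenius-norm scaling and the iteration counts below are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def newton_schulz_orthogonalize(W, iters):
    Y = W / np.linalg.norm(W, ord='fro')       # scale so the iteration converges
    I = np.eye(W.shape[1])
    for _ in range(iters):
        Y = Y @ (3.0 * I - Y.T @ Y) / 2.0      # Newton's iteration step
    return Y

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
for t in (2, 6, 12):                           # more iterations = closer to orthogonal
    Q = newton_schulz_orthogonalize(W, iters=t)
    print(f"iters={t}: ||Q^T Q - I||_F = {np.linalg.norm(Q.T @ Q - np.eye(32)):.4f}")
```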
arXiv Detail & Related papers (2020-04-02T10:14:27Z)
- On the Convex Behavior of Deep Neural Networks in Relation to the Layers' Width [99.24399270311069]
We observe that for wider networks, minimizing the loss with gradient descent maneuvers through surfaces of positive curvature at the start and end of training, and close-to-zero curvature in between.
In other words, it seems that during crucial parts of the training process, the Hessian in wide networks is dominated by the Gauss-Newton component G.
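For reference, a toy illustration of the decomposition that statement refers to: for squared loss, the Hessian splits as H = G + R, where G is the Gauss-Newton term built from per-example gradients and R weights the per-example Hessians by the residuals, so R fades as the fit improves. The two-parameter tanh model below is an assumption chosen only to make the split explicit; it does not reproduce the paper's width-dependence experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 0.8 * np.tanh(1.5 * x)                 # data generated by the same family

def decompose(w1, w2):
    # Model f(x) = w2 * tanh(w1 * x), squared loss; returns (G, residual term R).
    z = w1 * x
    t, s2 = np.tanh(z), 1.0 / np.cosh(z) ** 2
    f = w2 * t
    J = np.stack([w2 * x * s2, t], axis=1)             # per-example gradient of f
    G = J.T @ J / len(x)
    R = np.array([[np.mean((f - y) * w2 * x**2 * (-2 * s2 * t)),
                   np.mean((f - y) * x * s2)],
                  [np.mean((f - y) * x * s2), 0.0]])
    return G, R

for w1, w2 in [(0.2, 0.2), (1.5, 0.8)]:                # far from vs at the generating fit
    G, R = decompose(w1, w2)
    print(f"w=({w1},{w2}): ||G||={np.linalg.norm(G):.3f}, "
          f"||residual term||={np.linalg.norm(R):.3f}")
```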
arXiv Detail & Related papers (2020-01-14T16:30:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.