Mehler's Formula, Branching Process, and Compositional Kernels of Deep
Neural Networks
- URL: http://arxiv.org/abs/2004.04767v2
- Date: Mon, 28 Sep 2020 17:29:34 GMT
- Title: Mehler's Formula, Branching Process, and Compositional Kernels of Deep
Neural Networks
- Authors: Tengyuan Liang and Hai Tran-Bach
- Abstract summary: We utilize a connection between compositional kernels and branching processes via Mehler's formula to study deep neural networks.
We study the unscaled and rescaled limits of the compositional kernels and explore the different phases of the limiting behavior.
Explicit formulas on the eigenvalues of the compositional kernel are provided, which quantify the complexity of the corresponding reproducing kernel Hilbert space.
- Score: 3.167685495996986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We utilize a connection between compositional kernels and branching processes
via Mehler's formula to study deep neural networks. This new probabilistic
insight provides us a novel perspective on the mathematical role of activation
functions in compositional neural networks. We study the unscaled and rescaled
limits of the compositional kernels and explore the different phases of the
limiting behavior, as the compositional depth increases. We investigate the
memorization capacity of the compositional kernels and neural networks by
characterizing the interplay among compositional depth, sample size,
dimensionality, and non-linearity of the activation. Explicit formulas on the
eigenvalues of the compositional kernel are provided, which quantify the
complexity of the corresponding reproducing kernel Hilbert space. On the
methodological front, we propose a new random features algorithm, which
compresses the compositional layers by devising a new activation function.
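
To make the connection stated in the abstract concrete, here is a brief sketch of the standard identities underlying it; the notation ($\sigma$, $a_k$, $\kappa$) and the dual-activation reading below are illustrative assumptions, not quotations from the paper. Mehler's formula states that, for probabilists' Hermite polynomials $He_k$ and $|\rho| < 1$,

  $\sum_{k \ge 0} \frac{\rho^k}{k!} He_k(x) He_k(y) = (1-\rho^2)^{-1/2} \exp\!\left( \frac{2\rho x y - \rho^2(x^2+y^2)}{2(1-\rho^2)} \right)$.

If an activation $\sigma$ with $\mathbb{E}[\sigma(Z)^2]=1$, $Z \sim \mathcal{N}(0,1)$, has Hermite expansion $\sigma(z) = \sum_{k \ge 0} a_k \, He_k(z)/\sqrt{k!}$, then for jointly standard Gaussian $(U,V)$ with correlation $\rho$,

  $\mathbb{E}[\sigma(U)\sigma(V)] = \sum_{k \ge 0} a_k^2 \, \rho^k =: \kappa(\rho), \qquad \sum_{k \ge 0} a_k^2 = 1$.

In the dual-activation picture, a depth-$L$ compositional kernel iterates this map on the input correlation, $\rho_L = \kappa^{\circ L}(\rho_0)$; since $\{a_k^2\}$ is a probability distribution, $\kappa$ is a probability generating function, which is one way to read the branching-process connection.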
Related papers
- Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations [24.052411316664017]
We introduce a theoretical framework for the evolution of the kernel sequence, which measures the similarity between the hidden representation for two different inputs.
For nonlinear activations, the kernel sequence converges globally to a unique fixed point, which can correspond to similar representations depending on the activation and network architecture.
This work provides new insights into the implicit biases of deep neural networks and how architectural choices influence the evolution of representations across layers.
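As a hedged numerical illustration of such fixed-point behavior (not code from the cited paper), one can iterate the layer-wise kernel map of a normalized ReLU activation, whose closed form is the degree-1 arc-cosine kernel, and watch the correlation between two inputs converge:

    import numpy as np

    def relu_kernel_map(rho):
        # Normalized dual of the ReLU activation (degree-1 arc-cosine kernel):
        # rho_{l+1} = (sqrt(1 - rho^2) + rho * (pi - arccos(rho))) / pi
        return (np.sqrt(1.0 - rho**2) + rho * (np.pi - np.arccos(rho))) / np.pi

    rho = 0.2  # illustrative input correlation at layer 0
    for layer in range(1, 31):
        rho = relu_kernel_map(rho)
        if layer % 5 == 0:
            print(f"layer {layer:2d}: rho = {rho:.6f}")

The iterates increase monotonically toward the fixed point rho = 1, i.e. deep layers map distinct inputs to increasingly similar representations, consistent with the convergence described above.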
arXiv Detail & Related papers (2024-10-26T07:10:47Z) - Spectral complexity of deep neural networks [2.099922236065961]
We use the angular power spectrum of the limiting field to characterize the complexity of the network architecture.
On this basis, we classify neural networks as low-disorder, sparse, or high-disorder.
We show how this classification highlights a number of distinct features for standard activation functions, and in particular, sparsity properties of ReLU networks.
arXiv Detail & Related papers (2024-05-15T17:55:05Z) - Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
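For intuition about the symmetry referred to above, here is a minimal sketch (toy sizes chosen for illustration, not taken from the cited paper) showing that relabeling the hidden neurons of a two-layer ReLU network leaves its output unchanged:

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hidden, d_out = 5, 8, 3
    W1 = rng.normal(size=(d_hidden, d_in))
    W2 = rng.normal(size=(d_out, d_hidden))
    x = rng.normal(size=d_in)
    relu = lambda z: np.maximum(z, 0.0)

    # Apply the same permutation to the hidden units: rows of W1, columns of W2.
    perm = rng.permutation(d_hidden)
    y_original = W2 @ relu(W1 @ x)
    y_permuted = W2[:, perm] @ relu(W1[perm, :] @ x)
    print(np.allclose(y_original, y_permuted))  # True: the network function is invariant

A neural functional that processes such weights should therefore respect these permutations, which is the design constraint the title refers to.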
arXiv Detail & Related papers (2023-02-27T18:52:38Z) - Structure Embedded Nucleus Classification for Histopathology Images [51.02953253067348]
Most neural-network-based methods are constrained by the local receptive field of convolutions.
We propose a novel polygon-structure feature learning mechanism that transforms a nucleus contour into a sequence of points sampled in order.
Next, we convert a histopathology image into a graph structure with nuclei as nodes, and build a graph neural network to embed the spatial distribution of nuclei into their representations.
arXiv Detail & Related papers (2023-02-22T14:52:06Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
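As a hedged illustration of what a power-series representation around the initial weights looks like in general (the exact construction in the cited paper may differ), a network $f(x; w)$ with analytic activations admits, for $w$ in a neighborhood of the initialization $w_0$,

  $f(x; w) = \sum_{k \ge 0} \frac{1}{k!} \, D_w^k f(x; w_0)[w - w_0, \ldots, w - w_0]$,

whose first-order truncation is the familiar linearized (neural-tangent) model $f(x; w_0) + \nabla_w f(x; w_0)^\top (w - w_0)$.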
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Credit Assignment for Trained Neural Networks Based on Koopman Operator
Theory [3.130109807128472]
The credit assignment problem for neural networks refers to evaluating the contribution of each network component to the final outputs.
This paper presents an alternative perspective of linear dynamics on dealing with the credit assignment problem for trained neural networks.
Experiments conducted on typical neural networks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-02T06:34:27Z) - Graph Convolutional Networks from the Perspective of Sheaves and the
Neural Tangent Kernel [0.0]
Graph convolutional networks are a popular class of deep neural network algorithms.
Despite their success, graph convolutional networks exhibit a number of peculiar features, including a bias towards learning oversmoothed and homophilic functions.
We propose to bridge this gap by studying the neural tangent kernel of sheaf convolutional networks.
arXiv Detail & Related papers (2022-08-19T12:46:49Z) - On the Spectral Bias of Convolutional Neural Tangent and Gaussian
Process Kernels [24.99551134153653]
We study the properties of various over-parametrized convolutional neural architectures through their respective Gaussian process and tangent neural kernels.
We show that the eigenvalues decay hierarchically, quantify the rate of decay, and derive measures that reflect the composition of hierarchical features in these networks.
arXiv Detail & Related papers (2022-03-17T11:23:18Z) - The Spectral Bias of Polynomial Neural Networks [63.27903166253743]
Polynomial neural networks (PNNs) have been shown to be particularly effective at image generation and face recognition, where high-frequency information is critical.
Previous studies have revealed that neural networks demonstrate a spectral bias towards low-frequency functions, which yields faster learning of low-frequency components during training.
Inspired by such studies, we conduct a spectral analysis of the Neural Tangent Kernel (NTK) of PNNs.
We find that the $\Pi$-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the learning of the higher frequencies.
arXiv Detail & Related papers (2022-02-27T23:12:43Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions, while achieving comparable error bounds both in theory and in practice.
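The general random-features idea behind such constructions can be sketched as follows; note this toy example approximates the conjugate (degree-1 arc-cosine) kernel of a one-hidden-layer ReLU network rather than the NTK feature map proposed in the cited paper, and all sizes are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    d, m = 10, 200_000  # input dimension, number of random features (illustrative)

    def arccos_kernel(x, y):
        # Closed form of E_w[relu(w.x) relu(w.y)] for w ~ N(0, I)
        # (degree-1 arc-cosine kernel, Cho & Saul 2009).
        nx, ny = np.linalg.norm(x), np.linalg.norm(y)
        rho = np.clip(x @ y / (nx * ny), -1.0, 1.0)
        return nx * ny * (np.sqrt(1 - rho**2) + rho * (np.pi - np.arccos(rho))) / (2 * np.pi)

    W = rng.normal(size=(m, d))
    x, y = rng.normal(size=d), rng.normal(size=d)
    phi_x = np.maximum(W @ x, 0.0) / np.sqrt(m)  # random ReLU features
    phi_y = np.maximum(W @ y, 0.0) / np.sqrt(m)
    print(phi_x @ phi_y, arccos_kernel(x, y))  # Monte Carlo estimate vs. closed form

The inner product of the random feature maps concentrates around the exact kernel value at rate roughly $1/\sqrt{m}$, which is why the feature dimension needed for a given accuracy is the quantity of interest.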
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Neural Splines: Fitting 3D Surfaces with Infinitely-Wide Neural Networks [61.07202852469595]
We present Neural Splines, a technique for 3D surface reconstruction that is based on random feature kernels arising from infinitely-wide shallow ReLU networks.
Our method achieves state-of-the-art results, outperforming recent neural network-based techniques and widely used Poisson Surface Reconstruction.
arXiv Detail & Related papers (2020-06-24T14:54:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of the information presented and is not responsible for any consequences arising from its use.