Neural signature kernels as infinite-width-depth-limits of controlled
ResNets
- URL: http://arxiv.org/abs/2303.17671v2
- Date: Sun, 4 Jun 2023 12:45:08 GMT
- Title: Neural signature kernels as infinite-width-depth-limits of controlled
ResNets
- Authors: Nicola Muca Cirone, Maud Lemercier, Cristopher Salvi
- Abstract summary: We consider randomly initialized controlled ResNets defined as Euler-discretizations of neural controlled differential equations (Neural CDEs).
We show that in the infinite-width-depth limit and under proper scaling, these architectures converge weakly to Gaussian processes indexed on some spaces of continuous paths.
We show that in the infinite-depth regime, finite-width controlled ResNets converge in distribution to Neural CDEs with random vector fields.
- Score: 5.306881553301636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the paradigm of reservoir computing, we consider randomly
initialized controlled ResNets defined as Euler-discretizations of neural
controlled differential equations (Neural CDEs), a unified architecture which
encompasses both RNNs and ResNets. We show that in the infinite-width-depth
limit and under proper scaling, these architectures converge weakly to Gaussian
processes indexed on some spaces of continuous paths and with kernels
satisfying certain partial differential equations (PDEs) varying according to
the choice of activation function, extending the results of Hayou (2022); Hayou
& Yang (2023) to the controlled and homogeneous case. In the special,
homogeneous, case where the activation is the identity, we show that the
equation reduces to a linear PDE and the limiting kernel agrees with the
signature kernel of Salvi et al. (2021a). We name this new family of limiting
kernels neural signature kernels. Finally, we show that in the infinite-depth
regime, finite-width controlled ResNets converge in distribution to Neural CDEs
with random vector fields which, depending on whether the weights are shared
across layers, are either time-independent and Gaussian or behave like a
matrix-valued Brownian motion.
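For reference, the signature kernel of Salvi et al. (2021a), which appears here as the identity-activation limit, is known to solve a Goursat-type PDE: for two differentiable paths $x$ and $y$, $\partial^2 k_{x,y}(s,t) / \partial s \partial t = \langle \dot{x}_s, \dot{y}_t \rangle \, k_{x,y}(s,t)$ with boundary conditions $k_{x,y}(0,\cdot) = k_{x,y}(\cdot,0) = 1$. The sketch below illustrates only the controlled-ResNet construction itself: a randomly initialized Euler discretization of a Neural CDE driven by a path. The $N(0, 1/\mathrm{width})$ weight scale and the choice of sharing weights across layers are assumptions made for the sketch, not necessarily the exact scaling analysed in the paper.
```python
import numpy as np

def controlled_resnet(path, width=128, activation=np.tanh, seed=0):
    """Euler discretization of a Neural CDE with random, untrained vector fields.

    path: array of shape (n_steps + 1, d) holding the discretized driving path x.
    Returns the final hidden state in R^width. The N(0, 1/width) weight scale
    and the shared-weights choice are illustrative assumptions only.
    """
    rng = np.random.default_rng(seed)
    n_steps, d = path.shape[0] - 1, path.shape[1]
    # One random vector field (A_j, b_j) per coordinate of the driving path;
    # sharing them across all layers corresponds to the time-independent case.
    A = rng.normal(0.0, 1.0 / np.sqrt(width), size=(d, width, width))
    b = rng.normal(0.0, 1.0, size=(d, width))

    h = np.zeros(width)
    for k in range(n_steps):
        dx = path[k + 1] - path[k]  # path increment Delta x_k
        # Euler step: h_{k+1} = h_k + sum_j activation(A_j h_k + b_j) * dx_k^j
        h = h + sum(activation(A[j] @ h + b[j]) * dx[j] for j in range(d))
    return h

# Toy usage: drive the network with the 2-dimensional path (sin(2*pi*t), t).
t = np.linspace(0.0, 1.0, 50)
x = np.stack([np.sin(2 * np.pi * t), t], axis=1)
print(controlled_resnet(x).shape)  # -> (128,)
```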
Related papers
- Proportional infinite-width infinite-depth limit for deep linear neural networks [0.16385815610837165]
We study the distributional properties of linear neural networks with random parameters in the context of large networks, where the number of layers diverges in proportion to the number of neurons per layer.
We explore the joint proportional limit in which both depth and width diverge but maintain a constant ratio, yielding a non-Gaussian distribution that retains correlations between outputs.
arXiv Detail & Related papers (2024-11-22T11:25:52Z) - Deep Kernel Posterior Learning under Infinite Variance Prior Weights [1.5960546024967326]
We show that a Bayesian deep neural network converges to a process with $\alpha$-stable marginals in each layer, which admits a conditionally Gaussian representation.
We also provide useful generalizations of the results of Loría & Bhadra (2024) on shallow networks to multi-layer networks.
The computational and statistical benefits over competing approaches stand out in simulations and in demonstrations on benchmark data sets.
arXiv Detail & Related papers (2024-10-02T07:13:17Z) - Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNets in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer
Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z) - Deep neural networks with dependent weights: Gaussian Process mixture
limit, heavy tails, sparsity and compressibility [18.531464406721412]
This article studies the infinite-width limit of deep feedforward neural networks whose weights are dependent.
Each hidden node of the network is assigned a nonnegative random variable that controls the variance of the outgoing weights of that node.
arXiv Detail & Related papers (2022-05-17T09:14:32Z) - Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite
Networks [12.692279981822011]
We derive the covariance functions of multi-layer perceptrons with exponential linear units (ELU) and Gaussian error linear units (GELU).
We analyse the fixed-point dynamics of iterated kernels corresponding to a broad range of activation functions.
We find that, unlike some previously studied neural network kernels, these new kernels exhibit non-trivial fixed-point dynamics (a Monte Carlo sketch of the underlying iterated-kernel recursion follows this list).
arXiv Detail & Related papers (2020-02-20T01:25:39Z) - On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets.
Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity.
In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
arXiv Detail & Related papers (2020-01-28T16:47:53Z)
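As noted in the Avoiding Kernel Fixed Points entry above, the fixed-point behaviour of iterated kernels can be probed directly. The following is a minimal Monte Carlo sketch of the standard infinite-width (NNGP) kernel recursion with an ELU nonlinearity; the closed-form covariance functions derived in that paper are replaced by sampling, and the variance parameters are illustrative choices rather than values from the paper.
```python
import numpy as np

def elu(u, alpha=1.0):
    return np.where(u > 0, u, alpha * np.expm1(u))

def iterate_kernel(k_xx, k_xy, k_yy, depth=20, sigma_w2=1.6, sigma_b2=0.1,
                   n_samples=200_000, seed=0):
    """Monte Carlo iteration of the infinite-width (NNGP) kernel recursion
    K^{l+1} = sigma_w^2 * E[phi(u) phi(v)] + sigma_b^2, where (u, v) is a
    centred Gaussian pair with covariance [[K^l_xx, K^l_xy], [K^l_xy, K^l_yy]].
    sigma_w2 and sigma_b2 are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    correlations = []
    for _ in range(depth):
        cov = np.array([[k_xx, k_xy], [k_xy, k_yy]])
        u, v = rng.multivariate_normal([0.0, 0.0], cov, size=n_samples).T
        k_xx = sigma_w2 * np.mean(elu(u) ** 2) + sigma_b2
        k_yy = sigma_w2 * np.mean(elu(v) ** 2) + sigma_b2
        k_xy = sigma_w2 * np.mean(elu(u) * elu(v)) + sigma_b2
        correlations.append(k_xy / np.sqrt(k_xx * k_yy))
    return correlations

# A correlation that settles strictly below 1 signals a non-degenerate
# fixed point of the iterated kernel map.
print(iterate_kernel(k_xx=1.0, k_xy=0.5, k_yy=1.0)[-5:])
```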