Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime
- URL: http://arxiv.org/abs/2405.15254v1
- Date: Fri, 24 May 2024 06:30:36 GMT
- Title: Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime
- Authors: Alistair Shilton, Sunil Gupta, Santu Rana, Svetha Venkatesh,
- Abstract summary: This paper presents two models of neural-networks and their training applicable to neural networks of arbitrary width, depth and topology.
We also present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK)
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
- Score: 52.00917519626559
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents two models of neural-networks and their training applicable to neural networks of arbitrary width, depth and topology, assuming only finite-energy neural activations; and a novel representor theory for neural networks in terms of a matrix-valued kernel. The first model is exact (un-approximated) and global, casting the neural network as an elements in a reproducing kernel Banach space (RKBS); we use this model to provide tight bounds on Rademacher complexity. The second model is exact and local, casting the change in neural network function resulting from a bounded change in weights and biases (ie. a training step) in reproducing kernel Hilbert space (RKHS) in terms of a local-intrinsic neural kernel (LiNK). This local model provides insight into model adaptation through tight bounds on Rademacher complexity of network adaptation. We also prove that the neural tangent kernel (NTK) is a first-order approximation of the LiNK kernel. Finally, and noting that the LiNK does not provide a representor theory for technical reasons, we present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK). This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models. Throughout the paper (a) feedforward ReLU networks and (b) residual networks (ResNet) are used as illustrative examples.
Related papers
- A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models [13.283281356356161]
We review the literature on statistical theories of neural networks from three perspectives.
Results on excess risks for neural networks are reviewed.
Papers that attempt to answer how the neural network finds the solution that can generalize well on unseen data'' are reviewed.
arXiv Detail & Related papers (2024-01-14T02:30:19Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study on NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp)
In this work, we derive the finite-width-K formulation for a special class of NNs-Hp, i.e., neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Why Quantization Improves Generalization: NTK of Binary Weight Neural
Networks [33.08636537654596]
We take the binary weights in a neural network as random variables under rounding, and study the distribution propagation over different layers in the neural network.
We propose a quasi neural network to approximate the distribution propagation, which is a neural network with continuous parameters and smooth activation function.
arXiv Detail & Related papers (2022-06-13T06:11:21Z) - A Kernel-Expanded Stochastic Neural Network [10.837308632004644]
Deep neural network often gets trapped into a local minimum in training.
New kernel-expanded neural network (K-StoNet) model reformulates the network as a latent variable model.
Model can be easily trained using the imputationregularized optimization (IRO) algorithm.
arXiv Detail & Related papers (2022-01-14T06:42:42Z) - Deep Kronecker neural networks: A general framework for neural networks
with adaptive activation functions [4.932130498861987]
We propose a new type of neural networks, Kronecker neural networks (KNNs), that form a general framework for neural networks with adaptive activation functions.
Under suitable conditions, KNNs induce a faster decay of the loss than that by the feed-forward networks.
arXiv Detail & Related papers (2021-05-20T04:54:57Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over- parameterized deep neural networks (DNNs)
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - A Generalized Neural Tangent Kernel Analysis for Two-layer Neural
Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a " Kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.