Related papers: Towards a Statistical Understanding of Neural Networks: Beyond the Neural Tangent Kernel Theories

URL: http://arxiv.org/abs/2412.18756v1
Date: Wed, 25 Dec 2024 03:03:58 GMT
Title: Towards a Statistical Understanding of Neural Networks: Beyond the Neural Tangent Kernel Theories
Authors: Haobo Zhang, Jianfa Lai, Yicheng Li, Qian Lin, Jun S. Liu
Abstract summary: A primary advantage of neural networks lies in their feature learning characteristics. We propose a new paradigm for studying feature learning and the resulting benefits in generalizability.
Score: 13.949362600389088
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A primary advantage of neural networks lies in their feature learning characteristics, which are challenging to analyze theoretically due to the complexity of their training dynamics. We propose a new paradigm for studying feature learning and the resulting benefits in generalizability. After reviewing the neural tangent kernel (NTK) theory and recent results in kernel regression, which address the generalization issue of sufficiently wide neural networks, we examine limitations and implications of the fixed kernel theory (such as the NTK theory) and review recent theoretical advancements in feature learning. Moving beyond the fixed kernel/feature theory, we consider neural networks as adaptive feature models. Finally, we propose an over-parameterized Gaussian sequence model as a prototype model to study the feature learning characteristics of neural networks.

Related papers

Neural Tangent Kernel of Neural Networks with Loss Informed by Differential Operators [13.803850290216257]
We develop the NTK theory for deep neural networks with physics-informed loss. We find that, in most cases, the differential operators in the loss function do not induce a faster eigenvalue decay rate and stronger spectral bias.
arXiv Detail & Related papers (2025-03-14T02:55:13Z)
Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural networks and their training, applicable to neural networks of arbitrary width, depth and topology. We also present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK). This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z)
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models [13.283281356356161]
We review the literature on statistical theories of neural networks from three perspectives. Results on excess risks for neural networks are reviewed. Papers that attempt to answer "how the neural network finds the solution that can generalize well on unseen data" are reviewed.
arXiv Detail & Related papers (2024-01-14T02:30:19Z)
How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series. We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights. We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness? [0.0]
We study adversarial examples of trained neural networks through analytical tools afforded by recent theory advances connecting neural networks and kernel methods. We show how NTKs allow one to generate adversarial examples in a "training-free" fashion, and demonstrate that they transfer to fool their finite-width neural net counterparts in the "lazy" regime.
arXiv Detail & Related papers (2022-10-11T16:11:48Z)
Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study [55.12108376616355]
The study on NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp). In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks. We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z)
Gaussian Process Surrogate Models for Neural Networks [6.8304779077042515]
In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque. We construct a class of surrogate models for neural networks using Gaussian processes. We demonstrate our approach captures existing phenomena related to the spectral bias of neural networks, and then show that our surrogate models can be used to solve practical problems.
arXiv Detail & Related papers (2022-08-11T20:17:02Z)
The Spectral Bias of Polynomial Neural Networks [63.27903166253743]
Polynomial neural networks (PNNs) have been shown to be particularly effective at image generation and face recognition, where high-frequency information is critical. Previous studies have revealed that neural networks demonstrate a spectral bias towards low-frequency functions, which yields faster learning of low-frequency components during training. Inspired by such studies, we conduct a spectral analysis of the Neural Tangent Kernel (NTK) of PNNs. We find that the $\Pi$-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the
arXiv Detail & Related papers (2022-02-27T23:12:43Z)
The Eigenlearning Framework: A Conservation Law Perspective on Kernel Regression and Wide Neural Networks [1.6519302768772166]
We derive simple closed-form estimates for the test risk and other generalization metrics of kernel ridge regression. We identify a sharp conservation law which limits the ability of KRR to learn any orthonormal basis of functions.
arXiv Detail & Related papers (2021-10-08T06:32:07Z)
Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks. Centered and ensembled finite networks have reduced posterior variance. Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.