Mathematical Foundations of Neural Tangents and Infinite-Width Networks
- URL: http://arxiv.org/abs/2512.08264v1
- Date: Tue, 09 Dec 2025 05:41:40 GMT
- Title: Mathematical Foundations of Neural Tangents and Infinite-Width Networks
- Authors: Rachana Mysore, Preksha Girish, Kavitha Jayaram, Shrey Kumar, Shravan Sanjeev Bagal, Shreya Aravind Shastry
- Abstract summary: We investigate the mathematical foundations of neural networks in the infinite-width regime through the Neural Tangent Kernel (NTK). We propose the NTK-Eigenvalue-Controlled Residual Network (NTK-ECRN) to enable rigorous analysis of kernel evolution during training. Empirical results on synthetic and benchmark datasets validate the predicted kernel behavior and demonstrate improved training stability and generalization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the mathematical foundations of neural networks in the infinite-width regime through the Neural Tangent Kernel (NTK). We propose the NTK-Eigenvalue-Controlled Residual Network (NTK-ECRN), an architecture integrating Fourier feature embeddings, residual connections with layerwise scaling, and stochastic depth to enable rigorous analysis of kernel evolution during training. Our theoretical contributions include deriving bounds on NTK dynamics, characterizing eigenvalue evolution, and linking spectral properties to generalization and optimization stability. Empirical results on synthetic and benchmark datasets validate the predicted kernel behavior and demonstrate improved training stability and generalization. This work provides a comprehensive framework bridging infinite-width theory and practical deep-learning architectures.
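The abstract does not include code, but a minimal JAX sketch helps make the described analysis concrete: a small network combining a Fourier feature embedding, residual blocks with layerwise scaling, and stochastic depth, together with the eigenspectrum of its empirical NTK. The function names, hyperparameters, and the specific 1/sqrt(L) scaling rule below are illustrative assumptions, not the authors' NTK-ECRN implementation.

```python
# Minimal sketch (assumed architecture, not the paper's code): Fourier features,
# residual blocks with layerwise 1/sqrt(L) scaling, optional stochastic depth,
# and the eigenvalues of the empirical NTK on a small batch.
import jax
import jax.numpy as jnp

def init_params(key, in_dim, width, depth, n_fourier=32):
    keys = jax.random.split(key, depth + 2)
    return {
        # Random Fourier feature projection (kept fixed at init in this sketch).
        "B": jax.random.normal(keys[0], (in_dim, n_fourier)),
        "W_in": jax.random.normal(keys[1], (2 * n_fourier, width)) / jnp.sqrt(2 * n_fourier),
        "blocks": [jax.random.normal(k, (width, width)) / jnp.sqrt(width) for k in keys[2:]],
    }

def forward(params, x, depth_keep_prob=1.0, key=None):
    # Fourier feature embedding of the input.
    proj = x @ params["B"]
    h = jnp.concatenate([jnp.sin(proj), jnp.cos(proj)], axis=-1) @ params["W_in"]
    L = len(params["blocks"])
    for W in params["blocks"]:
        # Residual branch with layerwise 1/sqrt(L) scaling; optional stochastic depth.
        branch = jax.nn.relu(h @ W) / jnp.sqrt(L)
        if key is not None:
            key, sub = jax.random.split(key)
            branch = jnp.where(jax.random.bernoulli(sub, depth_keep_prob), branch, 0.0)
        h = h + branch
    return h.sum(axis=-1)  # scalar output per example

def empirical_ntk_eigvals(params, X):
    # Empirical NTK: Theta = J J^T, with J the Jacobian of outputs w.r.t. parameters.
    def f(flat_params):
        return forward(jax.tree_util.tree_unflatten(treedef, flat_params), X)
    leaves, treedef = jax.tree_util.tree_flatten(params)
    J = jax.jacobian(f)(leaves)
    J = jnp.concatenate([j.reshape(X.shape[0], -1) for j in J], axis=1)
    return jnp.linalg.eigvalsh(J @ J.T)

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (16, 4))
params = init_params(key, in_dim=4, width=64, depth=4)
print(empirical_ntk_eigvals(params, X)[-5:])  # largest NTK eigenvalues
```

With the eigenvalues in hand, one could recompute the spectrum at successive training steps and track how it drifts, which is the kind of kernel and eigenvalue evolution the paper's bounds are stated to control.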
Related papers
- Depth-induced NTK: Bridging Over-parameterized Neural Networks and Deep Neural Kernels [13.302913618949468]
We provide a principled framework to interpret over-parameterized neural networks by mapping hierarchical feature transformations into kernel spaces. We propose a depth-induced NTK based on a shortcut-related architecture, which converges to a Gaussian process as the network depth approaches infinity. Our findings significantly extend the existing landscape of neural kernel theory and provide an in-depth understanding of deep learning and the scaling law.
arXiv Detail & Related papers (2025-11-05T10:00:03Z) - Mathematical Modeling and Convergence Analysis of Deep Neural Networks with Dense Layer Connectivities in Deep Learning [1.5516092077598485]
In deep learning, dense layer connectivity has become a key design principle in deep neural networks (DNNs). In this work, we model densely connected DNNs mathematically and analyze their learning problems in the deep-layer limit.
arXiv Detail & Related papers (2025-10-02T14:22:51Z) - Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel [55.82768375605861]
We establish a generalization bound for gradient flow that aligns with the classical Rademacher complexity for kernel methods. Unlike static kernels such as the NTK, the LPK captures the entire training trajectory, adapting to both data and optimization dynamics.
arXiv Detail & Related papers (2025-06-12T23:17:09Z) - Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs [0.0]
We aim to advance the theoretical understanding of cPIKANs by analyzing them using Neural Tangent Kernel (NTK) theory. We first derive the NTK of standard cKANs in a supervised setting, and then extend the analysis to the physics-informed context. Results indicate tractable behavior of the NTK in the context of cPIKANs, which exposes learning dynamics that standard physics-informed neural networks (PINNs) cannot capture.
arXiv Detail & Related papers (2025-06-09T17:30:13Z) - Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor program framework. We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z) - Towards a Statistical Understanding of Neural Networks: Beyond the Neural Tangent Kernel Theories [13.949362600389088]
A primary advantage of neural networks lies in their feature learning characteristics. We propose a new paradigm for studying feature learning and the resulting benefits in generalizability.
arXiv Detail & Related papers (2024-12-25T03:03:58Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study [55.12108376616355]
The study of the NTK has been devoted to typical neural network architectures, but remains incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of the NTK (a minimal numerical sketch of this predictor appears after this list).
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Analysis of Structured Deep Kernel Networks [0.0]
We show that the use of special types of kernels yields models reminiscent of neural networks, founded in the same theoretical framework as classical kernel methods. In particular, the introduced Structured Deep Kernel Networks (SDKNs) can be viewed as unbounded neural networks (NNs) with optimizable activation functions obeying a representer theorem.
arXiv Detail & Related papers (2021-05-15T14:10:35Z) - Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with a connection to its generalization error.
Existing frameworks for neural network optimization analysis, such as mean field theory and neural tangent kernel theory, typically require taking the infinite-width limit of the network to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
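The Hadamard-product entry above states an equivalence to the kernel regression predictor with the associated NTK. As a reference point, the sketch below evaluates the standard kernel regression form f(x) = Theta(x, X) Theta(X, X)^{-1} y; the radial toy kernel is a stand-in assumption, not the NTK of an actual NNs-Hp architecture.

```python
# Kernel regression predictor in the NTK form, with a toy stand-in kernel.
import jax.numpy as jnp

def toy_kernel(A, B):
    # Placeholder positive-definite kernel standing in for an NTK Gram matrix.
    sq = jnp.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * sq)

def ntk_regression_predict(X_train, y_train, X_test, kernel, ridge=1e-6):
    K_tt = kernel(X_train, X_train)   # Theta(X, X)
    K_st = kernel(X_test, X_train)    # Theta(x, X)
    alpha = jnp.linalg.solve(K_tt + ridge * jnp.eye(len(X_train)), y_train)
    return K_st @ alpha

X = jnp.linspace(-1.0, 1.0, 20)[:, None]
y = jnp.sin(3.0 * X[:, 0])
X_new = jnp.array([[0.1], [0.5]])
print(ntk_regression_predict(X, y, X_new, toy_kernel))
```

The small ridge term only stabilizes the solve; in the infinite-width analyses cited above the kernel itself stays fixed during training, which is exactly what the main paper's finite-width bounds relax.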
This list is automatically generated from the titles and abstracts of the papers on this site.