Related papers: Nonlocal Neural Tangent Kernels via Parameter-Space Interactions

Nonlocal Neural Tangent Kernels via Parameter-Space Interactions

URL: http://arxiv.org/abs/2509.12467v1
Date: Mon, 15 Sep 2025 21:23:47 GMT
Title: Nonlocal Neural Tangent Kernels via Parameter-Space Interactions
Authors: Sriram Nagaraj, Vishakh Hari,
Abstract summary: The Neural Tangent Kernel (NTK) has provided insights into the training dynamics of neural networks under gradient flow.<n>We propose a Nonlocal Neural Tangent Kernel (NNTK) that replaces the local gradient with a nonlocal interaction-based approximation in parameter space.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Neural Tangent Kernel (NTK) framework has provided deep insights into the training dynamics of neural networks under gradient flow. However, it relies on the assumption that the network is differentiable with respect to its parameters, an assumption that breaks down when considering non-smooth target functions or parameterized models exhibiting non-differentiable behavior. In this work, we propose a Nonlocal Neural Tangent Kernel (NNTK) that replaces the local gradient with a nonlocal interaction-based approximation in parameter space. Nonlocal gradients are known to exist for a wider class of functions than the standard gradient. This allows NTK theory to be extended to nonsmooth functions, stochastic estimators, and broader families of models. We explore both fixed-kernel and attention-based formulations of this nonlocal operator. We illustrate the new formulation with numerical studies.

Related papers

Convergence analysis of wide shallow neural operators within the framework of Neural Tangent Kernel [4.313136216120379]
We conduct the convergence analysis of gradient descent for the wide shallow neural operators and physics-informed shallow neural operators within the framework of Neural Tangent Kernel (NTK)<n>Under the setting of over-parametrization, gradient descent can find the global minimum regardless of whether it is in continuous time or discrete time.
arXiv Detail & Related papers (2024-12-07T05:47:28Z)
BrowNNe: Brownian Nonlocal Neurons & Activation Functions [0.0]
We show that Brownian neural activation functions in low-training data beats the ReLU counterpart. Our experiments indicate the superior capabilities of Brownian neural activation functions in low-training data.
arXiv Detail & Related papers (2024-06-21T19:40:30Z)
Neural Operators with Localized Integral and Differential Kernels [77.76991758980003]
We present a principled approach to operator learning that can capture local features under two frameworks. We prove that we obtain differential operators under an appropriate scaling of the kernel values of CNNs. To obtain local integral operators, we utilize suitable basis representations for the kernels based on discrete-continuous convolutions.
arXiv Detail & Related papers (2024-02-26T18:59:31Z)
Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum [18.10812063219831]
We introduce Modified Spectrum Kernels (MSKs) to approximate kernels with desired eigenvalues. We propose a preconditioned gradient descent method, which alters the trajectory of gradient descent. Our method is both computationally efficient and simple to implement.
arXiv Detail & Related papers (2023-07-26T22:39:47Z)
Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood. These bounds are amenable togradient-based optimization and allow to trade off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z)
Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification. Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
Learning Discretized Neural Networks under Ricci Flow [48.47315844022283]
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations.<n>DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
arXiv Detail & Related papers (2023-02-07T10:51:53Z)
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning [18.445445525911847]
We consider optimisation of wide, shallow neural networks, where the output of each hidden node is scaled by a positive parameter.<n>We prove that for large such neural networks, with high probability, gradient flow and gradient descent converge to a global minimum and can learn features in some sense, unlike in the NTK parameterisation.
arXiv Detail & Related papers (2023-02-02T10:40:06Z)
Nonlocal Kernel Network (NKN): a Stable and Resolution-Independent Deep Neural Network [23.465930256410722]
Nonlocal kernel network (NKN) is resolution independent, characterized by deep neural networks. NKN is capable of handling a variety of tasks such as learning governing equations and classifying images.
arXiv Detail & Related papers (2022-01-06T19:19:35Z)
Neural Operator: Learning Maps Between Function Spaces [75.93843876663128]
We propose a generalization of neural networks to learn operators, termed neural operators, that map between infinite dimensional function spaces. We prove a universal approximation theorem for our proposed neural operator, showing that it can approximate any given nonlinear continuous operator. An important application for neural operators is learning surrogate maps for the solution operators of partial differential equations.
arXiv Detail & Related papers (2021-08-19T03:56:49Z)
Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
We show that the averaged gradient descent can achieve the minimax optimal convergence rate. We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a " Kernel-like" behavior. This implies that the training loss converges linearly up to a certain accuracy. We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.