When and why PINNs fail to train: A neural tangent kernel perspective
- URL: http://arxiv.org/abs/2007.14527v1
- Date: Tue, 28 Jul 2020 23:44:56 GMT
- Title: When and why PINNs fail to train: A neural tangent kernel perspective
- Authors: Sifan Wang, Xinling Yu, Paris Perdikaris
- Abstract summary: We derive the Neural Tangent Kernel (NTK) of PINNs and prove that, under appropriate conditions, it converges to a deterministic kernel that stays constant during training in the infinite-width limit.
We find a remarkable discrepancy in the convergence rate of the different loss components contributing to the total training error.
We propose a novel gradient descent algorithm that utilizes the eigenvalues of the NTK to adaptively calibrate the convergence rate of the total training error.
- Score: 2.1485350418225244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Physics-informed neural networks (PINNs) have lately received great attention
thanks to their flexibility in tackling a wide range of forward and inverse
problems involving partial differential equations. However, despite their
noticeable empirical success, little is known about how such constrained neural
networks behave during their training via gradient descent. More importantly,
even less is known about why such models sometimes fail to train at all. In
this work, we aim to investigate these questions through the lens of the Neural
Tangent Kernel (NTK), a kernel that captures the behavior of fully-connected
neural networks in the infinite width limit during training via gradient
descent. Specifically, we derive the NTK of PINNs and prove that, under
appropriate conditions, it converges to a deterministic kernel that stays
constant during training in the infinite-width limit. This allows us to analyze
the training dynamics of PINNs through the lens of their limiting NTK and find
a remarkable discrepancy in the convergence rate of the different loss
components contributing to the total training error. To address this
fundamental pathology, we propose a novel gradient descent algorithm that
utilizes the eigenvalues of the NTK to adaptively calibrate the convergence
rate of the total training error. Finally, we perform a series of numerical
experiments to verify the correctness of our theory and the practical
effectiveness of the proposed algorithms. The data and code accompanying this
manuscript are publicly available at
\url{https://github.com/PredictiveIntelligenceLab/PINNsNTK}.
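A minimal sketch of the kind of NTK-based loss balancing the abstract describes is given below, written in JAX for a toy 1D Poisson problem (u'' = f on [0, 1] with zero Dirichlet boundary values). Since the sum of the eigenvalues of an NTK block equals its trace, the weights can be formed from block traces without an eigendecomposition; the trace-based rule lam_i = tr(K) / tr(K_ii), the helper names (init_mlp, residual, ntk_trace), and all hyperparameters are illustrative assumptions in the spirit of the abstract, not a reproduction of the authors' released implementation.
```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes=(1, 50, 50, 1)):
    # Small fully-connected tanh network; sizes and initialization are illustrative.
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params

def mlp(params, x):
    h = jnp.atleast_1d(x)
    for w, b in params[:-1]:
        h = jnp.tanh(h @ w + b)
    w, b = params[-1]
    return (h @ w + b)[0]                      # scalar prediction u_theta(x)

def residual(params, x):
    # PDE residual r(x) = u''(x) - f(x), with f = -pi^2 sin(pi x) so that the
    # exact solution is u*(x) = sin(pi x) with u*(0) = u*(1) = 0.
    u_xx = jax.grad(jax.grad(mlp, argnums=1), argnums=1)(params, x)
    return u_xx + (jnp.pi ** 2) * jnp.sin(jnp.pi * x)

def ntk_trace(fn, params, xs):
    # tr(J J^T) = sum over points of ||d fn(x) / d theta||^2, i.e. the sum of
    # the eigenvalues of the corresponding NTK block.
    def sq_grad_norm(x):
        g = jax.grad(fn)(params, x)            # gradient w.r.t. the parameters
        return sum(jnp.sum(leaf ** 2) for leaf in jax.tree_util.tree_leaves(g))
    return jnp.sum(jax.vmap(sq_grad_norm)(xs))

def loss(params, x_b, u_b, x_r, lam_u, lam_r):
    pred_b = jax.vmap(lambda x: mlp(params, x))(x_b)
    res = jax.vmap(lambda x: residual(params, x))(x_r)
    return lam_u * jnp.mean((pred_b - u_b) ** 2) + lam_r * jnp.mean(res ** 2)

params = init_mlp(jax.random.PRNGKey(0))
x_b, u_b = jnp.array([0.0, 1.0]), jnp.array([0.0, 0.0])   # boundary data
x_r = jnp.linspace(0.0, 1.0, 64)                          # collocation points
lam_u = lam_r = 1.0
lr = 1e-3

for step in range(1000):
    if step % 100 == 0:
        # Rebalance the loss using the NTK block traces (eigenvalue sums).
        tr_u = ntk_trace(mlp, params, x_b)
        tr_r = ntk_trace(residual, params, x_r)
        lam_u = (tr_u + tr_r) / tr_u
        lam_r = (tr_u + tr_r) / tr_r
    grads = jax.grad(loss)(params, x_b, u_b, x_r, lam_u, lam_r)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```
Recomputing the block traces only every 100 steps, as in the loop above, keeps the cost of the kernel computation small relative to the plain gradient updates.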
Related papers
- Improving PINNs By Algebraic Inclusion of Boundary and Initial Conditions [0.1874930567916036]
"AI for Science" aims to solve fundamental scientific problems using AI techniques.
In this work we explore changing the model being trained from a plain neural network to a non-linear transformation of it that encodes the boundary and initial conditions algebraically.
This reduces the number of terms in the loss function compared to the standard PINN loss (a hedged illustration of this idea is sketched after the related-papers list below).
arXiv Detail & Related papers (2024-07-30T11:19:48Z)
- Efficient kernel surrogates for neural network-based regression [0.8030359871216615]
We study the performance of the Conjugate Kernel (CK), an efficient approximation to the Neural Tangent Kernel (NTK)
We show that the CK performance is only marginally worse than that of the NTK and, in certain cases, is superior.
In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework suggests a recipe for improving DNN accuracy inexpensively.
arXiv Detail & Related papers (2023-10-28T06:41:47Z)
- Speed Limits for Deep Learning [67.69149326107103]
Recent advancement in thermodynamics allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Understanding Sparse Feature Updates in Deep Networks using Iterative Linearisation [2.33877878310217]
We derive an iterative linearised training method as a novel empirical tool to investigate why larger and deeper networks generalise well.
We show that in a variety of cases, iterative linearised training surprisingly performs on par with standard training.
We also show that feature learning is essential for good performance.
arXiv Detail & Related papers (2022-11-22T15:34:59Z)
- Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases [3.198144010381572]
In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits.
However, it is still unclear why randomly initialized gradient descent algorithms succeed at training such overparameterized networks.
arXiv Detail & Related papers (2021-02-23T18:17:47Z)
- A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
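As a companion to the sketch above, the algebraic boundary-condition idea from the "Improving PINNs By Algebraic Inclusion of Boundary and Initial Conditions" entry can be illustrated on the same toy Poisson problem. The transform x * (1 - x) * NN(x), which satisfies u(0) = u(1) = 0 by construction so that the boundary loss term disappears, is a standard illustrative choice and an assumption here; it is not claimed to be the transformation proposed in that paper.
```python
import jax
import jax.numpy as jnp

def mlp(params, x):
    # Same tanh MLP as in the earlier sketch (params produced by init_mlp there).
    h = jnp.atleast_1d(x)
    for w, b in params[:-1]:
        h = jnp.tanh(h @ w + b)
    w, b = params[-1]
    return (h @ w + b)[0]

def constrained_u(params, x):
    # Algebraic transform: the factor x * (1 - x) enforces u(0) = u(1) = 0
    # exactly, so no boundary loss term is needed.
    return x * (1.0 - x) * mlp(params, x)

def constrained_residual(params, x):
    # PDE residual of the transformed model for u'' = f with f = -pi^2 sin(pi x).
    u_xx = jax.grad(jax.grad(constrained_u, argnums=1), argnums=1)(params, x)
    return u_xx + (jnp.pi ** 2) * jnp.sin(jnp.pi * x)

def residual_only_loss(params, x_r):
    # The PDE residual is now the only term in the training loss.
    res = jax.vmap(lambda x: constrained_residual(params, x))(x_r)
    return jnp.mean(res ** 2)
```
With the boundary term gone, the residual loss is the only term left to balance, which is the reduction in loss terms that the entry above refers to.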