Astral: training physics-informed neural networks with error majorants
- URL: http://arxiv.org/abs/2406.02645v1
- Date: Tue, 4 Jun 2024 13:11:49 GMT
- Title: Astral: training physics-informed neural networks with error majorants
- Authors: Vladimir Fanaskov, Tianchi Yu, Alexander Rudikov, Ivan Oseledets
- Abstract summary: We argue that the residual is, at best, an indirect measure of the error of the approximate solution.
Since the error majorant provides a direct upper bound on the error, one can reliably estimate how close the PiNN is to the exact solution.
- Score: 45.24347017854392
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The primal approach to physics-informed learning is residual minimization. We argue that the residual is, at best, an indirect measure of the error of the approximate solution and propose to train with an error majorant instead. Since the error majorant provides a direct upper bound on the error, one can reliably estimate how close the PiNN is to the exact solution and stop the optimization process when the desired accuracy is reached. We call the loss function associated with the error majorant $\textbf{Astral}$: neur$\textbf{A}$l a po$\textbf{ST}$erio$\textbf{RI}$ function$\textbf{A}$l Loss. To compare the Astral and residual loss functions, we illustrate how error majorants can be derived for various PDEs and conduct experiments with diffusion equations (including anisotropic and in an L-shaped domain), a convection-diffusion equation, a temporal discretization of Maxwell's equations, and a magnetostatics problem. The results indicate that the Astral loss is competitive with the residual loss, typically leading to faster convergence and lower error (e.g., for Maxwell's equations, we observe an order of magnitude better relative error and training time). We also report that the error estimate obtained with the Astral loss is usually tight enough to be informative, e.g., for a highly anisotropic equation, on average, Astral overestimates the error by a factor of $1.5$, and for convection-diffusion by a factor of $1.7$.
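The core idea above is that, instead of penalizing the PDE residual, one minimizes a functional a posteriori error majorant that provably bounds the solution error from above, so optimization can be stopped as soon as the bound meets the target accuracy. The following is a minimal sketch (not the authors' code) of this kind of majorant-based training for a 1D Poisson problem in PyTorch; the toy problem, the network sizes, the fixed parameter $\beta$, and the use of the classical Repin-type majorant with Friedrichs constant $C_F = 1/\pi$ on the unit interval are illustrative assumptions, and the Monte Carlo means only approximate the integrals in the bound.

```python
import math
import torch

# Toy problem (an assumption for illustration): -u''(x) = f(x) on (0, 1), u(0) = u(1) = 0,
# with f(x) = pi^2 sin(pi x), so the exact solution is u(x) = sin(pi x).
def f(x):
    return math.pi ** 2 * torch.sin(math.pi * x)

def mlp():
    return torch.nn.Sequential(
        torch.nn.Linear(1, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 1),
    )

u_net = mlp()  # approximate solution v(x)
y_net = mlp()  # auxiliary flux field y(x) ~ v'(x), required by the majorant

opt = torch.optim.Adam(list(u_net.parameters()) + list(y_net.parameters()), lr=1e-3)

C_F = 1.0 / math.pi  # Friedrichs constant of the unit interval
beta = 1.0           # free positive parameter of the majorant

for step in range(5000):
    x = torch.rand(256, 1, requires_grad=True)

    # Hard-constrained ansatz: v(0) = v(1) = 0 exactly, so the majorant applies to v - u in H^1_0.
    v = x * (1.0 - x) * u_net(x)
    y = y_net(x)

    dv = torch.autograd.grad(v.sum(), x, create_graph=True)[0]
    dy = torch.autograd.grad(y.sum(), x, create_graph=True)[0]

    # Repin-type functional majorant of ||(u - v)'||^2:
    #   (1 + beta) ||v' - y||^2 + (1 + 1/beta) C_F^2 ||y' + f||^2
    # (the means below are Monte Carlo estimates of the integrals over (0, 1)).
    flux_misfit = ((dv - y) ** 2).mean()
    residual = ((dy + f(x)) ** 2).mean()
    majorant = (1.0 + beta) * flux_misfit + (1.0 + 1.0 / beta) * C_F ** 2 * residual

    opt.zero_grad()
    majorant.backward()
    opt.step()

    # Because the majorant bounds the error from above, training can stop once the target accuracy is met.
    if majorant.item() ** 0.5 < 1e-3:
        break
```

In practice the free parameter $\beta$ can also be optimized (for a fixed flux it has a closed-form optimum), which tightens the bound; the sketch keeps it fixed for simplicity.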
Related papers
- Trotter error time scaling separation via commutant decomposition [6.418044102466421]
Suppressing the Trotter error in dynamical quantum simulation typically requires running deeper circuits.
We introduce a general framework of commutant decomposition that separates disjoint error components that have fundamentally different scaling with time.
We show that this formalism not only straightforwardly reproduces previous results but also provides a better error estimate for higher-order product formulas.
arXiv Detail & Related papers (2024-09-25T05:25:50Z) - Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z) - Loss Spike in Training Neural Networks [9.848777377317901]
We investigate the mechanism underlying loss spikes observed during neural network training.
From a frequency perspective, we explain the rapid descent in loss as being primarily influenced by low-frequency components.
We experimentally observe that loss spikes can facilitate condensation, causing input weights to evolve towards the same direction.
arXiv Detail & Related papers (2023-05-20T07:57:15Z) - Neural Inference of Gaussian Processes for Time Series Data of Quasars [72.79083473275742]
We introduce a new model that enables a complete description of quasar spectra.
We also introduce a new method of inference of Gaussian process parameters, which we call $\textit{Neural Inference}$.
The combination of both the CDRW model and Neural Inference significantly outperforms the baseline DRW and MLE.
arXiv Detail & Related papers (2022-11-17T13:01:26Z) - Physics-Informed Neural Network Method for Parabolic Differential Equations with Sharply Perturbed Initial Conditions [68.8204255655161]
We develop a physics-informed neural network (PINN) model for parabolic problems with a sharply perturbed initial condition.
Localized large gradients in the solution of the advection-dispersion equation (ADE) make Latin hypercube sampling of the equation's residual (common in PINNs) highly inefficient.
We propose criteria for weights in the loss function that produce a more accurate PINN solution than those obtained with the weights selected via other methods.
arXiv Detail & Related papers (2022-08-18T05:00:24Z) - Is $L^2$ Physics-Informed Loss Always Suitable for Training Physics-Informed Neural Network? [28.458641579364457]
The $L^2$ physics-informed loss is the de facto standard for training physics-informed neural networks.
We develop a novel PINN training method that minimizes the $L^\infty$ loss for HJB equations, in a spirit similar to adversarial training (see the sketch after this list).
arXiv Detail & Related papers (2022-06-04T15:48:17Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - $\sigma^2$R Loss: a Weighted Loss by Multiplicative Factors using Sigmoidal Functions [0.9569316316728905]
We introduce a new loss function called the squared reduction loss ($\sigma^2$R loss), which is regulated by a sigmoid function to inflate or deflate the error per instance.
Our loss has a clear intuition and geometric interpretation, and we demonstrate its effectiveness experimentally.
arXiv Detail & Related papers (2020-09-18T12:34:40Z) - Large-time asymptotics in deep learning [0.0]
We consider the impact of the final time $T$ (which may indicate the depth of a corresponding ResNet) in training.
For the classical $L^2$-regularized empirical risk minimization problem, we show that the training error is at most of the order $\mathcal{O}\left(\frac{1}{T}\right)$.
In the setting of $\ell^p$-distance losses, we prove that both the training error and the optimal parameters are at most of the order $\mathcal{O}\left(e^{-\mu T}\right)$ for some $\mu > 0$.
arXiv Detail & Related papers (2020-08-06T07:33:17Z)
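For the $L^\infty$ physics-informed loss entry above, here is a minimal sketch of the general adversarial-collocation idea it alludes to, reusing the toy Poisson problem from the earlier sketch; the inner step size, number of ascent steps, and batch sizes are assumptions for illustration, and this is a generic scheme rather than the authors' specific method for HJB equations.

```python
import math
import torch

# Same toy problem as in the sketch above: -u''(x) = pi^2 sin(pi x) on (0, 1), u(0) = u(1) = 0.
def f(x):
    return math.pi ** 2 * torch.sin(math.pi * x)

u_net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(u_net.parameters(), lr=1e-3)

def residual(x):
    # PDE residual r(x) = -v''(x) - f(x) for the hard-constrained ansatz v(x) = x (1 - x) u_net(x).
    v = x * (1.0 - x) * u_net(x)
    dv = torch.autograd.grad(v.sum(), x, create_graph=True)[0]
    d2v = torch.autograd.grad(dv.sum(), x, create_graph=True)[0]
    return -d2v - f(x)

for step in range(3000):
    # Inner maximization: nudge collocation points toward large |r(x)| by signed gradient ascent.
    x = torch.rand(128, 1)
    for _ in range(5):
        x.requires_grad_(True)
        r = residual(x).abs()
        grad_x = torch.autograd.grad(r.sum(), x)[0]
        x = (x + 0.01 * grad_x.sign()).clamp(0.0, 1.0).detach()

    # Outer minimization: decrease the worst-case residual over the adversarial points,
    # a sampled surrogate for the L^infinity norm of the residual.
    x.requires_grad_(True)
    loss = residual(x).abs().max()
    opt.zero_grad()
    loss.backward()
    opt.step()
```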
This list is automatically generated from the titles and abstracts of the papers on this site.