Related papers: Loss Jump During Loss Switch in Solving PDEs with Neural Networks

Loss Jump During Loss Switch in Solving PDEs with Neural Networks

URL: http://arxiv.org/abs/2405.03095v1
Date: Mon, 6 May 2024 01:18:36 GMT
Title: Loss Jump During Loss Switch in Solving PDEs with Neural Networks
Authors: Zhiwei Wang, Lulu Zhang, Zhongwang Zhang, Zhi-Qin John Xu,
Abstract summary: Using neural networks to solve partial differential equations (PDEs) is gaining popularity as an alternative approach in the scientific computing community. This work focuses on investigating how different loss functions impact the training of neural networks for solving PDEs.
Score: 11.123662745891677
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Using neural networks to solve partial differential equations (PDEs) is gaining popularity as an alternative approach in the scientific computing community. Neural networks can integrate different types of information into the loss function. These include observation data, governing equations, and variational forms, etc. These loss functions can be broadly categorized into two types: observation data loss directly constrains and measures the model output, while other loss functions indirectly model the performance of the network, which can be classified as model loss. However, this alternative approach lacks a thorough understanding of its underlying mechanisms, including theoretical foundations and rigorous characterization of various phenomena. This work focuses on investigating how different loss functions impact the training of neural networks for solving PDEs. We discover a stable loss-jump phenomenon: when switching the loss function from the data loss to the model loss, which includes different orders of derivative information, the neural network solution significantly deviates from the exact solution immediately. Further experiments reveal that this phenomenon arises from the different frequency preferences of neural networks under different loss functions. We theoretically analyze the frequency preference of neural networks under model loss. This loss-jump phenomenon provides a valuable perspective for examining the underlying mechanisms of neural networks in solving PDEs.

Related papers

Automatic Differentiation is Essential in Training Neural Networks for Solving Differential Equations [7.890817997914349]
Neural network-based approaches have recently shown significant promise in solving partial differential equations (PDEs) in science and engineering. One advantage of the neural network methods for PDEs lies in its automatic differentiation (AD) In this paper, we quantitatively demonstrate the advantage of AD in training neural networks.
arXiv Detail & Related papers (2024-05-23T02:01:05Z)
Neural Loss Function Evolution for Large-Scale Image Classifier Convolutional Neural Networks [0.0]
For classification, neural networks learn by minimizing cross-entropy, but are evaluated and compared using accuracy. This disparity suggests neural loss function search (NLFS), the search for a drop-in replacement loss function of cross-entropy for neural networks. We propose a new search space for NLFS that encourages more diverse loss functions to be explored.
arXiv Detail & Related papers (2024-01-30T17:21:28Z)
Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs. We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD. We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
Graph Neural Networks with Trainable Adjacency Matrices for Fault Diagnosis on Multivariate Sensor Data [69.25738064847175]
It is necessary to consider the behavior of the signals in each sensor separately, to take into account their correlation and hidden relationships with each other. The graph nodes can be represented as data from the different sensors, and the edges can display the influence of these data on each other. It was proposed to construct a graph during the training of graph neural network. This allows to train models on data where the dependencies between the sensors are not known in advance.
arXiv Detail & Related papers (2022-10-20T11:03:21Z)
Critical Investigation of Failure Modes in Physics-informed Neural Networks [0.9137554315375919]
We show that a physics-informed neural network with a composite formulation produces highly non- learned loss surfaces that are difficult to optimize. We also assess the training both approaches on two elliptic problems with increasingly complex target solutions.
arXiv Detail & Related papers (2022-06-20T18:43:35Z)
Scaling Properties of Deep Residual Networks [2.6763498831034043]
We investigate the properties of weights trained by gradient descent and their scaling with network depth through numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. These findings cast doubts on the validity of the neural ODE model as an adequate description of deep ResNets.
arXiv Detail & Related papers (2021-05-25T22:31:30Z)
Conditional physics informed neural networks [85.48030573849712]
We introduce conditional PINNs (physics informed neural networks) for estimating the solution of classes of eigenvalue problems. We show that a single deep neural network can learn the solution of partial differential equations for an entire class of problems.
arXiv Detail & Related papers (2021-04-06T18:29:14Z)
Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow. We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
Vulnerability Under Adversarial Machine Learning: Bias or Variance? [77.30759061082085]
We investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network. Our analysis sheds light on why the deep neural networks have poor performance under adversarial perturbation. We introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies.
arXiv Detail & Related papers (2020-08-01T00:58:54Z)
ResiliNet: Failure-Resilient Inference in Distributed Neural Networks [56.255913459850674]
We introduce ResiliNet, a scheme for making inference in distributed neural networks resilient to physical node failures. Failout simulates physical node failure conditions during training using dropout, and is specifically designed to improve the resiliency of distributed neural networks.
arXiv Detail & Related papers (2020-02-18T05:58:24Z)
Mean-Field and Kinetic Descriptions of Neural Differential Equations [0.0]
In this work we focus on a particular class of neural networks, i.e. the residual neural networks. We analyze steady states and sensitivity with respect to the parameters of the network, namely the weights and the bias. A modification of the microscopic dynamics, inspired by residual neural networks, leads to a Fokker-Planck formulation of the network.
arXiv Detail & Related papers (2020-01-07T13:41:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.