Enhancing training of physics-informed neural networks using
domain-decomposition based preconditioning strategies
- URL: http://arxiv.org/abs/2306.17648v2
- Date: Thu, 28 Dec 2023 02:34:37 GMT
- Authors: Alena Kopaničáková and Hardik Kothari and George Em
Karniadakis and Rolf Krause
- Abstract summary: We introduce additive and multiplicative preconditioning strategies for the widely used L-BFGS optimizer.
We demonstrate that both the additive and multiplicative preconditioners significantly improve the convergence of the standard L-BFGS optimizer.
- Score: 1.8434042562191815
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose to enhance the training of physics-informed neural networks
(PINNs). To this aim, we introduce nonlinear additive and multiplicative
preconditioning strategies for the widely used L-BFGS optimizer. The nonlinear
preconditioners are constructed by utilizing the Schwarz domain-decomposition
framework, where the parameters of the network are decomposed in a layer-wise
manner. Through a series of numerical experiments, we demonstrate that both the
additive and multiplicative preconditioners significantly improve the
convergence of the standard L-BFGS optimizer, while providing more accurate
solutions of the underlying partial differential equations. Moreover, the
additive preconditioner is inherently parallel, thus giving rise to a novel
approach to model parallelism.
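
The difference between the additive and multiplicative variants can be illustrated with a toy sketch. It makes several assumptions that are not in the abstract: a small convex quadratic stands in for the PINN loss, each index block stands in for the parameters of one layer, the local solves use plain gradient descent, and the additive corrections are damped by one over the number of blocks. The names `local_solve`, `additive_step`, and `multiplicative_step` are hypothetical, and the subsequent L-BFGS step on the preconditioned iterate is not shown.

```python
# Toy sketch of additive vs. multiplicative block-wise preconditioning steps.
# Assumption: a convex quadratic replaces the PINN loss, and each block of
# indices plays the role of one layer's parameters.

A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 3.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]
BLOCKS = [[0, 1], [2, 3]]  # hypothetical "layers"


def loss(x):
    """f(x) = 0.5 * x^T A x - b^T x."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(n))


def grad(x):
    """Gradient of the quadratic: A x - b."""
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]


def local_solve(x, block, steps=20, lr=0.2):
    """Reduce the loss over one block only; all other entries stay frozen."""
    y = list(x)
    for _ in range(steps):
        g = grad(y)
        for i in block:
            y[i] -= lr * g[i]
    return y


def additive_step(x, blocks=BLOCKS):
    """Solve every block from the same iterate (parallelizable), then sum
    the damped corrections."""
    total = [0.0] * len(x)
    for block in blocks:
        y = local_solve(x, block)
        for i in block:
            total[i] += y[i] - x[i]
    damping = 1.0 / len(blocks)  # assumed damping choice
    return [xi + damping * ci for xi, ci in zip(x, total)]


def multiplicative_step(x, blocks=BLOCKS):
    """Solve the blocks one after another, each seeing the previous update."""
    y = list(x)
    for block in blocks:
        y = local_solve(y, block)
    return y


x0 = [0.0, 0.0, 0.0, 0.0]
x_add = additive_step(x0)
x_mul = multiplicative_step(x0)
```

In this sketch the block solves inside `additive_step` are independent of one another, which is what makes the additive variant parallelizable, whereas `multiplicative_step` is inherently sequential.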
Related papers
- Reimagining Linear Probing: Kolmogorov-Arnold Networks in Transfer Learning [18.69601183838834]
Kolmogorov-Arnold Networks (KAN) are introduced as an enhancement to the traditional linear probing method in transfer learning.
KAN consistently outperforms traditional linear probing, achieving significant improvements in accuracy and generalization.
arXiv Detail & Related papers (2024-09-12T05:36:40Z) - Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [3.680127959836384]
Implicit gradient descent (IGD) outperforms the common gradient descent (GD) in handling certain multi-scale problems.
We show that IGD converges to a globally optimal solution at a linear convergence rate.
arXiv Detail & Related papers (2024-07-03T06:10:41Z) - Two-level overlapping additive Schwarz preconditioner for training scientific machine learning applications [1.8434042562191815]
We introduce a novel two-level overlapping Schwarz preconditioner for accelerating the training of scientific machine learning applications.
The design of the proposed preconditioner is motivated by the nonlinear two-level overlapping Schwarz preconditioner.
We demonstrate that the proposed two-level preconditioner significantly speeds up the convergence of the standard L-BFGS optimizer while also yielding more accurate machine learning models.
arXiv Detail & Related papers (2024-06-16T16:18:45Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are used for programming purposes.
In this paper we examine the use of convex neural recovery models.
We show that all stationary points of the nonconvex training objective can be characterized as global optima of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - Lifted Bregman Training of Neural Networks [28.03724379169264]
We introduce a novel mathematical formulation for the training of feed-forward neural networks with (potentially non-smooth) proximal maps as activation functions.
This formulation is based on Bregman distances, and a key advantage is that its partial derivatives with respect to the network's parameters do not require the computation of derivatives of the network's activation functions.
We present several numerical results that demonstrate that these training approaches can be equally well or even better suited for the training of neural network-based classifiers and (denoising) autoencoders with sparse coding.
arXiv Detail & Related papers (2022-08-18T11:12:52Z) - Neural Basis Functions for Accelerating Solutions to High Mach Euler
Equations [63.8376359764052]
We propose an approach to solving partial differential equations (PDEs) using a set of neural networks.
We regress a set of neural networks onto a reduced order Proper Orthogonal Decomposition (POD) basis.
These networks are then used in combination with a branch network that ingests the parameters of the prescribed PDE to compute a reduced order approximation to the PDE.
arXiv Detail & Related papers (2022-08-02T18:27:13Z) - An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z) - Training Deep Energy-Based Models with f-Divergence Minimization [113.97274898282343]
Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging.
We propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence.
Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.
arXiv Detail & Related papers (2020-03-06T23:11:13Z) - Loss landscapes and optimization in over-parameterized non-linear
systems and neural networks [20.44438519046223]
We show that wide neural networks satisfy the PL$*$ condition, which explains the (S)GD convergence to a global minimum.
arXiv Detail & Related papers (2020-02-29T17:18:28Z)