Two-level overlapping additive Schwarz preconditioner for training scientific machine learning applications
- URL: http://arxiv.org/abs/2406.10997v1
- Date: Sun, 16 Jun 2024 16:18:45 GMT
- Title: Two-level overlapping additive Schwarz preconditioner for training scientific machine learning applications
- Authors: Youngkyu Lee, Alena Kopaničáková, George Em Karniadakis
- Abstract summary: We introduce a novel two-level overlapping Schwarz preconditioner for accelerating the training of scientific machine learning applications.
The design of the proposed preconditioner is motivated by the nonlinear two-level overlapping Schwarz preconditioner.
We demonstrate that the proposed two-level preconditioner significantly speeds up the convergence of the standard LBFGS optimizer while also yielding more accurate machine learning models.
- Score: 1.8434042562191815
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce a novel two-level overlapping additive Schwarz preconditioner for accelerating the training of scientific machine learning applications. The design of the proposed preconditioner is motivated by the nonlinear two-level overlapping additive Schwarz preconditioner. The neural network parameters are decomposed into groups (subdomains) with overlapping regions. In addition, the network's feed-forward structure is indirectly imposed through a novel subdomain-wise synchronization strategy and a coarse-level training step. Through a series of numerical experiments, which consider physics-informed neural networks and operator learning approaches, we demonstrate that the proposed two-level preconditioner significantly speeds up the convergence of the standard (LBFGS) optimizer while also yielding more accurate machine learning models. Moreover, the devised preconditioner is designed to take advantage of model-parallel computations, which can further reduce the training time.
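The decomposition of network parameters into overlapping groups, followed by an additive combination of subdomain corrections, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the helper names `overlapping_layer_groups`, `additive_update`, and `local_solve` are hypothetical, each layer is reduced to a single scalar parameter, the toy loss is a separable quadratic, and overlaps are handled by plain averaging rather than by the paper's synchronization strategy. The coarse-level training step is not modeled.

```python
def overlapping_layer_groups(num_layers, num_subdomains, overlap):
    """Split layer indices into contiguous groups, then widen each
    group by `overlap` layers on both sides (clipped to valid indices)."""
    if not 1 <= num_subdomains <= num_layers:
        raise ValueError("need 1 <= num_subdomains <= num_layers")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    base, extra = divmod(num_layers, num_subdomains)
    groups, start = [], 0
    for k in range(num_subdomains):
        stop = start + base + (1 if k < extra else 0)
        lo = max(0, start - overlap)
        hi = min(num_layers, stop + overlap)
        groups.append(list(range(lo, hi)))
        start = stop
    return groups


def additive_update(params, groups, local_solve):
    """Sum the corrections computed independently on each subdomain.
    Assumption: entries shared by several subdomains are averaged."""
    total = [0.0] * len(params)
    count = [0] * len(params)
    for group in groups:
        # Each local solve sees the same starting point, so the loop
        # body could run in parallel across subdomains.
        correction = local_solve(params, group)
        for i in group:
            total[i] += correction[i]
            count[i] += 1
    return [p + t / c for p, t, c in zip(params, total, count)]


# Toy problem (hypothetical): loss = 0.5 * sum((p_i - target_i)^2).
targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]


def local_solve(params, group):
    """Exact minimizer of the toy loss restricted to one subdomain,
    with all other parameters frozen."""
    return {i: targets[i] - params[i] for i in group}


groups = overlapping_layer_groups(num_layers=6, num_subdomains=3, overlap=1)
updated = additive_update([0.0] * 6, groups, local_solve)
```

Because the toy loss is separable, a single additive step already reaches the minimizer. For a real network the local solves would be a few optimizer iterations on the subdomain's parameters, and the combined correction would typically serve as a preconditioned search direction for the outer optimizer.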
Related papers
- A New Self-organizing Interval Type-2 Fuzzy Neural Network for Multi-Step Time Series Prediction [9.546043411729206]
This paper proposes a new self-organizing interval type-2 fuzzy neural network with multiple outputs (SOIT2FNN-MO) for multi-step time series prediction.
A nine-layer network is developed to improve prediction accuracy, uncertainty handling and model interpretability.
arXiv Detail & Related papers (2024-07-10T19:35:44Z)
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- Training Neural Networks from Scratch with Parallel Low-Rank Adapters [50.171622511923474]
We introduce LoRA-the-Explorer (LTE), a novel bi-level optimization algorithm designed to enable parallel training of multiple low-rank heads across computing nodes.
Our approach includes extensive experimentation on vision transformers using various vision datasets, demonstrating that LTE is competitive with standard pre-training.
arXiv Detail & Related papers (2024-02-26T18:55:13Z)
- Enhancing training of physics-informed neural networks using domain-decomposition based preconditioning strategies [1.8434042562191815]
We introduce additive and multiplicative preconditioning strategies for the widely used L-BFGS.
We demonstrate that both additive and multiplicative preconditioners significantly improve the convergence of the standard L-BFGS.
arXiv Detail & Related papers (2023-06-30T13:35:09Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, adversarial labels.
The results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the "NTK regime".
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- Layerwise Sparsifying Training and Sequential Learning Strategy for Neural Architecture Adaptation [0.0]
This work presents a two-stage framework for developing neural architectures that adapt and generalize well on a given training data set.
In the first stage, a manifold-regularized layerwise sparsifying training approach is adopted where a new layer is added each time and trained independently by freezing parameters in the previous layers.
In the second stage, a sequential learning process is adopted where a sequence of small networks is employed to extract information from the residual produced in stage I.
arXiv Detail & Related papers (2022-11-13T09:51:16Z)
- On the Interpretability of Regularisation for Neural Networks Through Model Gradient Similarity [0.0]
Model Gradient Similarity (MGS) serves as a metric of regularisation.
MGS provides the basis for a new regularisation scheme which exhibits excellent performance.
arXiv Detail & Related papers (2022-05-25T10:38:33Z)
- Identification of Nonlinear Dynamic Systems Using Type-2 Fuzzy Neural Networks -- A Novel Learning Algorithm and a Comparative Study [12.77304082363491]
A sliding mode theory-based learning algorithm has been proposed to tune both the premise and consequent parts of type-2 fuzzy neural networks.
The stability of the proposed learning algorithm has been proved by using an appropriate Lyapunov function.
Several comparisons have been carried out, showing that the proposed algorithm converges faster than the existing methods.
arXiv Detail & Related papers (2021-04-04T23:44:59Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition of both the binary and multi-level gradient quantization for ANY gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
arXiv Detail & Related papers (2020-02-25T18:28:39Z)