Related papers: Two-level overlapping additive Schwarz preconditioner for training scientific machine learning applications

URL: http://arxiv.org/abs/2406.10997v1
Date: Sun, 16 Jun 2024 16:18:45 GMT
Title: Two-level overlapping additive Schwarz preconditioner for training scientific machine learning applications
Authors: Youngkyu Lee, Alena Kopaničáková, George Em Karniadakis
Abstract summary: We introduce a novel two-level overlapping additive Schwarz preconditioner for accelerating the training of scientific machine learning applications. The design of the proposed preconditioner is motivated by the nonlinear two-level overlapping additive Schwarz preconditioner. We demonstrate that the proposed two-level preconditioner significantly speeds up the convergence of the standard L-BFGS optimizer while also yielding more accurate machine learning models.
Score: 1.8434042562191815
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We introduce a novel two-level overlapping additive Schwarz preconditioner for accelerating the training of scientific machine learning applications. The design of the proposed preconditioner is motivated by the nonlinear two-level overlapping additive Schwarz preconditioner. The neural network parameters are decomposed into groups (subdomains) with overlapping regions. In addition, the network's feed-forward structure is indirectly imposed through a novel subdomain-wise synchronization strategy and a coarse-level training step. Through a series of numerical experiments, which consider physics-informed neural networks and operator learning approaches, we demonstrate that the proposed two-level preconditioner significantly speeds up the convergence of the standard L-BFGS optimizer while also yielding more accurate machine learning models. Moreover, the devised preconditioner is designed to take advantage of model-parallel computations, which can further reduce the training time.
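The abstract describes the decomposition only in words. As a rough, non-authoritative illustration, the pure-Python sketch below applies overlapping additive Schwarz updates in parameter space to a toy quadratic loss that stands in for a network's training loss. Everything in it is an assumption for illustration: the function names (`local_solve`, `schwarz_step`), gradient descent as the local solver, averaging of overlapping corrections with partition-of-unity weights, and a single global exact line search used in place of the paper's coarse-level training step. It does not reproduce the authors' subdomain-wise synchronization, their coarse level, or the L-BFGS optimizer, and because the toy loss is quadratic the local problems are linear rather than nonlinear.

```python
"""Toy sketch of overlapping additive Schwarz updates in parameter space.

Assumptions (not from the paper): a quadratic loss stands in for the network
loss, local solves are plain gradient descent, overlaps are averaged with
partition-of-unity weights, and a global exact line search replaces the
paper's coarse-level training step.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def loss(A: Matrix, b: Vector, x: Vector) -> float:
    """f(x) = 0.5 * x^T A x - b^T x for a symmetric positive definite A."""
    Ax = matvec(A, x)
    quad = sum(xi * axi for xi, axi in zip(x, Ax))
    return 0.5 * quad - sum(bi * xi for bi, xi in zip(b, x))


def grad(A: Matrix, b: Vector, x: Vector) -> Vector:
    return [axi - bi for axi, bi in zip(matvec(A, x), b)]


def local_solve(A: Matrix, b: Vector, x: Vector, idx: List[int],
                steps: int = 50, lr: float = 0.1) -> Vector:
    """Gradient descent on the parameters in `idx`; all others stay frozen."""
    y = list(x)
    for _ in range(steps):
        g = grad(A, b, y)
        for i in idx:
            y[i] -= lr * g[i]
    return y


def schwarz_step(A: Matrix, b: Vector, x: Vector,
                 subdomains: List[List[int]]) -> Vector:
    n = len(x)
    # Number of subdomains containing each parameter (every parameter is
    # assumed to belong to at least one subdomain).
    counts = [0] * n
    for idx in subdomains:
        for i in idx:
            counts[i] += 1
    # Additive combination: each local solve starts from the same iterate x,
    # so the loop iterations are independent and could run in parallel.
    d = [0.0] * n
    for idx in subdomains:
        y = local_solve(A, b, x, idx)
        for i in idx:
            d[i] += (y[i] - x[i]) / counts[i]
    # Global correction (stand-in for the coarse level): minimize f(x + alpha*d)
    # exactly, which has a closed form only because f is quadratic.
    g = grad(A, b, x)
    dAd = sum(di * adi for di, adi in zip(d, matvec(A, d)))
    alpha = -sum(gi * di for gi, di in zip(g, d)) / dAd if dAd > 0 else 0.0
    return [xi + alpha * di for xi, di in zip(x, d)]


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0, 0.0],
         [1.0, 3.0, 1.0, 0.0],
         [0.0, 1.0, 3.0, 1.0],
         [0.0, 0.0, 1.0, 2.0]]
    b = [1.0, 2.0, 3.0, 4.0]
    subdomains = [[0, 1, 2], [1, 2, 3]]  # parameters 1 and 2 overlap
    x = [0.0] * 4
    for it in range(10):
        x = schwarz_step(A, b, x, subdomains)
        print(it, round(loss(A, b, x), 6))
```

Each local solve starts from the same iterate and writes only its own parameters, so the loop over subdomains has no data dependencies between iterations; this is the property that makes additive (as opposed to multiplicative) Schwarz methods amenable to the model-parallel execution mentioned in the abstract.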

Related papers

A First-order Generative Bilevel Optimization Framework for Diffusion Models [57.40597004445473]
Diffusion models iteratively denoise data samples to synthesize high-quality outputs. Traditional bilevel methods fail due to the infinite-dimensional probability space and prohibitive sampling costs. We formalize this challenge as a generative bilevel optimization problem. Our first-order bilevel framework overcomes the incompatibility of conventional bilevel methods with diffusion processes.
arXiv (2025-02-12T21:44:06Z)
A New Self-organizing Interval Type-2 Fuzzy Neural Network for Multi-Step Time Series Prediction [9.546043411729206]
This paper proposes a new self-organizing interval type-2 fuzzy neural network with multiple outputs (SOIT2FNN-MO) for multi-step time series prediction. A nine-layer network is developed to improve prediction accuracy, uncertainty handling and model interpretability.
arXiv (2024-07-10T19:35:44Z)
Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs. We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv (2024-03-21T02:33:37Z)
Enhancing training of physics-informed neural networks using domain-decomposition based preconditioning strategies [1.8434042562191815]
We introduce additive and multiplicative preconditioning strategies for the widely used L-BFGS optimizer. We demonstrate that both additive and multiplicative preconditioners significantly improve the convergence of the standard L-BFGS optimizer.
arXiv (2023-06-30T13:35:09Z)
TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks. We propose a novel statistics-based approach, the Two-WIng NormaliSation (TWINS) fine-tuning framework. TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv (2023-03-20T14:12:55Z)
Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth-2 neural networks, trained via gradient descent, to a global minimum. Our model has the following features: regression with a quadratic loss function, a fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels. These results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv (2022-12-05T14:47:52Z)
An Adaptive and Stability-Promoting Layerwise Training Approach for Sparse Deep Neural Network Architecture [0.0]
This work presents a two-stage adaptive framework for developing deep neural network (DNN) architectures that generalize well for a given training data set. In the first stage, a layerwise training approach is adopted where a new layer is added each time and trained independently by freezing parameters in the previous layers. We introduce an epsilon-delta stability-promoting concept as a desirable property for a learning algorithm and show that employing manifold regularization yields an epsilon-delta stability-promoting algorithm.
arXiv (2022-11-13T09:51:16Z)
On the Interpretability of Regularisation for Neural Networks Through Model Gradient Similarity [0.0]
Model Gradient Similarity (MGS) serves as a metric of regularisation. MGS provides the basis for a new regularisation scheme which exhibits excellent performance.
arXiv (2022-05-25T10:38:33Z)
Initialization and Regularization of Factorized Neural Layers [23.875225732697142]
We show how to initialize and regularize factorized layers in deep nets. These schemes lead to improved performance on both translation and unsupervised pre-training.
arXiv (2021-05-03T17:28:07Z)
Identification of Nonlinear Dynamic Systems Using Type-2 Fuzzy Neural Networks -- A Novel Learning Algorithm and a Comparative Study [12.77304082363491]
A sliding mode theory-based learning algorithm has been proposed to tune both the premise and consequent parts of type-2 fuzzy neural networks. The stability of the proposed learning algorithm has been proved by using an appropriate Lyapunov function. Several comparisons show that the proposed algorithm converges faster than existing methods.
arXiv (2021-04-04T23:44:59Z)
LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose LocalDrop, a new approach to regularizing neural networks based on the local Rademacher complexity. A new regularization function for both fully connected networks (FCNs) and convolutional neural networks (CNNs) has been developed based on the proposed upper bound of the local Rademacher complexity.
arXiv (2021-03-01T03:10:11Z)
A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood. We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks. Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv (2020-10-27T17:56:14Z)
Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications. In this work, we derive the optimal condition for both binary and multi-level gradient quantization for any gradient distribution. Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization, respectively.
arXiv (2020-02-25T18:28:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.