On the training and generalization of deep operator networks
- URL: http://arxiv.org/abs/2309.01020v1
- Date: Sat, 2 Sep 2023 21:10:45 GMT
- Title: On the training and generalization of deep operator networks
- Authors: Sanghyun Lee, Yeonjong Shin
- Abstract summary: We present a novel training method for deep operator networks (DeepONets).
DeepONets are constructed from two sub-networks.
We establish a generalization error estimate in terms of the number of training data and the width of the DeepONet.
- Score: 11.159056906971983
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a novel training method for deep operator networks (DeepONets),
one of the most popular neural network models for operators. DeepONets are
constructed by two sub-networks, namely the branch and trunk networks.
Typically, the two sub-networks are trained simultaneously, which amounts to
solving a complex optimization problem in a high dimensional space. In
addition, the nonconvex and nonlinear nature makes training very challenging.
To tackle such a challenge, we propose a two-step training method that trains
the trunk network first and then sequentially trains the branch network. The
core mechanism is motivated by the divide-and-conquer paradigm and is the
decomposition of the entire complex training task into two subtasks with
reduced complexity. Therein the Gram-Schmidt orthonormalization process is
introduced which significantly improves stability and generalization ability.
On the theoretical side, we establish a generalization error estimate in terms
of the number of training data, the width of DeepONets, and the number of input
and output sensors. Numerical examples are presented to demonstrate the
effectiveness of the two-step training method, including Darcy flow in
heterogeneous porous media.
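The two-step idea above can be illustrated with a minimal numerical sketch. All names, shapes, and the toy operator below are illustrative assumptions, not the paper's implementation; ordinary polynomial features stand in for a trained trunk network, and QR factorization plays the role of the Gram-Schmidt orthonormalization step, to which it is numerically equivalent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic operator-learning data:
# U holds N input functions sampled at m sensors,
# Y holds K output locations, G the target values G(u)(y).
N, m, K, width = 50, 20, 30, 8
U = rng.normal(size=(N, m))
Y = np.linspace(0.0, 1.0, K).reshape(-1, 1)

def trunk(Y, width):
    """Toy 'trunk network': polynomial features of the output location y."""
    return np.hstack([Y ** p for p in range(width)])  # shape (K, width)

T = trunk(Y, width)

# Step 1 (trunk step): orthonormalize the trunk outputs.
# Reduced QR gives T = Q @ R with orthonormal columns in Q,
# mirroring the Gram-Schmidt process described in the abstract.
Q, R = np.linalg.qr(T)

# Toy target operator: G(u)(y) = mean(u) * y, shape (N, K).
G = U.mean(axis=1, keepdims=True) @ Y.T

# Step 2 (branch step): with the trunk frozen and orthonormalized,
# each input's branch coefficients solve a *linear* least-squares
# problem G ≈ B @ Q.T, whose solution is simply B = G @ Q.
B = G @ Q  # shape (N, width)

# Relative reconstruction error of the two-step fit.
err = np.linalg.norm(G - B @ Q.T) / np.linalg.norm(G)
```

Because the trunk basis is orthonormal, the branch step reduces to an explicit projection rather than a joint nonconvex optimization, which is the source of the stability gain the abstract describes.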
Related papers
- Layerwise Sparsifying Training and Sequential Learning Strategy for
Neural Architecture Adaptation [0.0]
This work presents a two-stage framework for developing neural architectures to adapt/ generalize well on a given training data set.
In the first stage, a manifold-regularized layerwise sparsifying training approach is adopted where a new layer is added each time and trained independently by freezing parameters in the previous layers.
In the second stage, a sequential learning process is adopted where a sequence of small networks is employed to extract information from the residual produced in stage I.
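The freeze-then-fit-the-residual mechanism in this two-stage framework can be sketched in a few lines. The "small networks" below are random-feature least-squares fits, a simplifying assumption; `fit_small_net` and the toy regression target are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression problem.
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2

def fit_small_net(X, target, hidden=16):
    """One small 'network': frozen random hidden layer + trained linear readout."""
    W = rng.normal(size=(X.shape[1], hidden))      # hidden weights stay frozen
    H = np.tanh(X @ W)
    c, *_ = np.linalg.lstsq(H, target, rcond=None)  # train only the readout
    return lambda Z: np.tanh(Z @ W) @ c

# Sequential stage: each new small network is trained only on the
# residual left by the earlier, frozen ones.
residual = y.copy()
models = []
for _ in range(4):
    f = fit_small_net(X, residual)
    models.append(f)                 # previously trained stages are never revisited
    residual = residual - f(X)

initial_rmse = np.sqrt(np.mean(y ** 2))
final_rmse = np.sqrt(np.mean(residual ** 2))
```

Each stage solves a small, well-conditioned problem instead of retraining the whole model, which is the complexity reduction the two-stage framework aims for.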
arXiv Detail & Related papers (2022-11-13T09:51:16Z)
- Globally Gated Deep Linear Networks [3.04585143845864]
We introduce Globally Gated Deep Linear Networks (GGDLNs) where gating units are shared among all processing units in each layer.
We derive exact equations for the generalization properties in these networks in the finite-width thermodynamic limit.
Our work is the first exact theoretical solution of learning in a family of nonlinear networks with finite width.
arXiv Detail & Related papers (2022-10-31T16:21:56Z)
- MultiAuto-DeepONet: A Multi-resolution Autoencoder DeepONet for Nonlinear Dimension Reduction, Uncertainty Quantification and Operator Learning of Forward and Inverse Stochastic Problems [12.826754199680474]
A new data-driven method for operator learning of stochastic differential equations (SDEs) is proposed in this paper.
The central goal is to solve forward and inverse problems more effectively using limited data.
arXiv Detail & Related papers (2022-04-07T03:53:49Z)
- Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Optimizing Neural Networks via Koopman Operator Theory [6.09170287691728]
Koopman operator theory was recently shown to be intimately connected with neural network theory.
In this work we take the first steps in making use of this connection.
We show that Koopman operator theory methods allow predictions of the weights and biases of feedforward networks over a non-trivial range of training time.
arXiv Detail & Related papers (2020-06-03T16:23:07Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed algorithm for large-scale AUC maximization with a deep neural network.
Our method requires far fewer communication rounds than existing approaches while retaining theoretical guarantees.
Our experiments on several datasets demonstrate the effectiveness of the method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Exploring the Connection Between Binary and Spiking Neural Networks [1.329054857829016]
We bridge the recent algorithmic progress in training Binary Neural Networks and Spiking Neural Networks.
We show that training Spiking Neural Networks in the extreme quantization regime results in near full precision accuracies on large-scale datasets.
arXiv Detail & Related papers (2020-02-24T03:46:51Z)
- Subset Sampling For Progressive Neural Network Learning [106.12874293597754]
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data.
We propose to speed up this process by exploiting subsets of training data at each incremental training step.
Experimental results in object, scene and face recognition problems demonstrate that the proposed approach speeds up the optimization procedure considerably.
arXiv Detail & Related papers (2020-02-17T18:57:33Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.