Towards Size-Independent Generalization Bounds for Deep Operator Nets
- URL: http://arxiv.org/abs/2205.11359v3
- Date: Wed, 04 Dec 2024 17:37:38 GMT
- Title: Towards Size-Independent Generalization Bounds for Deep Operator Nets
- Authors: Pulkit Gopalani, Sayar Karmakar, Dibyakanti Kumar, Anirbit Mukherjee
- Abstract summary: This work aims to advance the theory of measuring out-of-sample error while training DeepONets. For a class of DeepONets, we prove a bound on their Rademacher complexity which does not explicitly scale with the width of the nets involved. We show how the Huber loss can be chosen so that, for these DeepONet classes, generalization error bounds can be obtained that have no explicit dependence on the size of the nets.
- Score: 0.28123958518740544
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent times, machine learning methods have made significant advances in becoming a useful tool for analyzing physical systems. A particularly active area in this theme has been "physics-informed machine learning", which focuses on using neural nets for numerically solving differential equations. In this work, we aim to advance the theory of measuring out-of-sample error while training DeepONets - which are among the most versatile ways to solve PDE systems in one shot. Firstly, for a class of DeepONets, we prove a bound on their Rademacher complexity which does not explicitly scale with the width of the nets involved. Secondly, we use this to show how the Huber loss can be chosen so that, for these DeepONet classes, generalization error bounds can be obtained that have no explicit dependence on the size of the nets. The effective capacity measure for DeepONets that we thus derive is also shown to correlate with the behavior of generalization error in experiments.
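Because the abstract is terse about the architecture, a minimal sketch may help fix ideas: a DeepONet pairs a branch net (acting on an input function sampled at fixed sensor points) with a trunk net (acting on a query coordinate) and predicts via their inner product, and the paper studies such classes trained with a Huber loss. The layer sizes, sensor count, and Huber delta below are illustrative assumptions, not the exact function class or loss parameter analyzed in the paper.

```python
# Minimal DeepONet sketch; layer sizes, sensor count, and the Huber delta are
# illustrative assumptions, not the exact class of nets analyzed in the paper.
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    def __init__(self, n_sensors=100, coord_dim=1, p=64):
        super().__init__()
        # Branch net: encodes the input function u sampled at n_sensors points.
        self.branch = nn.Sequential(
            nn.Linear(n_sensors, 128), nn.ReLU(), nn.Linear(128, p)
        )
        # Trunk net: encodes the coordinate y at which the output is queried.
        self.trunk = nn.Sequential(
            nn.Linear(coord_dim, 128), nn.ReLU(), nn.Linear(128, p)
        )

    def forward(self, u_sensors, y):
        # DeepONet prediction: inner product of branch and trunk features.
        return (self.branch(u_sensors) * self.trunk(y)).sum(dim=-1)

model = DeepONet()
u = torch.randn(32, 100)   # batch of input functions sampled at 100 sensors
y = torch.rand(32, 1)      # query coordinates
target = torch.randn(32)   # stand-in for the target values G(u)(y)

# Huber loss: quadratic near zero, linear in the tails; the paper studies how its
# transition point can be chosen to obtain size-independent generalization bounds.
loss = nn.HuberLoss(delta=1.0)(model(u, y), target)
loss.backward()
```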
Related papers
- Generalization Analysis for Deep Contrastive Representation Learning [32.56004424242989]
We present bounds for the unsupervised risk in the Deep Contrastive Representation Learning framework.
We use loss augmentation techniques to reduce the dependency on matrix norms and the implicit dependence on network depth.
arXiv Detail & Related papers (2024-12-16T17:40:05Z) - DimOL: Dimensional Awareness as A New 'Dimension' in Operator Learning [63.5925701087252]
We introduce DimOL (Dimension-aware Operator Learning), drawing insights from dimensional analysis.
To implement DimOL, we propose the ProdLayer, which can be seamlessly integrated into FNO-based and Transformer-based PDE solvers.
Empirically, DimOL models achieve up to 48% performance gain within the PDE datasets.
arXiv Detail & Related papers (2024-10-08T10:48:50Z) - DeepONet for Solving PDEs: Generalization Analysis in Sobolev Training [2.44755919161855]
We investigate the application of operator learning, specifically DeepONet, to solving partial differential equations (PDEs).
We focus on the performance of DeepONet in Sobolev training, addressing two key questions: the approximation ability of deep branch and trunk networks, and the generalization error in Sobolev norms.
arXiv Detail & Related papers (2024-10-06T03:43:56Z) - Separable DeepONet: Breaking the Curse of Dimensionality in Physics-Informed Machine Learning [0.0]
In the absence of labeled datasets, we utilize the PDE residual loss to learn the physical system, an approach known as physics-informed DeepONet.
This method faces significant computational challenges, primarily due to the curse of dimensionality, as the computational cost increases exponentially with finer discretization.
We introduce the Separable DeepONet framework to address these challenges and improve scalability for high-dimensional PDEs.
arXiv Detail & Related papers (2024-07-21T16:33:56Z) - On the Role of Initialization on the Implicit Bias in Deep Linear Networks [8.272491066698041]
This study explores how the implicit bias of gradient-based training manifests in deep linear networks, with a focus on the role of initialization.
Various sources of implicit bias have been identified, such as step size, weight initialization, optimization algorithm, and number of parameters.
arXiv Detail & Related papers (2024-02-04T11:54:07Z) - Deep Equilibrium Based Neural Operators for Steady-State PDEs [100.88355782126098]
We study the benefits of weight-tied neural network architectures for steady-state PDEs.
We propose FNO-DEQ, a deep equilibrium variant of the FNO architecture that directly solves for the solution of a steady-state PDE.
arXiv Detail & Related papers (2023-11-30T22:34:57Z) - Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z) - Deep Operator Learning Lessens the Curse of Dimensionality for PDEs [11.181533339111853]
This paper provides an estimate for the generalization error of learning Lipschitz operators over Banach spaces using DNNs with applications to various PDE solution operators.
Under mild assumptions on data distributions or operator structures, our analysis shows that deep operator learning can have a relaxed dependence on the discretization resolution of PDEs.
arXiv Detail & Related papers (2023-01-28T15:35:52Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization [48.26492774959634]
We develop a compression approach based on quantizing neural network parameters in a linear subspace.
We find large models can be compressed to a much greater extent than previously known, encapsulating Occam's razor.
arXiv Detail & Related papers (2022-11-24T13:50:16Z) - Improved architectures and training algorithms for deep operator networks [0.0]
Operator learning techniques have emerged as a powerful tool for learning maps between infinite-dimensional Banach spaces.
We analyze the training dynamics of deep operator networks (DeepONets) through the lens of Neural Tangent Kernel (NTK) theory.
arXiv Detail & Related papers (2021-10-04T18:34:41Z) - Towards Interpretable Deep Networks for Monocular Depth Estimation [78.84690613778739]
We quantify the interpretability of a deep MDE network by the depth selectivity of its hidden units.
We propose a method to train interpretable MDE deep networks without changing their original architectures.
Experimental results demonstrate that our method is able to enhance the interpretability of deep MDE networks.
arXiv Detail & Related papers (2021-08-11T16:43:45Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Learning the solution operator of parametric partial differential equations with physics-informed DeepOnets [0.0]
Deep operator networks (DeepONets) are receiving increased attention thanks to their demonstrated capability to approximate nonlinear operators between infinite-dimensional Banach spaces.
We propose a novel model class coined physics-informed DeepONets, which introduces an effective regularization mechanism for biasing the outputs of DeepONet models towards ensuring physical consistency.
We demonstrate that this simple, yet remarkably effective, extension can not only yield a significant improvement in the predictive accuracy of DeepONets, but also greatly reduce the need for large training datasets. A minimal sketch of this residual-loss idea appears after this list.
arXiv Detail & Related papers (2021-03-19T18:15:42Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z) - Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z) - Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
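Two of the entries above (Separable DeepONet and physics-informed DeepONets) train without labeled solution data by penalizing a PDE residual at collocation points. The sketch below illustrates that residual-loss idea on a toy equation s''(y) = u(y); the equation, the nearest-sensor lookup for u(y), and the network sizes are assumptions made for illustration, not the setups of those papers.

```python
# Physics-informed DeepONet training signal, sketched: instead of fitting labeled
# solution values, penalize the PDE residual at collocation points. The toy
# equation s''(y) = u(y) on (0, 1), the nearest-sensor lookup for u(y), and the
# network sizes are illustrative assumptions only.
import torch
import torch.nn as nn

branch = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 64))
trunk = nn.Sequential(nn.Linear(1, 128), nn.Tanh(), nn.Linear(128, 64))  # smooth trunk, so d2s/dy2 is nonzero

def G(u_sensors, y):
    # Predicted solution s(y) = <branch(u), trunk(y)>.
    return (branch(u_sensors) * trunk(y)).sum(dim=-1)

u = torch.randn(32, 100)                   # input functions sampled at 100 sensors
y = torch.rand(32, 1, requires_grad=True)  # collocation points in (0, 1)

s = G(u, y)
# First and second derivatives of the predicted solution w.r.t. the coordinate y.
ds = torch.autograd.grad(s.sum(), y, create_graph=True)[0]
d2s = torch.autograd.grad(ds.sum(), y, create_graph=True)[0]

# u evaluated at the collocation points via a crude nearest-sensor lookup
# (a placeholder for whatever interpolation a real data pipeline would provide).
idx = (y.detach().squeeze(-1) * 99).long().clamp(0, 99)
u_at_y = u.gather(1, idx.unsqueeze(1)).squeeze(1)

# Residual of s''(y) = u(y); no labeled solutions are needed for this loss.
residual_loss = ((d2s.squeeze(-1) - u_at_y) ** 2).mean()
residual_loss.backward()
```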