Towards Size-Independent Generalization Bounds for Deep Operator Nets
- URL: http://arxiv.org/abs/2205.11359v3
- Date: Wed, 04 Dec 2024 17:37:38 GMT
- Title: Towards Size-Independent Generalization Bounds for Deep Operator Nets
- Authors: Pulkit Gopalani, Sayar Karmakar, Dibyakanti Kumar, Anirbit Mukherjee
- Abstract summary: This work aims to advance the theory of measuring out-of-sample error while training DeepONets.
For a class of DeepONets, we prove a bound on their Rademacher complexity which does not explicitly scale with the width of the nets involved.
We show how the Huber loss can be chosen so that for these DeepONet classes generalization error bounds can be obtained that have no explicit dependence on the size of the nets.
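For reference, the two quantities named above have standard definitions; the recap below is generic and does not reproduce the particular DeepONet class or the specific Huber parameter derived in the paper. Given a sample $S = \{z_1, \dots, z_n\}$ with i.i.d. Rademacher signs $\varepsilon_i \in \{\pm 1\}$, the empirical Rademacher complexity of a function class $\mathcal{F}$ and the Huber loss at residual $r$ with parameter $\delta > 0$ are:
$$
\hat{\mathcal{R}}_S(\mathcal{F}) \;=\; \mathbb{E}_{\varepsilon}\!\left[\,\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i\, f(z_i)\right],
\qquad
\ell_{\delta}(r) \;=\;
\begin{cases}
\tfrac{1}{2} r^{2}, & |r| \le \delta,\\[2pt]
\delta\!\left(|r| - \tfrac{1}{2}\delta\right), & |r| > \delta.
\end{cases}
$$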
- Abstract: In recent times, machine learning methods have made significant advances in becoming a useful tool for analyzing physical systems. A particularly active area in this theme has been "physics-informed machine learning", which focuses on using neural nets for numerically solving differential equations. In this work, we aim to advance the theory of measuring out-of-sample error while training DeepONets, which are among the most versatile ways to solve PDE systems in one shot. Firstly, for a class of DeepONets, we prove a bound on their Rademacher complexity which does not explicitly scale with the width of the nets involved. Secondly, we use this to show how the Huber loss can be chosen so that, for these DeepONet classes, generalization error bounds can be obtained that have no explicit dependence on the size of the nets. The effective capacity measure for DeepONets that we thus derive is also shown to correlate with the behavior of generalization error in experiments.
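To make the object of study concrete, here is a minimal DeepONet sketch in PyTorch trained with the Huber loss. All widths, depths, the number of sensor points, and the delta value below are placeholder choices for illustration; they are not the architecture or the loss parameter analyzed in the paper.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    def __init__(self, n_sensors=100, coord_dim=1, p=64, width=128):
        super().__init__()
        # Branch net: encodes the input function u, sampled at n_sensors points.
        self.branch = nn.Sequential(
            nn.Linear(n_sensors, width), nn.ReLU(), nn.Linear(width, p)
        )
        # Trunk net: encodes the query location y at which G(u) is evaluated.
        self.trunk = nn.Sequential(
            nn.Linear(coord_dim, width), nn.ReLU(), nn.Linear(width, p)
        )

    def forward(self, u_sensors, y):
        # DeepONet output: inner product of branch and trunk feature vectors.
        return (self.branch(u_sensors) * self.trunk(y)).sum(dim=-1, keepdim=True)

model = DeepONet()
loss_fn = nn.HuberLoss(delta=1.0)  # delta=1.0 is a stand-in, not the paper's derived choice
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on synthetic data.
u = torch.randn(32, 100)     # 32 input functions, each observed at 100 sensor points
y = torch.rand(32, 1)        # one query coordinate per input function
target = torch.randn(32, 1)  # target values of the solution operator (synthetic here)

optimizer.zero_grad()
loss = loss_fn(model(u, y), target)
loss.backward()
optimizer.step()
```

The key structural point is the branch/trunk factorization: the branch net encodes the sampled input function, the trunk net encodes the query location, and the prediction is their inner product; this is the map whose out-of-sample error the paper's bounds control.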
Related papers
- Generalization Analysis for Deep Contrastive Representation Learning [32.56004424242989]
We present bounds for the unsupervised risk in the Deep Contrastive Representation Learning framework.
We use loss augmentation techniques to reduce the dependency on matrix norms and the implicit dependence on network depth.
arXiv Detail & Related papers (2024-12-16T17:40:05Z) - DeepONet for Solving Nonlinear Partial Differential Equations with Physics-Informed Training [2.44755919161855]
We investigate the use of operator learning, specifically DeepONet, for solving nonlinear partial differential equations (PDEs).
This study examines the performance of DeepONet in physics-informed training, focusing on two key aspects: (1) the approximation capabilities of deep branch and trunk networks, and (2) the generalization error in Sobolev norms.
arXiv Detail & Related papers (2024-10-06T03:43:56Z) - On the Role of Initialization on the Implicit Bias in Deep Linear Networks [8.272491066698041]
This study explores the implicit bias at play in deep linear networks, focusing on the role of initialization.
Various sources of implicit bias have been identified, such as step size, weight initialization, optimization algorithm, and number of parameters.
arXiv Detail & Related papers (2024-02-04T11:54:07Z) - Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z) - PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization [48.26492774959634]
We develop a compression approach based on quantizing neural network parameters in a linear subspace.
We find large models can be compressed to a much greater extent than previously known, encapsulating Occam's razor.
arXiv Detail & Related papers (2022-11-24T13:50:16Z) - Improved architectures and training algorithms for deep operator networks [0.0]
Operator learning techniques have emerged as a powerful tool for learning maps between infinite-dimensional Banach spaces.
We analyze the training dynamics of deep operator networks (DeepONets) through the lens of Neural Tangent Kernel (NTK) theory.
arXiv Detail & Related papers (2021-10-04T18:34:41Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to addressing the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z) - Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)