Size Lowerbounds for Deep Operator Networks
- URL: http://arxiv.org/abs/2308.06338v3
- Date: Fri, 23 Feb 2024 10:36:18 GMT
- Title: Size Lowerbounds for Deep Operator Networks
- Authors: Anirbit Mukherjee and Amartya Roy
- Abstract summary: We establish a data-dependent lowerbound on the size of DeepONets required for them to be able to reduce empirical error on noisy data.
We demonstrate the possibility that, at a fixed model size, the training data might need to grow at least quadratically with the common output dimension of the branch and trunk nets in order for increases in that dimension to yield a monotonic lowering of the training error.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Operator Networks are an increasingly popular paradigm for solving
regression in infinite dimensions, and hence families of PDEs in one shot.
In this work, we aim to establish a first-of-its-kind data-dependent lowerbound
on the size of DeepONets required for them to be able to reduce empirical error
on noisy data. In particular, we show that for low training errors to be
obtained on $n$ data points, it is necessary that the common output dimension of
the branch and the trunk net scale as $\Omega\left(\sqrt[4]{n}\right)$.
This inspires our experiments with DeepONets solving the
advection-diffusion-reaction PDE, where we demonstrate the possibility that, at
a fixed model size, the training data might need to grow at least quadratically
with this common output dimension in order for increases in that dimension to
yield a monotonic lowering of the training error.
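For orientation, here is a minimal DeepONet sketch in PyTorch (an illustrative architecture, not the authors' exact setup; the layer widths, sensor count, and the $p \approx n^{1/4}$ heuristic at the end are assumptions drawn from the bound above). It shows where the common output dimension $p$ shared by the branch and trunk nets enters the model.

```python
# Minimal DeepONet sketch (illustrative only). The "common output dimension" p
# is the width of the final layer shared by the branch and trunk nets; the
# paper's bound says p must grow as Omega(n^(1/4)) for the empirical error on
# n noisy samples to be driven low.
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    def __init__(self, n_sensors: int, coord_dim: int, p: int, width: int = 64):
        super().__init__()
        # Branch net: encodes the input function u sampled at n_sensors points.
        self.branch = nn.Sequential(
            nn.Linear(n_sensors, width), nn.ReLU(), nn.Linear(width, p)
        )
        # Trunk net: encodes the query coordinate y (e.g. (x, t) for a PDE).
        self.trunk = nn.Sequential(
            nn.Linear(coord_dim, width), nn.ReLU(), nn.Linear(width, p)
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, u_sensors: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        b = self.branch(u_sensors)                        # (batch, p)
        t = self.trunk(y)                                 # (batch, p)
        return (b * t).sum(-1, keepdim=True) + self.bias  # G(u)(y)

# Heuristic suggested by the lower bound: choose p on the order of n^(1/4).
n = 10_000                                   # number of training samples
p = max(1, round(n ** 0.25))                 # common branch/trunk output dimension
model = DeepONet(n_sensors=100, coord_dim=2, p=p)
```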
Related papers
- Attention Map Guided Transformer Pruning for Edge Device [98.42178656762114]
Vision transformer (ViT) has achieved promising success in both holistic and occluded person re-identification (Re-ID) tasks.
We propose a novel attention map guided (AMG) transformer pruning method, which removes both redundant tokens and heads.
Comprehensive experiments on Occluded DukeMTMC and Market-1501 demonstrate the effectiveness of our proposals.
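As a rough illustration of attention-guided token pruning in general (not the AMG algorithm itself; the function, shapes, and keep-top-k rule below are assumptions), one can score each token by the attention it receives and retain only the highest-scoring ones:

```python
# Toy illustration of attention-guided token pruning: score each token by how
# much attention it receives, averaged over heads and queries, then keep only
# the highest-scoring tokens for subsequent layers.
import torch

def prune_tokens(tokens: torch.Tensor, attn: torch.Tensor, keep: int):
    """
    tokens: (batch, n_tokens, dim)              token embeddings
    attn:   (batch, heads, n_tokens, n_tokens)  attention weights of a layer
    keep:   number of tokens to retain
    """
    # Importance of token j = mean attention paid to it across heads and queries.
    importance = attn.mean(dim=1).mean(dim=1)          # (batch, n_tokens)
    idx = importance.topk(keep, dim=-1).indices        # (batch, keep)
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
    return torch.gather(tokens, dim=1, index=idx)      # (batch, keep, dim)

# Example with random data.
tokens = torch.randn(2, 197, 768)
attn = torch.softmax(torch.randn(2, 12, 197, 197), dim=-1)
kept = prune_tokens(tokens, attn, keep=128)            # (2, 128, 768)
```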
arXiv Detail & Related papers (2023-04-04T01:51:53Z) - Deep Neural Networks for Nonparametric Interaction Models with Diverging
Dimension [6.939768185086753]
We analyze a $k$th-order nonparametric interaction model in both the growing-dimension regime ($d$ grows with $n$ but at a slower rate) and the high-dimensional regime ($d \gtrsim n$).
We show that, under certain standard assumptions, debiased deep neural networks achieve a minimax-optimal rate in terms of $(n, d)$.
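As background, a $k$th-order interaction model is commonly written in the functional-ANOVA form sketched below (an illustrative definition; the paper's exact assumptions may differ):

```latex
% Illustrative k-th order interaction model (functional-ANOVA form): the
% regression function is a sum of components, each depending on at most k of
% the d coordinates.
\[
  f(x) \;=\; \sum_{\substack{S \subseteq \{1,\dots,d\} \\ |S| \le k}} f_S(x_S),
  \qquad x \in \mathbb{R}^d .
\]
% A first-order model is additive; a second-order model adds pairwise terms
% f_{ij}(x_i, x_j); and so on.
```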
arXiv Detail & Related papers (2023-02-12T04:19:39Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
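As a minimal numerical sketch of the object under study (depth, widths, and initialization below are arbitrary illustrative choices): a deep linear network is a composition of linear maps and therefore collapses to a single effective matrix, even though its depth still shapes the Bayesian posterior over that matrix.

```python
# A deep linear network is a composition of linear maps, so its output equals
# that of a single collapsed matrix; depth nonetheless changes the Bayesian
# posterior over that matrix, which is what the paper characterizes.
import numpy as np

rng = np.random.default_rng(0)
depth, width, d_in, d_out = 8, 32, 16, 4

dims = [d_in] + [width] * (depth - 1) + [d_out]
weights = [rng.normal(scale=dims[i] ** -0.5, size=(dims[i + 1], dims[i]))
           for i in range(depth)]

x = rng.normal(size=(d_in,))

h = x
for W in weights:                  # layer-by-layer forward pass
    h = W @ h

effective = weights[0]
for W in weights[1:]:
    effective = W @ effective      # collapse into a single linear map

assert np.allclose(h, effective @ x)   # identical outputs
```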
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Semi-supervised Invertible DeepONets for Bayesian Inverse Problems [8.594140167290098]
DeepONets offer a powerful, data-driven tool for solving parametric PDEs by learning operators.
In this work, we employ physics-informed DeepONets in the context of high-dimensional, Bayesian inverse problems.
arXiv Detail & Related papers (2022-09-06T18:55:06Z) - Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated seamlessly with neural networks.
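The sketch below is a toy illustration of the general idea of randomizing a chain-structured dynamic program, subsampling latent states inside the forward recursion; the scores are made up and this is not the paper's RDP construction.

```python
# Toy randomized forward recursion on a chain: at each step, sum over a random
# subset of K previous states (with an S/K correction so the exp-domain sum
# stays unbiased) instead of all S states, dropping the cost per step from
# O(S^2) to O(K*S). The scores here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
T, S, K = 20, 2_000, 64                       # length, #latent states, sample size

log_emit = rng.normal(size=(T, S))            # log emission scores
log_trans = rng.normal(size=(S, S)) * 0.01    # log transition scores

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True)), axis=axis)

alpha = log_emit[0]                           # forward log-scores at t = 0
for t in range(1, T):
    idx = rng.choice(S, size=K, replace=False)        # subsample previous states
    scores = alpha[idx, None] + log_trans[idx, :]     # (K, S)
    alpha = log_emit[t] + logsumexp(scores, axis=0) + np.log(S / K)

m = alpha.max()
log_partition_estimate = m + np.log(np.exp(alpha - m).sum())
```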
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - A Theoretical-Empirical Approach to Estimating Sample Complexity of DNNs [11.152761263415046]
This paper focuses on understanding how the generalization error scales with the amount of training data for deep neural networks (DNNs).
We derive estimates of the generalization error that hold for deep networks and do not rely on unattainable capacity measures.
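One common way to summarize such scaling (a generic illustration with hypothetical numbers, not the estimator derived in the paper) is to fit a power law $\mathrm{err}(n) \approx a\,n^{-b}$ to measured learning-curve points:

```python
# Fit a power law err(n) ~ a * n^(-b) to (hypothetical) learning-curve points
# via linear regression in log-log space, then extrapolate.
import numpy as np

# Hypothetical measurements: (training-set size, test error).
n = np.array([1_000, 2_000, 5_000, 10_000, 20_000, 50_000], dtype=float)
err = np.array([0.31, 0.24, 0.17, 0.13, 0.10, 0.072])

# log err = log a - b * log n
slope, intercept = np.polyfit(np.log(n), np.log(err), deg=1)
a, b = np.exp(intercept), -slope
print(f"fitted scaling: err(n) ~ {a:.2f} * n^(-{b:.2f})")

# Extrapolate the fitted curve to a larger dataset.
print(f"predicted error at n = 200,000: {a * 200_000 ** (-b):.3f}")
```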
arXiv Detail & Related papers (2021-05-05T05:14:08Z) - Mixed-Privacy Forgetting in Deep Networks [114.3840147070712]
We show that the influence of a subset of the training samples can be removed from the weights of a network trained on large-scale image classification tasks.
Inspired by real-world applications of forgetting techniques, we introduce a novel notion of forgetting in a mixed-privacy setting.
We show that our method allows forgetting without having to trade off the model accuracy.
arXiv Detail & Related papers (2020-12-24T19:34:56Z) - Parameter Efficient Deep Neural Networks with Bilinear Projections [16.628045837101237]
We address the parameter redundancy problem in deep neural networks (DNNs) by replacing conventional full projections with bilinear projections.
For a fully-connected layer with $D$ input nodes and $D$ output nodes, applying bilinear projection can reduce the model space complexity.
Experiments on four benchmark datasets show that applying the proposed bilinear projection to deep neural networks can achieve even higher accuracies.
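A minimal sketch of the general idea (the paper's exact construction may differ): reshape the $D$-dimensional input into a $\sqrt{D}\times\sqrt{D}$ matrix and project it from both sides, cutting the layer from $D^2$ to $2D$ parameters.

```python
# Bilinear projection sketch: reshape the D-dim input into a d x d matrix
# (d = sqrt(D)) and compute Y = W1 @ X @ W2^T, so the layer has 2*D parameters
# instead of the D^2 of a full D x D linear projection.
import math
import torch
import torch.nn as nn

class BilinearProjection(nn.Module):
    def __init__(self, D: int):
        super().__init__()
        d = math.isqrt(D)
        assert d * d == D, "this sketch assumes D is a perfect square"
        self.d = d
        self.W1 = nn.Parameter(torch.randn(d, d) / math.sqrt(d))
        self.W2 = nn.Parameter(torch.randn(d, d) / math.sqrt(d))

    def forward(self, x: torch.Tensor) -> torch.Tensor:     # x: (batch, D)
        X = x.view(-1, self.d, self.d)                       # (batch, d, d)
        Y = self.W1 @ X @ self.W2.transpose(0, 1)            # (batch, d, d)
        return Y.reshape(-1, self.d * self.d)                # (batch, D)

bilinear = BilinearProjection(D=1024)            # 2 * 1024 = 2048 parameters
full = nn.Linear(1024, 1024, bias=False)         # 1024^2 = 1,048,576 parameters
print(sum(p.numel() for p in bilinear.parameters()),
      sum(p.numel() for p in full.parameters()))
```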
arXiv Detail & Related papers (2020-11-03T00:17:24Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
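For orientation, the sketch below runs truncated max-product belief propagation on a small chain MRF with made-up scores; it illustrates the inference primitive a BP-Layer wraps, not the paper's differentiable, image-scale implementation.

```python
# Truncated max-product belief propagation on a chain MRF: run a fixed number
# of message-passing sweeps (the "truncation") instead of iterating to
# convergence. On a chain one sweep is already exact; truncation matters on
# loopy graphs such as image grids.
import numpy as np

rng = np.random.default_rng(0)
N, L, sweeps = 6, 4, 2                       # chain length, #labels, BP sweeps

unary = rng.normal(size=(N, L))              # log unary scores
pairwise = -0.5 * np.abs(np.arange(L)[:, None] - np.arange(L)[None, :])  # smoothness prior

fwd = np.zeros((N, L))                       # messages from node i-1 into node i
bwd = np.zeros((N, L))                       # messages from node i+1 into node i
for _ in range(sweeps):
    for i in range(1, N):                    # left-to-right sweep
        fwd[i] = np.max(unary[i - 1] + fwd[i - 1] + pairwise.T, axis=1)
    for i in range(N - 2, -1, -1):           # right-to-left sweep
        bwd[i] = np.max(unary[i + 1] + bwd[i + 1] + pairwise, axis=1)

beliefs = unary + fwd + bwd                  # max-marginal scores per node and label
labels = beliefs.argmax(axis=1)              # MAP labeling of the chain
print(labels)
```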
arXiv Detail & Related papers (2020-03-13T13:11:35Z) - Learning Interpretable Models Using Uncertainty Oracles [12.879371384378164]
A desirable property of interpretable models is small size, so that they are easily understandable by humans.
This leads to the following challenges: (a) small sizes imply diminished accuracy, and (b) bespoke levers provided by model families to restrict size might be insufficient to reach the desired size-accuracy trade-off.
arXiv Detail & Related papers (2019-06-17T05:53:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.