The Error of Deep Operator Networks Is the Sum of Its Parts: Branch-Trunk and Mode Error Decompositions
- URL: http://arxiv.org/abs/2602.21910v1
- Date: Wed, 25 Feb 2026 13:38:08 GMT
- Title: The Error of Deep Operator Networks Is the Sum of Its Parts: Branch-Trunk and Mode Error Decompositions
- Authors: Alexander Heinlein, Johannes Taraz
- Abstract summary: Operator learning has the potential to strongly impact scientific computing by learning solution operators for differential equations. Despite proven universal approximation properties, deep operator networks (DeepONets) often exhibit limited accuracy and generalization in practice. This work analyzes performance limitations of the classical DeepONet architecture.
- Score: 45.88028371034407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Operator learning has the potential to strongly impact scientific computing by learning solution operators for differential equations, potentially accelerating multi-query tasks such as design optimization and uncertainty quantification by orders of magnitude. Despite proven universal approximation properties, deep operator networks (DeepONets) often exhibit limited accuracy and generalization in practice, which hinders their adoption. Understanding these limitations is therefore crucial for further advancing the approach. This work analyzes performance limitations of the classical DeepONet architecture. It is shown that the approximation error is dominated by the branch network when the internal dimension is sufficiently large, and that the learned trunk basis can often be replaced by classical basis functions without a significant impact on performance. To investigate this further, a modified DeepONet is constructed in which the trunk network is replaced by the left singular vectors of the training solution matrix. This modification yields several key insights. First, a spectral bias in the branch network is observed, with coefficients of dominant, low-frequency modes learned more effectively. Second, due to singular-value scaling of the branch coefficients, the overall branch error is dominated by modes with intermediate singular values rather than the smallest ones. Third, using a shared branch network for all mode coefficients, as in the standard architecture, improves generalization of small modes compared to a stacked architecture in which coefficients are computed separately. Finally, strong and detrimental coupling between modes in parameter space is identified.
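As a rough illustration of the modified architecture described in the abstract, the following sketch (not the authors' code; all names, dimensions, and the random placeholder data are assumptions) replaces the trunk network by the leading left singular vectors of the training solution matrix and lets a single shared branch network predict all mode coefficients:

```python
import numpy as np
import torch
import torch.nn as nn

# Training solution matrix: each column is one training solution sampled on a fixed grid.
# Shapes and the random placeholder data are illustrative assumptions.
n_grid, n_samples, n_sensors, p = 128, 200, 32, 16
S = np.random.randn(n_grid, n_samples)

# Replace the trunk network by the first p left singular vectors of S.
U, sigma, _ = np.linalg.svd(S, full_matrices=False)
trunk_basis = torch.tensor(U[:, :p], dtype=torch.float32)   # (n_grid, p), kept fixed

# Shared branch network: maps the input function sampled at the sensor points
# to all p mode coefficients at once.
branch = nn.Sequential(
    nn.Linear(n_sensors, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, p),
)

def deeponet_svd_trunk(u_sensors: torch.Tensor) -> torch.Tensor:
    """Predicted solution on the grid: sum_k branch_k(u) * trunk_basis[:, k]."""
    coeffs = branch(u_sensors)        # (batch, p)
    return coeffs @ trunk_basis.T     # (batch, n_grid)

# Forward pass on a dummy batch of input functions.
print(deeponet_svd_trunk(torch.randn(8, n_sensors)).shape)  # torch.Size([8, 128])
```

In this shared-branch form all mode coefficients come from one network, which is the setting the abstract contrasts with a stacked architecture that computes each coefficient with a separate branch.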
Related papers
- Neural Scaling Laws of Deep ReLU and Deep Operator Network: A Theoretical Study [8.183509993010983]
We study the neural scaling laws for deep operator networks using the Chen and Chen style architecture.
We quantify the neural scaling laws by analyzing the approximation and generalization errors of the network.
Our results offer a partial explanation of the neural scaling laws in operator learning and provide a theoretical foundation for their applications.
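Such scaling laws are typically checked empirically by fitting a power law to measured errors over increasing data or model sizes; a minimal sketch with made-up numbers (not results from the paper):

```python
import numpy as np

# Hypothetical test errors measured at increasing training-set sizes N.
# A power-law scaling  error ~ C * N**(-alpha)  is a straight line in log-log space.
N = np.array([1e2, 3e2, 1e3, 3e3, 1e4])
err = np.array([0.31, 0.19, 0.11, 0.066, 0.040])

slope, intercept = np.polyfit(np.log(N), np.log(err), deg=1)
alpha, C = -slope, np.exp(intercept)
print(f"fitted scaling law: error ~ {C:.2f} * N^(-{alpha:.2f})")
```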
arXiv Detail & Related papers (2024-10-01T03:06:55Z)
- Minimizing Chebyshev Prototype Risk Magically Mitigates the Perils of Overfitting [1.6574413179773757]
We develop multicomponent loss functions that reduce intra-class feature correlation and maximize inter-class feature distance.
We implement the terms of the Chebyshev Prototype Risk (CPR) bound into our Explicit CPR loss function.
Our training algorithm reduces overfitting and improves upon previous approaches in many settings.
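The two ingredients named here, reduced intra-class feature correlation and increased inter-class feature distance, can be sketched as a generic multicomponent loss; this is not the Explicit CPR loss from the paper, and all weights and names are illustrative:

```python
import torch
import torch.nn.functional as F

def multicomponent_loss(features, logits, labels, num_classes, lam1=0.1, lam2=0.1):
    """Illustrative loss, not the paper's Explicit CPR objective: cross-entropy plus
    (i) a penalty on intra-class feature correlation and (ii) a term encouraging
    large inter-class prototype distances. Assumes every class occurs in the batch."""
    ce = F.cross_entropy(logits, labels)

    protos, intra = [], features.new_zeros(())
    for c in range(num_classes):
        fc = features[labels == c]
        protos.append(fc.mean(dim=0))
        if len(fc) > 1:
            centered = fc - fc.mean(dim=0)
            cov = centered.T @ centered / (len(fc) - 1)
            off_diag = cov - torch.diag(torch.diag(cov))
            intra = intra + off_diag.pow(2).mean()   # penalize correlated features

    # Negative mean pairwise distance between class prototypes:
    # minimizing it pushes the class means apart.
    inter = -torch.pdist(torch.stack(protos)).mean()
    return ce + lam1 * intra + lam2 * inter
```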
arXiv Detail & Related papers (2024-04-10T15:16:04Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the underlying tessellation and approximate the multiple hypotheses target distribution.
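A generic way to realize multiple-hypotheses regression with radial basis functions is an RBF feature layer feeding several output heads trained with a winner-takes-all loss; the sketch below is such a baseline, not the structured model proposed in the paper:

```python
import torch
import torch.nn as nn

class MultiHypothesisRBF(nn.Module):
    """Generic sketch: a Gaussian RBF feature layer feeding several linear heads,
    each producing one hypothesis. Not the structured model from the paper."""
    def __init__(self, in_dim=1, n_centers=20, n_hypotheses=4):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_centers, in_dim))
        self.log_gamma = nn.Parameter(torch.zeros(n_centers))
        self.heads = nn.Linear(n_centers, n_hypotheses)

    def forward(self, x):                                  # x: (batch, in_dim)
        d2 = torch.cdist(x, self.centers).pow(2)           # squared distances to centers
        phi = torch.exp(-self.log_gamma.exp() * d2)        # Gaussian RBF features
        return self.heads(phi)                             # (batch, n_hypotheses)

def winner_takes_all_loss(preds, y):
    """Only the currently best hypothesis per sample receives gradient, a common
    training rule for multiple-hypotheses prediction."""
    errors = (preds - y).pow(2)                            # y: (batch, 1), broadcasts
    return errors.min(dim=1).values.mean()
```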
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC).
DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
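A minimal sketch of the general idea, using the classic negative-correlation-learning penalty rather than the exact DNCC objective (all names and weights are assumptions):

```python
import torch
import torch.nn.functional as F

def negative_correlation_loss(member_logits, labels, lam=0.5):
    """Sketch of a negative-correlation-learning style ensemble loss (not the exact
    DNCC objective): each member should be accurate while its deviation from the
    ensemble mean is anti-correlated with the other members' deviations."""
    probs = torch.stack([F.softmax(l, dim=-1) for l in member_logits])  # (M, batch, C)
    ens = probs.mean(dim=0)

    loss = probs.new_zeros(())
    for i, logits in enumerate(member_logits):
        ce = F.cross_entropy(logits, labels)
        dev_i = probs[i] - ens
        # Classic NCL identity: (f_i - f_ens) * sum_{j != i}(f_j - f_ens) = -(f_i - f_ens)^2,
        # so this penalty rewards members for deviating from the ensemble mean.
        penalty = -dev_i.pow(2).sum(dim=-1).mean()
        loss = loss + ce + lam * penalty
    return loss / len(member_logits)
```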
arXiv Detail & Related papers (2022-12-14T07:35:20Z)
- On the Versatile Uses of Partial Distance Correlation in Deep Learning [47.11577420740119]
This paper revisits a (less widely known) measure from statistics, called distance correlation (and its partial variant), designed to evaluate correlation between feature spaces of different dimensions.
We describe the steps necessary to carry out its deployment for large scale models.
This opens the door to a surprising array of applications, ranging from conditioning one deep model w.r.t. another and learning disentangled representations to optimizing diverse models that are directly more robust to adversarial attacks.
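Plain (non-partial) distance correlation can be computed directly from doubly centered pairwise distance matrices; a small numpy sketch (the partial variant used in the paper additionally conditions on a third feature space):

```python
import numpy as np

def distance_correlation(X, Y):
    """Empirical distance correlation between paired samples.
    X: (n, p), Y: (n, q) -- the two feature spaces may have different dimensions."""
    def centered_dist(Z):
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)  # pairwise distances
        return D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()

    A, B = centered_dist(X), centered_dist(Y)
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    return np.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

# Compare feature spaces of different dimensionality (illustrative random data):
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))
Y = X[:, :16] + 0.1 * rng.normal(size=(100, 16))   # correlated, lower-dimensional
print(distance_correlation(X, Y))
```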
arXiv Detail & Related papers (2022-07-20T06:36:11Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces [41.55700086945413]
This paper studies the nonparametric estimation of Lipschitz operators using deep neural networks.
Under the assumption that the target operator exhibits a low dimensional structure, our error bounds decay as the training sample size increases.
Our results give rise to fast rates by exploiting low dimensional structures of data in operator estimation.
arXiv Detail & Related papers (2022-01-01T16:33:44Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, within training time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Compressive Sensing and Neural Networks from a Statistical Learning Perspective [4.561032960211816]
We present a generalization error analysis for a class of neural networks suitable for sparse reconstruction from few linear measurements.
Under realistic conditions, the generalization error scales only logarithmically in the number of layers, and at most linearly in the number of measurements.
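Networks of this kind are often built by unrolling iterative sparse solvers; the sketch below is a generic LISTA-style unrolled ISTA model, given only as an example of the model class, not the specific networks analyzed in the paper:

```python
import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    """Generic unrolled-ISTA (LISTA-style) network for sparse reconstruction from
    few linear measurements y = A x: each layer mirrors one ISTA iteration but
    with learnable weights and soft-threshold levels."""
    def __init__(self, m, n, n_layers=8):
        super().__init__()
        self.We = nn.ModuleList(nn.Linear(m, n, bias=False) for _ in range(n_layers))
        self.Ws = nn.ModuleList(nn.Linear(n, n, bias=False) for _ in range(n_layers))
        self.theta = nn.Parameter(torch.full((n_layers,), 0.1))  # soft-threshold levels

    def forward(self, y):                                   # y: (batch, m) measurements
        x = torch.zeros(y.shape[0], self.Ws[0].in_features, device=y.device)
        for We, Ws, t in zip(self.We, self.Ws, self.theta):
            z = We(y) + Ws(x)
            x = torch.sign(z) * torch.relu(z.abs() - t)     # soft thresholding
        return x                                            # (batch, n) sparse estimate
```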
arXiv Detail & Related papers (2020-10-29T15:05:43Z)