On the influence of over-parameterization in manifold based surrogates
and deep neural operators
- URL: http://arxiv.org/abs/2203.05071v1
- Date: Wed, 9 Mar 2022 22:27:46 GMT
- Title: On the influence of over-parameterization in manifold based surrogates
and deep neural operators
- Authors: Katiana Kontolati, Somdatta Goswami, Michael D. Shields, George Em
Karniadakis
- Abstract summary: We compare two approaches for constructing accurate and generalizable approximators for complex physico-chemical processes: manifold-based polynomial chaos expansion (m-PCE) and the deep neural operator (DeepONet).
We first propose an extension of the m-PCE by constructing a mapping between latent spaces formed by two separate embeddings of input functions and output QoIs.
We demonstrate that the performance of m-PCE and DeepONet is comparable for cases of relatively smooth input-output mappings.
When highly non-smooth dynamics is considered, DeepONet shows higher accuracy.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Constructing accurate and generalizable approximators for complex
physico-chemical processes exhibiting highly non-smooth dynamics is
challenging. In this work, we propose new developments and perform comparisons
for two promising approaches: manifold-based polynomial chaos expansion (m-PCE)
and the deep neural operator (DeepONet), and we examine the effect of
over-parameterization on generalization. We demonstrate the performance of
these methods in terms of generalization accuracy by solving the 2D
time-dependent Brusselator reaction-diffusion system with uncertainty sources,
modeling an autocatalytic chemical reaction between two species. We first
propose an extension of the m-PCE by constructing a mapping between latent
spaces formed by two separate embeddings of input functions and output QoIs. To
enhance the accuracy of the DeepONet, we introduce weight self-adaptivity in
the loss function. We demonstrate that the performance of m-PCE and DeepONet is
comparable for cases of relatively smooth input-output mappings. However, when
highly non-smooth dynamics is considered, DeepONet shows higher accuracy. We
also find that for m-PCE, modest over-parameterization leads to better
generalization, both within and outside of distribution, whereas aggressive
over-parameterization leads to over-fitting. In contrast, an even highly
over-parameterized DeepONet leads to better generalization for both smooth and
non-smooth dynamics. Furthermore, we compare the performance of the above
models with another operator learning model, the Fourier Neural Operator, and
show that its over-parameterization also leads to better generalization. Our
studies show that m-PCE can provide very good accuracy at very low training
cost, whereas a highly over-parameterized DeepONet can provide better accuracy
and robustness to noise but at higher training cost. In both methods, the
inference cost is negligible.
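Two of the methodological ingredients mentioned above can be made concrete with short sketches. The first is a minimal Python example, under assumptions not taken from the paper (synthetic data, scikit-learn PCA for the two embeddings, and a polynomial ridge regression as a simplified stand-in for the polynomial chaos surrogate), of the latent-to-latent mapping idea behind the extended m-PCE: input functions and output QoIs are embedded separately, and a polynomial map is fit between the two latent spaces.
```python
# Hypothetical sketch of a latent-to-latent surrogate in the spirit of the extended
# m-PCE; not the authors' code. PCA provides the two embeddings and a polynomial
# ridge regression stands in for the polynomial chaos expansion between latent spaces.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic placeholders: n realizations of a discretized input function (n_x points)
# and the corresponding output QoI fields (n_y points) from some forward model.
n, n_x, n_y = 500, 128, 256
U = rng.standard_normal((n, n_x))                      # input realizations
Y = np.tanh(U @ rng.standard_normal((n_x, n_y)))       # stand-in "solver" outputs

# Two separate low-dimensional embeddings: one for inputs, one for output QoIs.
pca_in = PCA(n_components=10).fit(U)
pca_out = PCA(n_components=10).fit(Y)
Z_in, Z_out = pca_in.transform(U), pca_out.transform(Y)

# Polynomial map between the two latent spaces; the degree controls how
# over-parameterized the surrogate is.
latent_map = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1e-3))
latent_map.fit(Z_in, Z_out)

# Prediction: encode a new input, map it through the latent surrogate,
# then decode back to the physical output space.
u_new = rng.standard_normal((1, n_x))
y_pred = pca_out.inverse_transform(latent_map.predict(pca_in.transform(u_new)))
print(y_pred.shape)  # (1, n_y)
```
The second sketch, again only an illustration under assumed details rather than the paper's DeepONet implementation, shows one common form of self-adaptive loss weighting: per-sample weights are trainable parameters updated by gradient ascent on the weighted loss while the network parameters are updated by gradient descent, so samples that remain poorly fit accumulate larger weights.
```python
# Hypothetical sketch of self-adaptive loss weights for a generic regression
# network in PyTorch; architecture, sizes, and learning rates are illustrative.
import torch

n, d_in, d_out = 256, 16, 4
x, y = torch.randn(n, d_in), torch.randn(n, d_out)     # placeholder data

net = torch.nn.Sequential(
    torch.nn.Linear(d_in, 64), torch.nn.Tanh(), torch.nn.Linear(64, d_out)
)
raw_w = torch.zeros(n, requires_grad=True)             # softplus keeps weights positive

opt_net = torch.optim.Adam(net.parameters(), lr=1e-3)        # descent on network params
opt_w = torch.optim.Adam([raw_w], lr=1e-2, maximize=True)    # ascent on the weights

for step in range(1000):
    residual = ((net(x) - y) ** 2).mean(dim=1)               # per-sample squared error
    loss = (torch.nn.functional.softplus(raw_w) * residual).mean()
    opt_net.zero_grad()
    opt_w.zero_grad()
    loss.backward()
    opt_net.step()    # minimize the weighted loss w.r.t. the network
    opt_w.step()      # maximize it w.r.t. the self-adaptive weights
```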
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as bi-level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth probability distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of three optimizers and four parameterizations.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
- Deep Latent Force Models: ODE-based Process Convolutions for Bayesian Deep Learning [0.0]
The deep latent force model (DLFM) is a deep Gaussian process with physics-informed kernels at each layer.
We present empirical evidence of the capability of the DLFM to capture the dynamics present in highly nonlinear real-world time series data.
We find that the DLFM is capable of achieving comparable performance to a range of non-physics-informed probabilistic models.
arXiv Detail & Related papers (2023-11-24T19:55:57Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error of over-parameterized models that achieve effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- Subsurface Characterization using Ensemble-based Approaches with Deep Generative Models [2.184775414778289]
Inverse modeling is limited for ill-posed, high-dimensional applications due to computational costs and poor prediction accuracy with sparse datasets.
We combine the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) and the Ensemble Smoother with Multiple Data Assimilation (ES-MDA).
WGAN-GP is trained to generate high-dimensional K fields from a low-dimensional latent space and ES-MDA updates the latent variables by assimilating available measurements.
arXiv Detail & Related papers (2023-10-02T01:27:10Z)
- Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts [40.24720443257405]
We provide a convergence analysis for maximum likelihood estimation (MLE) in the Gaussian-gated MoE model.
Our findings reveal that the MLE has distinct behaviors under two complementary settings of the location parameters of the Gaussian gating functions.
Notably, these behaviors can be characterized by the solvability of two different systems of equations.
arXiv Detail & Related papers (2023-05-12T16:02:19Z)
- Optimizing Training Trajectories in Variational Autoencoders via Latent Bayesian Optimization Approach [0.0]
Unsupervised and semi-supervised ML methods have become widely adopted across multiple areas of physics, chemistry, and materials sciences.
We propose a latent Bayesian optimization (zBO) approach for hyperparameter trajectory optimization in unsupervised and semi-supervised ML.
We demonstrate an application of this method for finding joint discrete and continuous rotationally invariant representations for MNIST and experimental data of a plasmonic nanoparticles material system.
arXiv Detail & Related papers (2022-06-30T23:41:47Z)
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but perform poorly on sub-partitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Training Generative Adversarial Networks (GANs) requires solving non-concave min-max optimization problems.
Recent theory has shown the importance of gradient descent-ascent (GDA) for reaching globally optimal solutions.
We show that in an over-parameterized GAN with a $1$-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-concave min-max problem.
arXiv Detail & Related papers (2021-04-12T16:23:37Z)
- Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of the stochasticity in its success is still unclear.
We show that multiplicative noise, which commonly arises due to variance in local rates of convergence, results in heavy-tailed behaviour in the parameters.
A detailed analysis is conducted describing the dependence on key factors, including step size and data, with similar results observed on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)