Scaled Vecchia approximation for fast computer-model emulation
- URL: http://arxiv.org/abs/2005.00386v4
- Date: Tue, 20 Jul 2021 15:43:56 GMT
- Title: Scaled Vecchia approximation for fast computer-model emulation
- Authors: Matthias Katzfuss, Joseph Guinness, Earl Lawrence
- Abstract summary: We adapt and extend a powerful class of GP methods from spatial statistics to enable the scalable analysis and emulation of large computer experiments.
Our methods are highly scalable, enabling estimation, joint prediction and simulation in near-linear time in the number of model runs.
- Score: 0.6445605125467573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many scientific phenomena are studied using computer experiments consisting
of multiple runs of a computer model while varying the input settings. Gaussian
processes (GPs) are a popular tool for the analysis of computer experiments,
enabling interpolation between input settings, but direct GP inference is
computationally infeasible for large datasets. We adapt and extend a powerful
class of GP methods from spatial statistics to enable the scalable analysis and
emulation of large computer experiments. Specifically, we apply Vecchia's
ordered conditional approximation in a transformed input space, with each input
scaled according to how strongly it relates to the computer-model response. The
scaling is learned from the data, by estimating parameters in the GP covariance
function using Fisher scoring. Our methods are highly scalable, enabling
estimation, joint prediction and simulation in near-linear time in the number
of model runs. In several numerical examples, our approach substantially
outperformed existing methods.
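To make the mechanism concrete, here is a minimal, hypothetical sketch of the scaled Vecchia log-likelihood. It simplifies heavily: a unit-variance squared-exponential kernel, the given data ordering, and per-point brute-force neighbor search stand in for the paper's Matern covariances, maximin ordering, and near-linear-time neighbor computation; all names and defaults are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def sq_exp_kernel(A, B):
    # Unit-variance squared-exponential kernel on already-scaled inputs.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)

def scaled_vecchia_loglik(X, y, ranges, m=10, nugget=1e-6):
    """Vecchia approximation: each y[i] conditions only on its m nearest
    previously ordered neighbors, with nearness measured in the scaled
    space X / ranges, so influential inputs (small range) dominate the
    neighbor selection."""
    Xs = X / ranges
    ll = 0.0
    for i in range(len(y)):
        k = min(m, i)
        if k > 0:
            # Brute-force search over earlier points; the real method does
            # one maximin ordering plus ordered neighbor search, computed
            # once in near-linear time rather than per point as here.
            _, nn = cKDTree(Xs[:i]).query(Xs[i], k=k)
            nn = np.atleast_1d(nn)
            Kcc = sq_exp_kernel(Xs[nn], Xs[nn]) + nugget * np.eye(k)
            kic = sq_exp_kernel(Xs[i:i + 1], Xs[nn]).ravel()
            w = np.linalg.solve(Kcc, kic)
            mu, var = w @ y[nn], 1.0 + nugget - w @ kic
        else:
            mu, var = 0.0, 1.0 + nugget
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll

# Example: evaluate the objective at trial range parameters; a small range
# for input 0 means it matters most for the response.
X = np.random.default_rng(0).uniform(size=(500, 4))
y = np.sin(3 * X[:, 0]) + 0.1 * X[:, 1]
print(scaled_vecchia_loglik(X, y, ranges=np.array([0.3, 1.0, 10.0, 10.0])))
```

Maximizing this objective over `ranges` (the paper uses Fisher scoring for this step) learns the relevance-based scaling: inputs that strongly drive the response receive small ranges and dominate neighbor selection, while weakly relevant inputs are effectively ignored.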
Related papers
- Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference [55.150117654242706]
We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU.
As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
arXiv Detail & Related papers (2024-11-01T21:11:48Z)
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made via plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Fast emulation of density functional theory simulations using approximate Gaussian processes [0.6445605125467573]
A second statistical model that predicts the simulation output can be used in lieu of the full simulation during model fitting.
We use the emulators to calibrate, in a Bayesian manner, the density functional theory (DFT) model parameters using observed data.
The utility of these DFT models is to make predictions, based on observed data, about the properties of experimentally unobserved nuclides.
arXiv Detail & Related papers (2022-08-24T05:09:36Z)
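As a rough illustration of the surrogate-based calibration workflow in the entry above, the sketch below runs a Metropolis sampler whose likelihood calls a cheap emulator instead of the full simulation. Everything here (the `emulate` stand-in, the Gaussian likelihood, the flat prior, the step size) is an assumption for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def emulate(theta):
    # Stand-in for a trained GP emulator of the expensive DFT simulation;
    # in practice this would be the fitted statistical model, not a formula.
    return np.sin(theta[0]) + theta[1] ** 2

def log_post(theta, y_obs, sigma=0.1):
    # Gaussian likelihood around the emulator prediction; flat prior.
    return -0.5 * ((y_obs - emulate(theta)) / sigma) ** 2

def metropolis(y_obs, n_iter=5000, step=0.2):
    theta = np.zeros(2)
    lp = log_post(theta, y_obs)
    samples = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(2)
        lp_prop = log_post(prop, y_obs)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            theta, lp = prop, lp_prop
        samples.append(theta.copy())
    return np.array(samples)

posterior_draws = metropolis(y_obs=0.7)
```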
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated seamlessly with neural networks.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
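The following sketch illustrates the randomized-DP idea from the entry above on an HMM forward pass: the exact sum over S latent states is replaced by a k-sample importance estimate, with states drawn proportional to the current filtering weights so that the weights cancel. The estimator and model here are simplified assumptions, not the paper's algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_forward_loglik(T, E, obs, k=64):
    """T: (S, S) transition matrix, E: (S, V) emission matrix, obs: ints."""
    S = T.shape[0]
    alpha = np.full(S, 1.0 / S) * E[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t in range(1, len(obs)):
        # Monte Carlo estimate of alpha @ T: states are drawn with
        # probability alpha[i], so the importance weights reduce to 1/k.
        idx = rng.choice(S, size=k, p=alpha)
        alpha = T[idx].mean(axis=0) * E[:, obs[t]]
        c = alpha.sum()
        loglik += np.log(c)
        alpha /= c
    return loglik

# Example: random 500-state HMM, 100 observations.
S, V = 500, 20
T = rng.dirichlet(np.ones(S), size=S)
E = rng.dirichlet(np.ones(V), size=S)
obs = rng.integers(V, size=100)
print(randomized_forward_loglik(T, E, obs, k=64))
```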
- MuyGPs: Scalable Gaussian Process Hyperparameter Estimation Using Local Cross-Validation [1.2233362977312945]
We present MuyGPs, a novel efficient GP hyperparameter estimation method.
MuyGPs builds upon prior methods that take advantage of the nearest neighbors structure of the data.
We show that our method outperforms all known competitors both in terms of time-to-solution and the root mean squared error of the predictions.
arXiv Detail & Related papers (2021-04-29T18:10:21Z)
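In the spirit of the MuyGPs entry above, here is a hedged sketch of nearest-neighbor GP prediction with the length-scale chosen by leave-one-out cross-validation over local neighborhoods. The kernel, grid search, and all names are illustrative assumptions and do not reflect the MuyGPs library's actual API.

```python
import numpy as np
from scipy.spatial import cKDTree

def kernel(A, B, ell):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def nn_gp_predict(x, X_nn, y_nn, ell, nugget=1e-6):
    # Kriging mean computed from the k nearest neighbors only.
    K = kernel(X_nn, X_nn, ell) + nugget * np.eye(len(y_nn))
    kx = kernel(x[None, :], X_nn, ell).ravel()
    return kx @ np.linalg.solve(K, y_nn)

def loo_cv_error(ell, X, y, tree, k=20):
    # Mean squared leave-one-out error: predict each training point from
    # its k nearest neighbors (dropping the point itself).
    err = 0.0
    for i in range(len(y)):
        _, nn = tree.query(X[i], k=k + 1)
        nn = nn[1:]
        err += (y[i] - nn_gp_predict(X[i], X[nn], y[nn], ell)) ** 2
    return err / len(y)

# Usage: grid-search the length-scale that minimizes the local CV loss.
X = np.random.default_rng(1).uniform(size=(200, 2))
y = np.sin(4 * X[:, 0]) + X[:, 1]
tree = cKDTree(X)
ells = np.linspace(0.05, 1.0, 20)
best_ell = ells[np.argmin([loo_cv_error(e, X, y, tree) for e in ells])]
```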
- Active Learning for Deep Gaussian Process Surrogates [0.3222802562733786]
Deep Gaussian processes (DGPs) are increasingly popular as predictive models in machine learning (ML).
Here we explore DGPs as surrogates for computer simulation experiments whose response surfaces exhibit similar characteristics.
We build up the design sequentially, both limiting expensive evaluations of the simulator code and mitigating the cubic costs of DGP inference.
arXiv Detail & Related papers (2020-12-15T00:09:37Z)
- Sparse within Sparse Gaussian Processes using Neighbor Information [23.48831040972227]
We introduce a novel hierarchical prior, which imposes sparsity on the set of inducing variables.
We experimentally show considerable computational gains compared to standard sparse GPs.
arXiv Detail & Related papers (2020-11-10T11:07:53Z)
- Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays [62.997667081978825]
Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses.
Current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets.
We propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood.
arXiv Detail & Related papers (2020-10-06T04:28:19Z)
- Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression.
Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice.
A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z)
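To illustrate the dividing-local-GP idea from the entry above: the sketch below recursively partitions the input space (a simple median split, which is an assumption here; the paper's iterative, data-driven division rule differs) and fits an exact GP only on each cell's points, so a prediction touches a single small local model.

```python
import numpy as np

def kernel(A, B, ell=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

class LocalGP:
    """Recursively divide the input space; fit an exact GP in each cell."""
    def __init__(self, X, y, max_pts=50, nugget=1e-6):
        if len(y) > max_pts:
            d = np.argmax(X.max(0) - X.min(0))   # widest input dimension
            cut = np.median(X[:, d])
            left = X[:, d] <= cut
            if left.all() or not left.any():     # degenerate split fallback
                left = np.arange(len(y)) < len(y) // 2
            self.split, self.leaf = (d, cut), None
            self.children = (LocalGP(X[left], y[left], max_pts, nugget),
                             LocalGP(X[~left], y[~left], max_pts, nugget))
        else:
            # Leaf: precompute the exact GP weights for this cell's points.
            self.split = None
            K = kernel(X, X) + nugget * np.eye(len(y))
            self.leaf = (X, np.linalg.solve(K, y))

    def predict(self, x):
        if self.split is not None:
            d, cut = self.split
            return self.children[0 if x[d] <= cut else 1].predict(x)
        X_loc, alpha = self.leaf
        return kernel(x[None, :], X_loc).ravel() @ alpha

# Usage: training cost stays local to each cell, and prediction descends to
# one cell, which is what yields the sublinear practical scaling.
X = np.random.default_rng(2).uniform(size=(2000, 2))
y = np.sin(6 * X[:, 0]) * X[:, 1]
model = LocalGP(X, y)
print(model.predict(np.array([0.3, 0.7])))
```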
This list is automatically generated from the titles and abstracts of the papers on this site.