Related papers: Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference

URL: http://arxiv.org/abs/2411.01036v1
Date: Fri, 01 Nov 2024 21:11:48 GMT
Title: Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference
Authors: Jonathan Wenger, Kaiwen Wu, Philipp Hennig, Jacob R. Gardner, Geoff Pleiss, John P. Cunningham
Abstract summary: We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU. As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
Score: 55.150117654242706
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Model selection in Gaussian processes scales prohibitively with the size of the training dataset, both in time and memory. While many approximations exist, all incur inevitable approximation error. Recent work accounts for this error in the form of computational uncertainty, which enables -- at the cost of quadratic complexity -- an explicit tradeoff between computation and precision. Here we extend this development to model selection, which requires significant enhancements to the existing approach, including linear-time scaling in the size of the dataset. We propose a novel training loss for hyperparameter optimization and demonstrate empirically that the resulting method can outperform SGPR, CGGP and SVGP, state-of-the-art methods for GP model selection, on medium to large-scale datasets. Our experiments show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU. As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty -- a fundamental prerequisite for optimal decision-making.
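For context on why exact model selection scales poorly, here is a minimal sketch of the standard exact computation: the GP log marginal likelihood evaluated through a Cholesky factorization, which costs cubic time and quadratic memory in the number of data points. This is not the method proposed in the paper. The kernel choice, the function names, the toy data, and the grid of candidate lengthscales are all assumptions made for illustration.

```python
import math

def rbf_kernel(x1, x2, lengthscale=1.0, outputscale=1.0):
    """Squared-exponential kernel on scalar inputs (an illustrative choice)."""
    return outputscale * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)

def cholesky(A):
    """Return lower-triangular L with A = L L^T. O(n^3) time, O(n^2) memory."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for lower-triangular L by forward substitution."""
    z = []
    for i, row in enumerate(L):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

def log_marginal_likelihood(xs, ys, lengthscale, noise=0.1):
    """Exact GP log evidence; `noise` is the observation noise variance."""
    n = len(xs)
    # Dense n x n kernel matrix: this is the quadratic memory cost.
    K = [[rbf_kernel(xs[i], xs[j], lengthscale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)                       # the cubic-time step
    z = forward_solve(L, ys)              # y^T K^{-1} y = z^T z
    quad = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * logdet - 0.5 * n * math.log(2.0 * math.pi)

# Toy model selection: pick the lengthscale with the highest evidence.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [math.sin(x) for x in xs]
candidates = [0.1, 0.5, 1.0, 2.0]
best_lengthscale = max(
    candidates, key=lambda ell: log_marginal_likelihood(xs, ys, ell)
)
print("selected lengthscale:", best_lengthscale)
```

Every evaluation of the objective repeats the factorization, so a hyperparameter search multiplies the cubic cost by the number of candidates or optimizer steps. That is the bottleneck scalable approximations aim to remove.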

Related papers

Turbocharging Gaussian Process Inference with Approximate Sketch-and-Project [14.53857041867143]
We propose an approximate, distributed, accelerated sketch-and-project algorithm ($\texttt{ADASAP}$) for solving linear systems. We use the theory of determinantal point processes to show that the posterior mean induced by sketch-and-project rapidly converges to the true posterior mean. $\texttt{ADASAP}$ scales to a dataset with $> 3 \cdot 10^8$ samples, a feat which has not been accomplished in the literature.
arXiv Detail & Related papers (2025-05-19T20:46:26Z)
Optimizing ML Training with Metagradient Descent [69.89631748402377]
We introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients.
arXiv Detail & Related papers (2025-03-17T22:18:24Z)
An accuracy-runtime trade-off comparison of scalable Gaussian process approximations for spatial data [11.141688859736805]
A drawback of Gaussian processes is their computational cost, with $\mathcal{O}(N^3)$ time and $\mathcal{O}(N^2)$ memory complexity. Numerous approximation techniques have been proposed to address this limitation. We analyze this trade-off between accuracy and runtime on multiple simulated and large-scale real-world datasets.
arXiv Detail & Related papers (2025-01-20T12:35:58Z)
Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems. Such problems are encountered in medicine, physics, and machine learning. We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
Parallel and Limited Data Voice Conversion Using Stochastic Variational Deep Kernel Learning [2.5782420501870296]
This paper proposes a voice conversion method that works with limited data. It is based on stochastic variational deep kernel learning (SVDKL). It is possible to estimate non-smooth and more complex functions.
arXiv Detail & Related papers (2023-09-08T16:32:47Z)
Adaptive Sparse Gaussian Process [0.0]
We propose the first adaptive sparse Gaussian Process (GP) able to address all these issues. We first reformulate a variational sparse GP algorithm to make it adaptive through a forgetting factor. We then propose updating a single inducing point of the sparse GP model together with the remaining model parameters every time a new sample arrives.
arXiv Detail & Related papers (2023-02-20T21:34:36Z)
Fast emulation of density functional theory simulations using approximate Gaussian processes [0.6445605125467573]
A second statistical model that predicts the simulation output can be used in lieu of the full simulation during model fitting. We use the emulators to calibrate, in a Bayesian manner, the density functional theory (DFT) model parameters using observed data. The utility of these DFT models is to make predictions, based on observed data, about the properties of experimentally unobserved nuclides.
arXiv Detail & Related papers (2022-08-24T05:09:36Z)
Dual Optimization for Kolmogorov Model Learning Using Enhanced Gradient Descent [8.714458129632158]
The Kolmogorov model (KM) is an interpretable and predictable representation approach to learning the underlying probabilistic structure of a set of random variables. We propose a computationally scalable KM learning algorithm, based on regularized dual optimization combined with an enhanced gradient descent (GD) method. It is shown that the accuracy of logical relation mining for interpretability by using the proposed KM learning algorithm exceeds $80\%$.
arXiv Detail & Related papers (2021-07-11T10:33:02Z)
Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties. Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression. Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice. A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points. The approach also leads to a significantly smaller and computationally cheaper sub-solver for lower bounding. In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)
Gaussian Process Boosting [13.162429430481982]
We introduce a novel way to combine boosting with Gaussian process and mixed effects models. We obtain increased prediction accuracy compared to existing approaches on simulated and real-world data sets.
arXiv Detail & Related papers (2020-04-06T13:19:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.