Fast and Scalable Spike and Slab Variable Selection in High-Dimensional Gaussian Processes
- URL: http://arxiv.org/abs/2111.04558v1
- Date: Mon, 8 Nov 2021 15:13:24 GMT
- Title: Fast and Scalable Spike and Slab Variable Selection in High-Dimensional Gaussian Processes
- Authors: Hugh Dance and Brooks Paige
- Abstract summary: We develop a fast and scalable variational inference algorithm for the spike and slab GP that is tractable with arbitrary differentiable kernels.
In experiments our method consistently outperforms vanilla and sparse variational GPs whilst retaining similar runtimes.
- Score: 12.667478571732449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variable selection in Gaussian processes (GPs) is typically undertaken by
thresholding the inverse lengthscales of 'automatic relevance determination'
kernels, but in high-dimensional datasets this approach can be unreliable. A
more probabilistically principled alternative is to use spike and slab priors
and infer a posterior probability of variable inclusion. However, existing
GP implementations are extremely costly to run on high-dimensional and
large-$n$ datasets, or are intractable for most kernels. Consequently, we develop a
fast and scalable variational inference algorithm for the spike and slab GP
that is tractable with arbitrary differentiable kernels. We improve our
algorithm's ability to adapt to the sparsity of relevant variables by Bayesian
model averaging over hyperparameters, and achieve substantial speed-ups using
zero-temperature posterior restrictions, dropout pruning, and nearest-neighbour
minibatching. In experiments our method consistently outperforms vanilla and
sparse variational GPs whilst retaining similar runtimes (even when $n=10^6$)
and performs competitively with a spike and slab GP using MCMC but runs up to
$1000$ times faster.
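To make the mechanics concrete, below is a minimal sketch (plain NumPy, not the authors' implementation) of the spike-and-slab idea: a Bernoulli gate per input dimension switches that dimension on or off inside an ARD kernel, and a continuous (concrete) relaxation of the gates keeps the Monte-Carlo objective differentiable. All function names and the choice of relaxation are assumptions for illustration; the paper's actual algorithm optimizes a full ELBO, averages over hyperparameters, and avoids the O(n^3) Cholesky below via nearest-neighbour minibatching. In practice the objective would be optimized with an autodiff framework; NumPy is used only to keep the sketch self-contained.

import numpy as np

def ard_rbf(X1, X2, inv_ls, var=1.0):
    """ARD RBF kernel with one inverse lengthscale per input dimension."""
    diff = (X1[:, None, :] - X2[None, :, :]) * inv_ls
    return var * np.exp(-0.5 * (diff ** 2).sum(axis=-1))

def sample_relaxed_gates(logits, tau, rng):
    """Concrete (Gumbel-sigmoid) relaxation of Bernoulli inclusion gates.
    As tau -> 0 the samples become hard 0/1, mirroring the paper's
    zero-temperature posterior restriction."""
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=logits.shape)
    logistic = np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-(logits + logistic) / tau))

def expected_log_marginal(X, y, inv_ls, gate_logits, noise, tau, rng, S=8):
    """Monte-Carlo estimate of E_q[log p(y | X, gamma)], where the gate
    gamma_d in [0, 1] switches dimension d on or off in the kernel."""
    n = X.shape[0]
    total = 0.0
    for _ in range(S):
        gamma = sample_relaxed_gates(gate_logits, tau, rng)
        K = ard_rbf(X, X, gamma * inv_ls) + noise * np.eye(n)
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        total += (-0.5 * y @ alpha
                  - np.log(np.diag(L)).sum()
                  - 0.5 * n * np.log(2.0 * np.pi))
    return total / S

def inclusion_probs(gate_logits):
    """Posterior inclusion probabilities; dimensions whose probability
    collapses toward zero can be dropped (cf. dropout pruning)."""
    return 1.0 / (1.0 + np.exp(-gate_logits))

Driving tau to zero hardens the gates, and pruning dimensions whose inclusion probability collapses removes them from every subsequent kernel evaluation; these correspond in spirit to the zero-temperature restriction and dropout pruning named in the abstract.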
Related papers
- Sparse Kernel Gaussian Processes through Iterative Charted Refinement (ICR) [0.0]
We present a new, generative method named Iterative Charted Refinement (ICR) to model Gaussian Processes.
ICR represents long- and short-range correlations by combining views of the modeled locations at varying resolutions with a user-provided coordinate chart.
ICR outperforms existing methods in computational speed by one order of magnitude on both CPU and GPU.
arXiv Detail & Related papers (2022-06-21T18:00:01Z)
- Shallow and Deep Nonparametric Convolutions for Gaussian Processes [0.0]
We introduce a nonparametric process convolution formulation for GPs that alleviates the weaknesses of standard process convolutions by using a functional sampling approach.
We propose a composition of these nonparametric convolutions that serves as an alternative to classic deep GP models.
arXiv Detail & Related papers (2022-06-17T19:03:04Z)
- Exact Gaussian Processes for Massive Datasets via Non-Stationary Sparsity-Discovering Kernels [0.0]
We propose to take advantage of naturally-structured sparsity by allowing the kernel to discover -- instead of induce -- sparse structure.
The principle of ultra-flexible, compactly supported, and non-stationary kernels, combined with HPC and constrained optimization, lets us scale exact GPs well beyond 5 million data points (a toy sketch of the sparsity idea follows this entry).
arXiv Detail & Related papers (2022-05-18T16:56:53Z)
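To illustrate the idea above (a toy sketch under assumptions, not the paper's HPC implementation): with a compactly supported Wendland-type kernel, every pair of points farther apart than the support radius gets an exactly zero covariance, so the covariance matrix can be built and solved in sparse form without approximating the GP. The kernel form, support value, and data are choices made for this example.

import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import spsolve
from scipy.spatial import cKDTree

def sparse_wendland_cov(X, support, var=1.0):
    """Covariance from the compactly supported Wendland C2 kernel
    k(r) = var * (1 - r)^4 * (4r + 1) for r = dist/support <= 1, else 0."""
    tree = cKDTree(X)
    D = tree.sparse_distance_matrix(tree, max_distance=support).tocoo()
    r = D.data / support
    vals = var * (1.0 - r) ** 4 * (4.0 * r + 1.0)
    K = csr_matrix((vals, (D.row, D.col)), shape=(len(X), len(X))).tolil()
    K.setdiag(var)          # k(0) = var on the diagonal
    return K.tocsr()

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 3))
y = np.sin(3.0 * X[:, 0]) + 0.01 * rng.standard_normal(2000)
K = sparse_wendland_cov(X, support=0.15)
print("nonzero fraction:", K.nnz / 2000 ** 2)      # exact, discovered sparsity
# exact GP representer weights (K + sigma^2 I)^{-1} y via a sparse solver
alpha = spsolve((K + 1e-4 * identity(2000)).tocsc(), y)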
- Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times [119.41129787351092]
We show that sequential black-box optimization based on GPs can be made efficient by sticking to a candidate solution for multiple evaluation steps.
We modify two well-established GP-Opt algorithms, GP-UCB and GP-EI, to adapt rules from batched GP-Opt (a toy sketch follows below).
arXiv Detail & Related papers (2022-01-30T20:42:14Z)
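A toy sketch of the sticking rule described above (a minimal version under assumptions; it omits the paper's adaptive switching criterion and regret analysis): the GP-UCB candidate is re-evaluated for a fixed number of steps before the posterior is refit, so the costly GP updates happen once per batch of evaluations. The objective f and all constants are invented for the example.

import numpy as np

def rbf(A, B, ls=0.2):
    """Unit-variance RBF kernel."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_posterior(Xobs, yobs, Xcand, noise=1e-2):
    """Posterior mean and variance of the GP at the candidate points."""
    K = rbf(Xobs, Xobs) + noise * np.eye(len(Xobs))
    L = np.linalg.cholesky(K)
    A = np.linalg.solve(L, rbf(Xobs, Xcand))
    mu = A.T @ np.linalg.solve(L, yobs)
    var = np.maximum(1.0 - (A ** 2).sum(axis=0), 1e-12)
    return mu, var

def f(x, rng):
    """Hypothetical noisy black-box objective."""
    return np.sin(3.0 * x.sum()) + 0.1 * rng.standard_normal()

rng = np.random.default_rng(0)
Xcand = rng.uniform(size=(200, 2))
Xobs, yobs = [Xcand[0]], [f(Xcand[0], rng)]
beta, repeats = 2.0, 5
for t in range(10):                        # 10 GP refits cover 50 evaluations
    mu, var = gp_posterior(np.array(Xobs), np.array(yobs), Xcand)
    x = Xcand[np.argmax(mu + beta * np.sqrt(var))]   # UCB candidate
    for _ in range(repeats):               # stick: re-evaluate, no GP update
        Xobs.append(x)
        yobs.append(f(x, rng))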
- Non-Gaussian Gaussian Processes for Few-Shot Regression [71.33730039795921]
We propose an invertible ODE-based mapping that operates on each component of the random variable vectors and shares the parameters across all of them.
NGGPs outperform the competing state-of-the-art approaches on a diversified set of benchmarks and applications.
arXiv Detail & Related papers (2021-10-26T10:45:25Z)
- Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has a previously unexplored additional benefit: it can simultaneously reduce variance at essentially negligible cost (a sketch of preconditioned CG follows below).
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
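To show where preconditioning enters GP computations, here is a generic sketch of preconditioned conjugate gradients for the solve (K + sigma^2 I) alpha = y, using a Nystrom low-rank-plus-noise preconditioner applied through the Woodbury identity. This is an illustration only: the paper's preconditioner construction and its variance-reduction argument for stochastic estimates are not reproduced here, and the landmark count and kernel settings are arbitrary.

import numpy as np

def rbf(A, B, ls=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def nystrom_preconditioner(K, noise, k, rng):
    """M = U U^T + noise*I, with U U^T the rank-k Nystrom approximation of K.
    Returns a function applying M^{-1} via the Woodbury identity."""
    idx = rng.choice(K.shape[0], size=k, replace=False)
    L = np.linalg.cholesky(K[np.ix_(idx, idx)] + 1e-10 * np.eye(k))
    U = np.linalg.solve(L, K[idx, :]).T        # U U^T = K_nk K_kk^{-1} K_kn
    inner = np.linalg.inv(noise * np.eye(k) + U.T @ U)
    return lambda v: (v - U @ (inner @ (U.T @ v))) / noise

def pcg(Kmat, b, Minv, tol=1e-6, iters=1000):
    """Preconditioned conjugate gradients; returns solution and iterations."""
    x = np.zeros_like(b)
    r = b.copy()                               # residual for x = 0
    z = Minv(r)
    p = z.copy()
    rz = r @ z
    for i in range(iters):
        Kp = Kmat @ p
        a = rz / (p @ Kp)
        x += a * p
        r -= a * Kp
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, i + 1
        z = Minv(r)
        rz, rz_old = r @ z, rz
        p = z + (rz / rz_old) * p
    return x, iters

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))
noise = 1e-2
K = rbf(X, X) + noise * np.eye(500)
y = rng.standard_normal(500)
_, it_plain = pcg(K, y, Minv=lambda v: v)      # unpreconditioned baseline
_, it_pre = pcg(K, y, nystrom_preconditioner(rbf(X, X), noise, 50, rng))
print(it_plain, it_pre)   # preconditioning typically cuts the iteration count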
- Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z)
- MuyGPs: Scalable Gaussian Process Hyperparameter Estimation Using Local Cross-Validation [1.2233362977312945]
We present MuyGPs, a novel, efficient GP hyperparameter estimation method.
MuyGPs builds upon prior methods that take advantage of the nearest neighbors structure of the data.
We show that our method outperforms all known competitors in both time-to-solution and root-mean-squared prediction error (a simplified sketch follows below).
arXiv Detail & Related papers (2021-04-29T18:10:21Z)
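A simplified sketch of the nearest-neighbours-plus-cross-validation idea (an illustrative reduction, not the MuyGPs estimator itself): candidate hyperparameters are scored by leave-one-out error in which each training point is predicted from a local GP over its k nearest neighbours only, so nothing larger than a k x k system is ever solved. The kernel, loss, and search grid are assumptions.

import numpy as np
from scipy.spatial import cKDTree

def rbf(A, B, ls):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def loo_cv_loss(Xtr, ytr, ls, k=30, noise=1e-3):
    """Mean squared leave-one-out error, each point predicted from the
    local GP over its k nearest neighbours (k x k solves only)."""
    _, nbrs = cKDTree(Xtr).query(Xtr, k=k + 1)   # column 0 is the point itself
    err = 0.0
    for i, idx in enumerate(nbrs):
        idx = idx[1:]                            # drop self
        Knn = rbf(Xtr[idx], Xtr[idx], ls) + noise * np.eye(k)
        ks = rbf(Xtr[i:i + 1], Xtr[idx], ls)[0]
        err += (ytr[i] - ks @ np.linalg.solve(Knn, ytr[idx])) ** 2
    return err / len(Xtr)

rng = np.random.default_rng(0)
Xtr = rng.uniform(size=(1000, 2))
ytr = np.sin(4.0 * Xtr[:, 0]) + 0.05 * rng.standard_normal(1000)
# grid search over the lengthscale; each candidate costs only k x k solves
best_ls = min([0.05, 0.1, 0.2, 0.5], key=lambda ls: loo_cv_loss(Xtr, ytr, ls))
print("selected lengthscale:", best_ls)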
- Likelihood-Free Inference with Deep Gaussian Processes [70.74203794847344]
Surrogate models have been successfully used in likelihood-free inference to decrease the number of simulator evaluations.
We propose a Deep Gaussian Process (DGP) surrogate model that can handle more irregularly behaved target distributions.
Our experiments show how DGPs can outperform GPs on objective functions with multimodal distributions and maintain a comparable performance in unimodal cases.
arXiv Detail & Related papers (2020-06-18T14:24:05Z)
- Quadruply Stochastic Gaussian Processes [10.152838128195466]
We introduce a variational inference procedure for training scalable Gaussian process (GP) models whose per-iteration complexity is independent of both the number of training points, $n$, and the number of basis functions used in the kernel approximation, $m$.
We demonstrate accurate inference on large classification and regression datasets using GPs and relevance vector machines with up to $m = 10^7$ basis functions.
arXiv Detail & Related papers (2020-06-04T17:06:25Z)
- Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification [119.41129787351092]
We introduce BBKB, the first no-regret GP optimization algorithm that provably runs in near-linear time and selects candidates in batches.
We show that the same bound can be used to adaptively delay costly updates to the sparse GP approximation, achieving a near-constant per-step amortized cost.
arXiv Detail & Related papers (2020-02-23T17:43:29Z)