Bézier Gaussian Processes for Tall and Wide Data
- URL: http://arxiv.org/abs/2209.00343v1
- Date: Thu, 1 Sep 2022 10:22:14 GMT
- Title: Bézier Gaussian Processes for Tall and Wide Data
- Authors: Martin Jørgensen and Michael A. Osborne
- Abstract summary: We introduce a kernel that allows the number of summarising variables to grow exponentially with the number of input features.
We show that our kernel has close similarities to some of the most used kernels in Gaussian process regression.
- Score: 24.00638575411818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern approximations to Gaussian processes are suitable for "tall data",
with a cost that scales well in the number of observations, but under-perform
on "wide data", scaling poorly in the number of input features. That is, as
the number of input features grows, good predictive performance requires the
number of summarising variables, and their associated cost, to grow rapidly. We
introduce a kernel that allows the number of summarising variables to grow
exponentially with the number of input features, but requires only linear cost
in both number of observations and input features. This scaling is achieved
through our introduction of the Bézier buttress, which allows approximate
inference without computing matrix inverses or determinants. We show that our
kernel has close similarities to some of the most used kernels in Gaussian
process regression, and empirically demonstrate the kernel's ability to scale
to both tall and wide datasets.
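The Bézier buttress construction itself is not reproduced here, but the scaling claim can be illustrated with a deliberately simplified sketch: a tensor-product Bernstein (Bézier) expansion whose coefficient tensor is assumed to be rank-one, so the number of implicit coefficients is (n+1)^D, exponential in the number of input features D, while the evaluation cost stays linear in D. All names below are hypothetical and only illustrate this separable-evaluation idea, not the paper's actual parameterisation.

```python
import numpy as np
from math import comb

def bernstein_basis(x, n):
    """Degree-n Bernstein basis evaluated at scalar x in [0, 1]; shape (n+1,)."""
    i = np.arange(n + 1)
    coeffs = np.array([comb(n, k) for k in i], dtype=float)
    return coeffs * x**i * (1.0 - x)**(n - i)

def bezier_rank1_eval(x, V):
    """Evaluate f(x) = sum_{i_1..i_D} (prod_d V[d, i_d]) * prod_d B_{i_d}(x_d).
    With the rank-one coefficient tensor assumed here, the sum over all
    (n+1)**D implicit coefficients factorises across dimensions, so the cost
    is O(D * n) rather than O((n+1)**D).
    x : (D,) input in the unit hypercube
    V : (D, n+1) per-dimension coefficient vectors (hypothetical stand-in for
        the paper's Bezier-buttress parameterisation)
    """
    D, n_plus_1 = V.shape
    n = n_plus_1 - 1
    val = 1.0
    for d in range(D):
        val *= bernstein_basis(x[d], n) @ V[d]
    return val

# Toy usage: 20 input features, degree-3 Bernstein basis per feature.
rng = np.random.default_rng(0)
D, n = 20, 3
V = rng.normal(size=(D, n + 1))
x = rng.uniform(size=D)
print(bezier_rank1_eval(x, V))  # 4**20 (~1.1e12) implicit coefficients, linear-cost evaluation
```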
Related papers
- Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference [55.150117654242706]
We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU.
As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
arXiv Detail & Related papers (2024-11-01T21:11:48Z) - Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves an $n^{-2/3}$ dimension-free $L^2$ convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion; a generic kernel ridge regression sketch appears after this list.
We demonstrate empirical performance better than state-of-the-art algorithms, particularly for small datasets.
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - Random Fourier Signature Features [9.85256783464329]
Tensor algebras give rise to one of the most powerful measures of similarity for sequences of arbitrary length, called the signature kernel.
Previous algorithms to compute the signature kernel scale quadratically in both the length and the number of sequences.
We develop a random Fourier feature-based acceleration of the signature kernel acting on the inherently non-Euclidean domain of sequences.
arXiv Detail & Related papers (2023-11-20T22:08:17Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - Local Random Feature Approximations of the Gaussian Kernel [14.230653042112834]
We focus on the popular Gaussian kernel and on techniques to linearize kernel-based models by means of random feature approximations.
We show that such approaches yield poor results when modelling high-frequency data, and we propose a novel localization scheme that significantly improves kernel approximations and downstream performance; a minimal sketch of the standard random Fourier feature baseline appears after this list.
arXiv Detail & Related papers (2022-04-12T09:52:36Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Gauss-Legendre Features for Gaussian Process Regression [7.37712470421917]
We present a Gauss-Legendre quadrature based approach for scaling up Gaussian process regression via a low rank approximation of the kernel matrix.
Our method is very much inspired by the well-known random Fourier features approach, which also builds low-rank approximations via numerical integration.
arXiv Detail & Related papers (2021-01-04T18:09:25Z) - Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z) - Scaling up Kernel Ridge Regression via Locality Sensitive Hashing [6.704115928005158]
We introduce a weighted version of random binning features and show that the corresponding kernel function generates smooth Gaussian processes.
We show that our weighted random binning features provide a spectral approximation to the corresponding kernel matrix, leading to efficient algorithms for kernel ridge regression.
arXiv Detail & Related papers (2020-03-21T21:41:16Z) - Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter; a minimal Nyström approximation sketch appears after this list.
arXiv Detail & Related papers (2020-02-21T00:43:06Z) - Randomly Projected Additive Gaussian Processes for Regression [37.367935314532154]
We use additive sums of kernels for GP regression, where each kernel operates on a different random projection of its inputs.
We prove this convergence and its rate, and propose a deterministic approach that converges more quickly than purely random projections; a minimal sketch of the projected additive kernel appears after this list.
arXiv Detail & Related papers (2019-12-30T07:26:18Z)
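As referenced in the Highly Adaptive Ridge entry above, that method is described as kernel ridge regression with a specific data-adaptive spline kernel. The sketch below shows only generic kernel ridge regression with an ordinary RBF kernel standing in as a placeholder; HAR's actual data-adaptive kernel is not reproduced.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Ordinary RBF kernel; a placeholder for HAR's data-adaptive spline kernel."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def kernel_ridge_fit_predict(X_train, y_train, X_test, reg=1e-2, kernel=rbf_kernel):
    """Generic kernel ridge regression: alpha = (K + reg*I)^(-1) y,
    prediction f(x*) = k(x*, X_train) @ alpha."""
    K = kernel(X_train, X_train)
    alpha = np.linalg.solve(K + reg * np.eye(len(X_train)), y_train)
    return kernel(X_test, X_train) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
X_test = np.linspace(-3.0, 3.0, 5).reshape(-1, 1)
print(kernel_ridge_fit_predict(X, y, X_test))  # predictions at 5 test inputs
```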
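As referenced in the Local Random Feature Approximations entry above (and as the baseline the Gauss-Legendre features paper says it is inspired by), here is a minimal sketch of classical random Fourier features for the Gaussian kernel. It shows only the standard Rahimi-Recht construction, not the localization scheme proposed in that paper.

```python
import numpy as np

def rff_features(X, num_features, lengthscale, rng):
    """Classical random Fourier features for the Gaussian (RBF) kernel
    k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)), so that
    Z(x) @ Z(y) approximates k(x, y) in expectation."""
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = rff_features(X, num_features=2000, lengthscale=1.0, rng=rng)
approx = Z @ Z.T                                       # low-rank kernel approximation
exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
print(np.abs(approx - exact).max())                    # small Monte Carlo error
```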
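As referenced in the Column Subset Selection / Nyström entry above, here is a minimal sketch of the standard Nyström approximation built from a uniformly sampled column subset; the improved guarantees in that paper concern better subset-selection strategies, which are not reproduced here.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def nystrom_approx(X, landmark_idx, lengthscale=1.0, jitter=1e-10):
    """Standard Nystrom approximation K ~= K_nm @ pinv(K_mm) @ K_nm.T,
    built from a subset of columns (landmark points)."""
    Xm = X[landmark_idx]
    K_nm = rbf_kernel(X, Xm, lengthscale)
    K_mm = rbf_kernel(Xm, Xm, lengthscale) + jitter * np.eye(len(landmark_idx))
    return K_nm @ np.linalg.pinv(K_mm) @ K_nm.T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
idx = rng.choice(200, size=30, replace=False)          # uniform column subset selection
K_hat = nystrom_approx(X, idx)
K = rbf_kernel(X, X)
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))   # relative approximation error
```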
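As referenced in the Randomly Projected Additive Gaussian Processes entry above, here is a minimal sketch of a kernel formed as an additive sum of base kernels, each acting on a different random projection of the inputs. The projection scale and the RBF base kernel are arbitrary choices for illustration, not the paper's.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def projected_additive_kernel(A, B, projections):
    """Additive sum of base kernels on random projections of the inputs:
    k(a, b) = (1/J) * sum_j k_rbf(P_j a, P_j b)."""
    return sum(rbf_kernel(A @ P, B @ P) for P in projections) / len(projections)

rng = np.random.default_rng(0)
d, k, J = 50, 5, 10                                    # input dim, projection dim, number of projections
projections = [rng.normal(size=(d, k)) / np.sqrt(d) for _ in range(J)]
X = rng.normal(size=(8, d))
K = projected_additive_kernel(X, X, projections)
print(K.shape)                                         # (8, 8) positive semi-definite kernel matrix
```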