Log-Linear-Time Gaussian Processes Using Binary Tree Kernels
- URL: http://arxiv.org/abs/2210.01633v1
- Date: Tue, 4 Oct 2022 14:30:06 GMT
- Title: Log-Linear-Time Gaussian Processes Using Binary Tree Kernels
- Authors: Michael K. Cohen, Samuel Daulton, Michael A. Osborne
- Abstract summary: We present a new kernel that allows for Gaussian process regression in $O((n+m)\log(n+m))$ time.
Our "binary tree" kernel places all data points on the leaves of a binary tree, with the kernel depending only on the depth of the deepest common ancestor.
On large datasets, the binary tree GP is fastest, and much faster than a Mat\'ern GP.
- Score: 26.526204269075766
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gaussian processes (GPs) produce good probabilistic models of functions, but
most GP kernels require $O((n+m)n^2)$ time, where $n$ is the number of data
points and $m$ the number of predictive locations. We present a new kernel that
allows for Gaussian process regression in $O((n+m)\log(n+m))$ time. Our "binary
tree" kernel places all data points on the leaves of a binary tree, with the
kernel depending only on the depth of the deepest common ancestor. We can store
the resulting kernel matrix in $O(n)$ space in $O(n \log n)$ time, as a sum of
sparse rank-one matrices, and approximately invert the kernel matrix in $O(n)$
time. Sparse GP methods also offer linear run time, but they predict less well
than higher dimensional kernels. On a classic suite of regression tasks, we
compare our kernel against Mat\'ern, sparse, and sparse variational kernels.
The binary tree GP assigns the highest likelihood to the test data on a
plurality of datasets, usually achieves lower mean squared error than the
sparse methods, and often ties or beats the Mat\'ern GP. On large datasets, the
binary tree GP is fastest, and much faster than a Mat\'ern GP.
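Below is a minimal sketch of the mechanism the abstract describes, not the authors' implementation: it assumes each point has already been assigned a binary leaf code (the paper's placement procedure is not reproduced here) and uses hypothetical per-depth weights. The kernel depends only on the depth of the deepest common ancestor (the longest common prefix of two codes), and the resulting sum-of-sparse-rank-one structure lets $Kv$ be computed without ever materializing the $n \times n$ matrix.

```python
# Sketch only: a kernel depending solely on the depth of the deepest common
# ancestor of two leaves in a binary tree, plus a matrix-vector product that
# exploits K = sum over tree nodes u of w_depth(u) * 1_u 1_u^T (sparse rank-one
# terms). Leaf codes and per-depth weights are hypothetical, not from the paper.
from collections import defaultdict
import numpy as np

def dca_depth(code_a: str, code_b: str) -> int:
    """Depth of the deepest common ancestor = longest common prefix length."""
    d = 0
    for a, b in zip(code_a, code_b):
        if a != b:
            break
        d += 1
    return d

def kernel(code_a: str, code_b: str, weights) -> float:
    """k(x, x') = sum of per-depth weights down to the deepest common ancestor."""
    return float(sum(weights[: dca_depth(code_a, code_b)]))

def kernel_matvec(codes, weights, v):
    """Compute K @ v in O(n * depth) time without materializing K."""
    result = np.zeros_like(v, dtype=float)
    for d, w in enumerate(weights, start=1):
        # Points sharing a depth-d prefix sit under the same depth-d ancestor;
        # that node contributes w * 1_u (1_u^T v), a sparse rank-one update.
        groups = defaultdict(list)
        for i, code in enumerate(codes):
            groups[code[:d]].append(i)
        for idx in groups.values():
            result[idx] += w * v[idx].sum()
    return result

# Usage: three points on a depth-3 tree; the first two share a depth-2 ancestor.
codes = ["000", "001", "110"]
weights = [0.5, 0.3, 0.2]          # hypothetical per-depth weights
v = np.array([1.0, 2.0, 3.0])
K = np.array([[kernel(a, b, weights) for b in codes] for a in codes])
assert np.allclose(K @ v, kernel_matvec(codes, weights, v))
```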
Related papers
- Large-Scale Gaussian Processes via Alternating Projection [23.79090469387859]
We propose an iterative method which only accesses subblocks of the kernel matrix, effectively enabling mini-batching.
Our algorithm, based on alternating projection, has $\mathcal{O}(n)$ per-iteration time and space complexity, solving many of the practical challenges of scaling GPs to very large datasets.
arXiv Detail & Related papers (2023-10-26T04:20:36Z) - Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach [0.9453554184019107]
A kernel matrix may be defined as $K_{ij} = \kappa(x_i, y_j)$ where $\kappa(x, y)$ is a kernel function.
We seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large.
arXiv Detail & Related papers (2022-12-24T07:15:00Z) - Sub-quadratic Algorithms for Kernel Matrices via Kernel Density Estimation [24.166833799353476]
We develop efficient reductions from \textit{weighted edge sampling} on kernel graphs, \textit{simulating random walks} on kernel graphs, and \textit{weighted sampling} on matrices to Kernel Density Estimation.
Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest.
arXiv Detail & Related papers (2022-12-01T16:42:56Z) - On The Relative Error of Random Fourier Features for Preserving Kernel Distance [7.383448514243228]
We show that for a significant range of kernels, including the well-known Laplacian kernels, RFF cannot approximate the kernel distance with small relative error using low dimensions.
We make the first step towards data-oblivious dimension-reduction for general shift-invariant kernels, and we obtain a similar $\mathrm{poly}(\epsilon^{-1} \log n)$ dimension bound for Laplacian kernels.
arXiv Detail & Related papers (2022-10-01T10:35:12Z) - Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time [54.65688986250061]
We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in input-sparsity total time.
arXiv Detail & Related papers (2022-02-09T15:26:03Z) - Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z) - Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Fast Sketching of Polynomial Kernels of Polynomial Degree [61.83993156683605]
The polynomial kernel is especially important as other kernels can often be approximated by it via a Taylor series expansion.
Recent oblivious sketching techniques reduce the dependence of the running time on the degree $q$ of the kernel.
We give a new sketch which greatly improves upon this running time, by removing the dependence on $q$ in the leading order term.
arXiv Detail & Related papers (2021-08-21T02:14:55Z) - Faster Kernel Matrix Algebra via Density Estimation [46.253698241653254]
We study fast algorithms for computing fundamental properties of a positive semidefinite kernel matrix $K \in \mathbb{R}^{n \times n}$ corresponding to $n$ points.
arXiv Detail & Related papers (2021-02-16T18:25:47Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log\langle\varphi(x), \varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$ (a brief sketch of the resulting linear-time kernel application follows this list).
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
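For the positive-features cost in the last entry, here is a minimal sketch of why it yields linear-time kernel applications. It assumes unit entropic regularization ($\varepsilon = 1$), so the Gibbs kernel $K_{ij} = \exp(-c(x_i, y_j))$ factors exactly as $\Phi_x \Phi_y^\top$ with $r$ columns; the nonnegative feature map below is a made-up example, not the construction from that paper.

```python
# Sketch only: with c(x, y) = -log<phi(x), phi(y)> and epsilon = 1, the Gibbs
# kernel K_ij = exp(-c(x_i, y_j)) equals <phi(x_i), phi(y_j)>, so K = Phi_x @ Phi_y.T
# has rank r and K @ v costs O(n r) instead of O(n^2). The positive feature map
# here is a hypothetical stand-in, not the one proposed in the paper.
import numpy as np

rng = np.random.default_rng(0)
n, dim, r = 5000, 3, 64
x = rng.normal(size=(n, dim))
y = rng.normal(size=(n, dim))
v = rng.random(n)

# Hypothetical positive feature map: random projections pushed through exp(),
# which keeps every feature strictly positive as the cost above requires.
W = rng.normal(size=(dim, r))
phi = lambda z: np.exp(z @ W) / np.sqrt(r)
Phi_x, Phi_y = phi(x), phi(y)      # each n x r with positive entries

# Linear-time application of the kernel: never form the n x n matrix.
Kv_fast = Phi_x @ (Phi_y.T @ v)    # O(n r)

# Sanity check against the explicit (quadratic) kernel on a small subset.
K_small = Phi_x[:100] @ Phi_y[:100].T
assert np.allclose(K_small @ v[:100], Phi_x[:100] @ (Phi_y[:100].T @ v[:100]))
```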
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.