Scalable Gaussian Processes with Latent Kronecker Structure
- URL: http://arxiv.org/abs/2506.06895v2
- Date: Sat, 16 Aug 2025 16:19:14 GMT
- Title: Scalable Gaussian Processes with Latent Kronecker Structure
- Authors: Jihao Andreas Lin, Sebastian Ament, Maximilian Balandat, David Eriksson, José Miguel Hernández-Lobato, Eytan Bakshy
- Abstract summary: Matrix structures, such as the Kronecker product, can accelerate operations significantly, but their application commonly entails approximations or unrealistic assumptions. We propose leveraging latent Kronecker structure, expressing the kernel matrix of observed values as the projection of a latent Kronecker product. We demonstrate that our method outperforms state-of-the-art sparse and variational GPs on real-world datasets with up to five million examples.
- Score: 40.188778777033086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Applying Gaussian processes (GPs) to very large datasets remains a challenge due to limited computational scalability. Matrix structures, such as the Kronecker product, can accelerate operations significantly, but their application commonly entails approximations or unrealistic assumptions. In particular, the most common path to creating a Kronecker-structured kernel matrix is to evaluate a product kernel on gridded inputs that can be expressed as a Cartesian product. However, this structure is lost if any observation is missing, which frequently occurs in real-world data such as time series. To address this limitation, we propose leveraging latent Kronecker structure, expressing the kernel matrix of observed values as the projection of a latent Kronecker product. In combination with iterative linear system solvers and pathwise conditioning, our method facilitates exact GP inference while requiring substantially fewer computational resources than standard iterative methods. We demonstrate that our method outperforms state-of-the-art sparse and variational GPs on real-world datasets with up to five million examples, including robotics, automated machine learning, and climate applications.
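The core mechanism is concrete enough to sketch. Below is a minimal, hypothetical NumPy/SciPy illustration of a latent Kronecker matrix-vector product inside an iterative solver; the kernel choices, names, and data are my own for illustration, not the authors' implementation, and pathwise conditioning (which the paper also uses for sampling) is omitted.

```python
# Latent Kronecker sketch: a product kernel k((x,t),(x',t')) = k_X(x,x') * k_T(t,t')
# on a partially observed grid. The observed-data kernel matrix is the projection
# P (K_X kron K_T) P^T of a latent Kronecker product, so a matvec never needs the
# full Kronecker matrix. All names here are illustrative assumptions.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def rbf(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
n_x, n_t = 30, 20                          # grid: 30 spatial points x 20 time steps
X = rng.uniform(size=(n_x, 2))
T = np.linspace(0, 1, n_t)[:, None]
K_X, K_T = rbf(X, X), rbf(T, T, 0.3)

mask = rng.uniform(size=n_x * n_t) < 0.8   # 20% of grid entries are missing
idx = np.flatnonzero(mask)                 # observed positions (row-major flattening)
y = rng.normal(size=idx.size)              # stand-in observations
noise = 0.1

def matvec(v):
    z = np.zeros(n_x * n_t)
    z[idx] = v                             # P^T: zero-pad to the full latent grid
    V = z.reshape(n_x, n_t)
    W = K_X @ V @ K_T.T                    # (A kron B) vec(V) = vec(A V B^T)
    return W.ravel()[idx] + noise**2 * v   # P: project back, add noise term

A = LinearOperator((idx.size, idx.size), matvec=matvec)
alpha, info = cg(A, y)                     # iterative exact-GP solve
print("CG converged:", info == 0)          # each matvec costs O(n_x^2 n_t + n_x n_t^2)
```

Each conjugate-gradient step thus costs far less than a dense matvec on the observed kernel matrix, which is what makes exact inference feasible at scale.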
Related papers
- Scalable Gaussian process modeling of parametrized spatio-temporal fields [2.005299372367689]
We develop a scalable framework for learning of parametrized equations over fixed or parametrized spatio-temporal domains. A key feature of our approach is the efficient computation of the posterior variance at essentially the same computational cost as the posterior mean. The results establish the proposed framework as an effective tool for data-driven surrogate modeling, particularly when uncertainty estimates are required for downstream tasks.
arXiv Detail & Related papers (2026-02-27T20:16:21Z)
- Matrix-free Neural Preconditioner for the Dirac Operator in Lattice Gauge Theory [13.32375374102012]
We propose a framework, leveraging operator learning techniques, to construct linear maps as effective preconditioners. In the context of the Schwinger model, a U(1) gauge theory in 1+1 spacetime dimensions, this preconditioning scheme effectively decreases the condition number of the linear systems.
arXiv Detail & Related papers (2025-09-12T16:10:18Z)
- Scaling Probabilistic Circuits via Monarch Matrices [109.65822339230853]
Probabilistic Circuits (PCs) are tractable representations of probability distributions. We propose a novel sparse and structured parameterization for the sum blocks in PCs.
arXiv Detail & Related papers (2025-06-14T07:39:15Z)
- Preconditioned Additive Gaussian Processes with Fourier Acceleration [2.292881746604941]
We introduce a matrix-free method to achieve nearly linear complexity in the multiplication of kernel matrices and their derivatives. To address high-dimensional problems, we propose an additive kernel approach. Each sub-kernel captures lower-order feature interactions, allowing for the efficient application of the NFFT method.
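As a rough illustration of the additive decomposition (a toy sketch, not the paper's NFFT-accelerated code), a kernel matvec under an additive kernel splits into independent low-dimensional matvecs; each of those is where an NFFT-style acceleration would apply. One feature per sub-kernel is assumed here for simplicity.

```python
# Additive-kernel sketch (illustrative): the full kernel is a sum of sub-kernels,
# each acting on a single feature, so a kernel matvec splits into cheap 1D matvecs.
import numpy as np

def gauss_1d(x):                        # 1D Gaussian kernel matrix on one feature
    return np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))           # n = 200 points in d = 5 dimensions
v = rng.normal(size=200)

# Additive matvec: (sum_d K_d) v = sum_d (K_d v); in the paper each 1D matvec
# would be accelerated with the NFFT rather than formed densely as done here.
Kv = sum(gauss_1d(X[:, d]) @ v for d in range(X.shape[1]))
```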
arXiv Detail & Related papers (2025-04-01T07:14:06Z)
- Higher-order topological kernels via quantum computation [68.8204255655161]
Topological data analysis (TDA) has emerged as a powerful tool for extracting meaningful insights from complex data.
We propose a quantum approach to defining Betti kernels, which is based on constructing Betti curves with increasing order.
arXiv Detail & Related papers (2023-07-14T14:48:52Z)
- Reconstructing Kernel-based Machine Learning Force Fields with Super-linear Convergence [0.18416014644193063]
We consider the broad class of Nyström-type methods to construct preconditioners.
All considered methods aim to identify a representative subset of inducing (kernel) columns to approximate the dominant kernel spectrum.
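A generic Nyström preconditioner along these lines can be sketched as follows (illustrative only, with hypothetical names; the paper's actual column-selection strategies differ): sample m inducing columns, form a rank-m approximation of K, and invert the approximation plus a regularizer via the Woodbury identity.

```python
# Generic Nystrom preconditioner sketch, not the paper's implementation.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))   # RBF kernel matrix
lam, m = 1e-2, 50

cols = rng.choice(len(X), size=m, replace=False)           # inducing columns
C = K[:, cols]
L = np.linalg.cholesky(K[np.ix_(cols, cols)] + 1e-8 * np.eye(m))
F = np.linalg.solve(L, C.T).T                               # F @ F.T = C W^{-1} C^T

# Woodbury: (F F^T + lam I)^{-1} v = (v - F (lam I + F^T F)^{-1} F^T v) / lam
inner = cho_factor(lam * np.eye(m) + F.T @ F)
def precond(v):
    return (v - F @ cho_solve(inner, F.T @ v)) / lam        # O(n m) per application
```

Plugged into conjugate gradients as a preconditioner, this captures the dominant kernel spectrum at O(n m^2) setup cost, which is the source of the super-linear convergence the title refers to.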
arXiv Detail & Related papers (2022-12-24T13:45:50Z)
- Linear Self-Attention Approximation via Trainable Feedforward Kernel [77.34726150561087]
In pursuit of faster computation, Efficient Transformers demonstrate an impressive variety of approaches.
We aim to expand the idea of trainable kernel methods to approximate the self-attention mechanism of the Transformer architecture.
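The general recipe behind such kernelized linear attention can be sketched as follows (a generic illustration under my own assumptions, not this paper's specific trainable kernel): replace softmax(QK^T)V with phi(Q)(phi(K)^T V), where phi is a small trainable feedforward feature map, so associativity drops the cost from quadratic to linear in sequence length.

```python
# Kernelized linear attention sketch; phi's weights are random here, but would
# be trained in practice. Names and sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
n, d, r = 1024, 64, 128
Q, Km, V = (rng.normal(size=(n, d)) for _ in range(3))
W1 = rng.normal(size=(d, r)) / d**0.5
W2 = rng.normal(size=(r, r)) / r**0.5

def phi(x):
    # Trainable feedforward feature map; output kept positive so the
    # normalizer below is well defined.
    return np.maximum(np.maximum(x @ W1, 0) @ W2, 0) + 1e-6

num = phi(Q) @ (phi(Km).T @ V)            # O(n r d): never forms the n x n matrix
den = phi(Q) @ phi(Km).sum(0)[:, None]    # row-wise softmax-style normalizer
attn = num / den
```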
arXiv Detail & Related papers (2022-11-08T08:14:11Z)
- Automatic and effective discovery of quantum kernels [41.61572387137452]
Quantum computing can empower machine learning models by enabling kernel machines to leverage quantum kernels for representing similarity measures between data. We present an approach to this problem which employs optimization techniques, similar to those used in neural architecture search and AutoML. The results obtained by testing our approach on a high-energy physics problem demonstrate that, in the best-case scenario, we can either match or improve testing accuracy with respect to the manual design approach.
arXiv Detail & Related papers (2022-09-22T16:42:14Z)
- Exact Gaussian Processes for Massive Datasets via Non-Stationary Sparsity-Discovering Kernels [0.0]
We propose to take advantage of naturally-structured sparsity by allowing the kernel to discover -- instead of induce -- sparse structure.
The principle of ultra-flexible, compactly-supported, and non-stationary kernels, combined with HPC and constrained optimization, lets us scale exact GPs well beyond 5 million data points.
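The compact-support principle is easy to illustrate (a generic sketch, not the authors' HPC implementation): a kernel that is exactly zero beyond a cutoff radius yields a kernel matrix that is naturally sparse and can be assembled with a spatial index, so exact GP algebra only ever touches the nonzero entries.

```python
# Compactly supported kernel -> naturally sparse kernel matrix (illustrative).
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import csr_matrix

def wendland(r, cutoff):
    # Wendland C^2 kernel: positive definite up to 3D, exactly zero for r >= cutoff.
    u = r / cutoff
    return np.where(u < 1, (1 - u) ** 4 * (4 * u + 1), 0.0)

rng = np.random.default_rng(4)
X = rng.uniform(size=(20_000, 3))
cutoff = 0.05
pairs = cKDTree(X).query_pairs(cutoff, output_type="ndarray")  # only nearby pairs
r = np.linalg.norm(X[pairs[:, 0]] - X[pairs[:, 1]], axis=1)
vals = wendland(r, cutoff)

i, j, diag = pairs[:, 0], pairs[:, 1], np.arange(len(X))
K = csr_matrix((np.concatenate([vals, vals, np.ones(len(X))]),
                (np.concatenate([i, j, diag]), np.concatenate([j, i, diag]))),
               shape=(len(X), len(X)))
print(f"density: {K.nnz / len(X)**2:.2e}")   # vast majority of entries are zero
```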
arXiv Detail & Related papers (2022-05-18T16:56:53Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Graph Kernel Neural Networks [53.91024360329517]
We propose to use graph kernels, i.e. kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain.
This allows us to define an entirely structural model that does not require computing the embedding of the input graph.
Our architecture allows us to plug in any type of graph kernel and has the added benefit of providing some interpretability.
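To make the idea concrete, here is a toy sketch (my own illustration, not the paper's model): a node's "convolution" response is the graph-kernel value between its 1-hop neighborhood subgraph and a set of small filter graphs, using a simple degree-histogram kernel as the inner product on graphs.

```python
# Toy graph-kernel convolution sketch; all choices here are illustrative.
import numpy as np

def rand_graph(n, p, rng):
    A = np.triu((rng.uniform(size=(n, n)) < p).astype(float), 1)
    return A + A.T                                # random undirected graph

def degree_hist(adj, bins=8):
    # Feature map of a simple graph kernel: normalized degree histogram.
    h, _ = np.histogram(adj.sum(1), bins=bins, range=(0, bins))
    return h / max(adj.shape[0], 1)

def graph_kernel(a1, a2):
    return degree_hist(a1) @ degree_hist(a2)      # inner product on graphs

rng = np.random.default_rng(5)
A = rand_graph(50, 0.1, rng)                      # input graph
filters = [rand_graph(6, 0.5, rng) for _ in range(4)]

def response(node):
    nbrs = np.flatnonzero(A[node])                # 1-hop neighborhood subgraph
    sub = A[np.ix_(nbrs, nbrs)]
    return np.array([graph_kernel(sub, f) for f in filters])

feats = np.stack([response(v) for v in range(len(A))])  # 4 "channels" per node
```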
arXiv Detail & Related papers (2021-12-14T14:48:08Z)
- The Fast Kernel Transform [21.001203328543006]
We propose the Fast Kernel Transform (FKT), a general algorithm to compute matrix-vector multiplications for datasets in moderate dimensions with quasilinear complexity.
The FKT is easily applied to a broad class of kernels, including Gaussian, Matérn, and Rational Quadratic covariance functions and physically motivated Green's functions.
We illustrate the efficacy and versatility of the FKT by providing timing and accuracy benchmarks and by applying it to scale stochastic neighbor embedding (t-SNE) and Gaussian processes to large real-world datasets.
arXiv Detail & Related papers (2021-06-08T16:15:47Z)
- Sparse Gaussian Processes via Parametric Families of Compactly-supported Kernels [0.6091702876917279]
We propose a method for deriving parametric families of kernel functions with compact support.
The parameters of this family of kernels can be learned from data using maximum likelihood estimation.
We show that these approximations incur minimal error over the exact models when modeling data drawn directly from a target GP.
arXiv Detail & Related papers (2020-06-05T20:44:09Z)