Related papers: The Fast Kernel Transform

The Fast Kernel Transform

URL: http://arxiv.org/abs/2106.04487v1
Date: Tue, 8 Jun 2021 16:15:47 GMT
Title: The Fast Kernel Transform
Authors: John Paul Ryan, Sebastian Ament, Carla P. Gomes, Anil Damle
Abstract summary: We propose the Fast Kernel Transform (FKT), a general algorithm to compute matrix-vector multiplications for datasets in moderate dimensions with quasilinear complexity. The FKT is easily applied to a broad class of kernels, including Gaussian, Matern, and Rational Quadratic covariance functions and physically motivated Green's functions. We illustrate the efficacy and versatility of the FKT by providing timing and accuracy benchmarks and by applying it to scale the neighborhood embedding (t-SNE) and Gaussian processes to large real-world data sets.
Score: 21.001203328543006
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Kernel methods are a highly effective and widely used collection of modern machine learning algorithms. A fundamental limitation of virtually all such methods are computations involving the kernel matrix that naively scale quadratically (e.g., constructing the kernel matrix and matrix-vector multiplication) or cubically (solving linear systems) with the size of the data set $N.$ We propose the Fast Kernel Transform (FKT), a general algorithm to compute matrix-vector multiplications (MVMs) for datasets in moderate dimensions with quasilinear complexity. Typically, analytically grounded fast multiplication methods require specialized development for specific kernels. In contrast, our scheme is based on auto-differentiation and automated symbolic computations that leverage the analytical structure of the underlying kernel. This allows the FKT to be easily applied to a broad class of kernels, including Gaussian, Matern, and Rational Quadratic covariance functions and physically motivated Green's functions, including those of the Laplace and Helmholtz equations. Furthermore, the FKT maintains a high, quantifiable, and controllable level of accuracy -- properties that many acceleration methods lack. We illustrate the efficacy and versatility of the FKT by providing timing and accuracy benchmarks and by applying it to scale the stochastic neighborhood embedding (t-SNE) and Gaussian processes to large real-world data sets.

Related papers

Preconditioned Additive Gaussian Processes with Fourier Acceleration [2.292881746604941]
We introduce a matrix-free method to achieve nearly linear complexity in the multiplication of kernel matrices and their derivatives. To address high-dimensional problems, we propose an additive kernel approach. Each sub- Kernel captures lower-order feature interactions, allowing for the efficient application of the NFFT method.
arXiv Detail & Related papers (2025-04-01T07:14:06Z)
Determinant Estimation under Memory Constraints and Neural Scaling Laws [48.68885778257016]
We derive a novel hierarchical algorithm for large-scale log-determinant calculation in memory-constrained settings. We show that the ratio of pseudo-determinants satisfies a power-law relationship, allowing us to derive corresponding scaling laws. This enables accurate estimation of NTK log-determinants from a tiny fraction of the full dataset.
arXiv Detail & Related papers (2025-03-06T13:32:13Z)
Fast Evaluation of Additive Kernels: Feature Arrangement, Fourier Methods, and Kernel Derivatives [0.5735035463793009]
We present a technique based on the non-equispaced fast Fourier transform (NFFT) with rigorous error analysis. We show that this approach is also well suited to allow the approximation of the matrix that arises when the kernel is differentiated. We illustrate the performance of the additive kernel scheme with fast matrix vector products on a number of data sets.
arXiv Detail & Related papers (2024-04-26T11:50:16Z)
Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties. We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
Higher-order topological kernels via quantum computation [68.8204255655161]
Topological data analysis (TDA) has emerged as a powerful tool for extracting meaningful insights from complex data. We propose a quantum approach to defining Betti kernels, which is based on constructing Betti curves with increasing order.
arXiv Detail & Related papers (2023-07-14T14:48:52Z)
BioSequence2Vec: Efficient Embedding Generation For Biological Sequences [1.0896567381206714]
We propose a general-purpose representation learning approach that embodies kernel methods' qualities while avoiding computation, memory, and generalizability challenges. Our proposed fast and alignment-free embedding method can be used as input to any distance. We perform a variety of real-world classification tasks, such as SARS-CoV-2 lineage and gene family classification, outperforming several state-of-the-art embedding and kernel methods in predictive performance.
arXiv Detail & Related papers (2023-04-01T10:58:21Z)
Reconstructing Kernel-based Machine Learning Force Fields with Super-linear Convergence [0.18416014644193063]
We consider the broad class of Nystr"om-type methods to construct preconditioners. All considered methods aim to identify a representative subset of inducing ( Kernel) columns to approximate the dominant kernel spectrum.
arXiv Detail & Related papers (2022-12-24T13:45:50Z)
Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel. Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU. Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
Learning in High-Dimensional Feature Spaces Using ANOVA-Based Fast Matrix-Vector Multiplication [0.0]
kernel matrix is typically dense and large-scale. Depending on the dimension of the feature space even the computation of all of its entries in reasonable time becomes a challenging task. We propose the use of an ANOVA kernel, where we construct several kernels based on lower-dimensional feature spaces for which we provide fast algorithms realizing the matrix-vector products. Based on a feature grouping approach, we then show how the fast matrix-vector products can be embedded into a learning method choosing kernel ridge regression and the preconditioned conjugate gradient solver.
arXiv Detail & Related papers (2021-11-19T10:29:39Z)
Robust 1-bit Compressive Sensing with Partial Gaussian Circulant Matrices and Generative Priors [54.936314353063494]
We provide recovery guarantees for a correlation-based optimization algorithm for robust 1-bit compressive sensing. We make use of a practical iterative algorithm, and perform numerical experiments on image datasets to corroborate our results.
arXiv Detail & Related papers (2021-08-08T05:28:06Z)
Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models. This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models. We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z)
Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of fully-connected ReLU network. We show that dimension of the resulting features is much smaller than other baseline feature map constructions to achieve comparable error bounds both in theory and practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
Sparse Gaussian Processes via Parametric Families of Compactly-supported Kernels [0.6091702876917279]
We propose a method for deriving parametric families of kernel functions with compact support. The parameters of this family of kernels can be learned from data using maximum likelihood estimation. We show that these approximations incur minimal error over the exact models when modeling data drawn directly from a target GP.
arXiv Detail & Related papers (2020-06-05T20:44:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.