Linear Time Kernel Matrix Approximation via Hyperspherical Harmonics
- URL: http://arxiv.org/abs/2202.03655v1
- Date: Tue, 8 Feb 2022 05:19:39 GMT
- Title: Linear Time Kernel Matrix Approximation via Hyperspherical Harmonics
- Authors: John Paul Ryan and Anil Damle
- Abstract summary: We propose a new technique for constructing low-rank approximations of matrices that arise in kernel methods for machine learning.
Our approach pairs a novel automatically constructed analytic expansion of the underlying kernel function with a data-dependent compression step to further optimize the approximation.
Experimental results show our approach compares favorably to the commonly used Nystrom method with respect to both accuracy for a given rank and computational time for a given accuracy across a variety of kernels, dimensions, and datasets.
- Score: 3.24890820102255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new technique for constructing low-rank approximations of
matrices that arise in kernel methods for machine learning. Our approach pairs
a novel automatically constructed analytic expansion of the underlying kernel
function with a data-dependent compression step to further optimize the
approximation. This procedure works in linear time and is applicable to any
isotropic kernel. Moreover, our method accepts the desired error tolerance as
input, in contrast to prevalent methods which accept the rank as input.
Experimental results show our approach compares favorably to the commonly used
Nystrom method with respect to both accuracy for a given rank and computational
time for a given accuracy across a variety of kernels, dimensions, and
datasets. Notably, in many of these problem settings our approach produces
near-optimal low-rank approximations. We provide an efficient open-source
implementation of our new technique to complement our theoretical developments
and experimental results.
Related papers
- Fast and Scalable Multi-Kernel Encoder Classifier [4.178980693837599]
The proposed method facilitates fast and scalable kernel matrix embedding, and seamlessly integrates multiple kernels to enhance the learning process.
Our theoretical analysis offers a population-level characterization of this approach using random variables.
arXiv Detail & Related papers (2024-06-04T10:34:40Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when emphdone right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective.
We introduce a emphstochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Nystrom Method for Accurate and Scalable Implicit Differentiation [25.29277451838466]
We show that the Nystrom method consistently achieves comparable or even superior performance to other approaches.
The proposed method avoids numerical instability and can be efficiently computed in matrix operations without iterations.
arXiv Detail & Related papers (2023-02-20T02:37:26Z) - Local Random Feature Approximations of the Gaussian Kernel [14.230653042112834]
We focus on the popular Gaussian kernel and on techniques to linearize kernel-based models by means of random feature approximations.
We show that such approaches yield poor results when modelling high-frequency data, and we propose a novel localization scheme that improves kernel approximations and downstream performance significantly.
arXiv Detail & Related papers (2022-04-12T09:52:36Z) - Improved Convergence Rates for Sparse Approximation Methods in
Kernel-Based Learning [48.08663378234329]
Kernel-based models such as kernel ridge regression and Gaussian processes are ubiquitous in machine learning applications.
Existing sparse approximation methods can yield a significant reduction in the computational cost.
We provide novel confidence intervals for the Nystr"om method and the sparse variational Gaussian processes approximation method.
arXiv Detail & Related papers (2022-02-08T17:22:09Z) - A Stochastic Bundle Method for Interpolating Networks [18.313879914379008]
We propose a novel method for training deep neural networks that are capable of driving the empirical loss to zero.
At each iteration our method constructs a maximum linear approximation, known as the bundle of the objective learning approximation.
arXiv Detail & Related papers (2022-01-29T23:02:30Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - A Discrete Variational Derivation of Accelerated Methods in Optimization [68.8204255655161]
We introduce variational which allow us to derive different methods for optimization.
We derive two families of optimization methods in one-to-one correspondence.
The preservation of symplecticity of autonomous systems occurs here solely on the fibers.
arXiv Detail & Related papers (2021-06-04T20:21:53Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z) - Certified and fast computations with shallow covariance kernels [0.0]
We introduce and analyze a new and certified algorithm for the low-rank approximation of a parameterized family of covariance operators.
The proposed algorithm provides the basis of a fast sampling procedure for parameter dependent random fields.
arXiv Detail & Related papers (2020-01-24T20:28:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.