Bandwidth Selection for Gaussian Kernel Ridge Regression via Jacobian Control
- URL: http://arxiv.org/abs/2205.11956v4
- Date: Fri, 1 Dec 2023 13:53:37 GMT
- Title: Bandwidth Selection for Gaussian Kernel Ridge Regression via Jacobian Control
- Authors: Oskar Allerbo and Rebecka Jörnsten
- Abstract summary: We propose a closed-form, computationally feather-light bandwidth selection heuristic based on controlling the Jacobian.
We show on real and synthetic data that, compared to cross-validation and marginal likelihood maximization, our method is on par in terms of model performance, but up to six orders of magnitude faster.
- Score: 1.5229257192293204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most machine learning methods require tuning of hyper-parameters. For kernel
ridge regression with the Gaussian kernel, the hyper-parameter is the
bandwidth. The bandwidth specifies the length scale of the kernel and has to be
carefully selected to obtain a model with good generalization. The default
methods for bandwidth selection, cross-validation and marginal likelihood
maximization, often yield good results, albeit at high computational costs.
Inspired by Jacobian regularization, we formulate an approximate expression for
how the derivatives of the functions inferred by kernel ridge regression with
the Gaussian kernel depend on the kernel bandwidth. We use this expression to
propose a closed-form, computationally feather-light, bandwidth selection
heuristic, based on controlling the Jacobian. In addition, the Jacobian
expression illuminates how the bandwidth selection is a trade-off between the
smoothness of the inferred function and the conditioning of the training data
kernel matrix. We show on real and synthetic data that compared to
cross-validation and marginal likelihood maximization, our method is on par in
terms of model performance, but up to six orders of magnitude faster.
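The closed-form heuristic itself is not reproduced in the abstract, so the following is only a minimal Python/NumPy sketch of the two ingredients it relates: fitting Gaussian kernel ridge regression, and evaluating the closed-form input gradient (Jacobian) of the fitted function f(x) = sum_i alpha_i k(x, x_i), which equals sum_i alpha_i k(x, x_i) (x_i - x) / sigma^2. The data, the bandwidth grid, and the regularization level below are illustrative assumptions, not the paper's settings.

    # Minimal sketch: Gaussian kernel ridge regression and the input Jacobian
    # of the fitted function. Illustration only; not the paper's heuristic.
    import numpy as np

    def gaussian_kernel(X, Z, sigma):
        # k(x, z) = exp(-||x - z||^2 / (2 sigma^2))
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def fit_krr(X, y, sigma, lam):
        # Dual coefficients: alpha = (K + lam * I)^{-1} y
        K = gaussian_kernel(X, X, sigma)
        return np.linalg.solve(K + lam * np.eye(len(X)), y)

    def jacobian(xq, X, alpha, sigma):
        # grad_x f(x) = sum_i alpha_i * k(x, x_i) * (x_i - x) / sigma^2
        k = gaussian_kernel(xq[None, :], X, sigma).ravel()
        return ((X - xq) / sigma ** 2 * (alpha * k)[:, None]).sum(0)

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(60, 1))                  # assumed toy data
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(60)

    for sigma in (0.1, 1.0, 10.0):                        # assumed bandwidth grid
        alpha = fit_krr(X, y, sigma, lam=1e-3)            # assumed regularization
        grads = [np.linalg.norm(jacobian(x, X, alpha, sigma)) for x in X]
        print(f"sigma={sigma:5.1f}  mean |grad f| = {np.mean(grads):.3f}")

Small bandwidths make the fitted function wiggly and inflate the gradient norms, while very large bandwidths oversmooth and make the training kernel matrix ill-conditioned; this is the trade-off the abstract describes.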
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves an $n^{-2/3}$ dimension-free L2 convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms, in particular for small datasets.
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - Optimal Kernel Choice for Score Function-based Causal Discovery [92.65034439889872]
We propose a kernel selection method within the generalized score function that automatically selects the kernel that best fits the data.
We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms existing kernel selection methods.
arXiv Detail & Related papers (2024-07-14T09:32:20Z) - Solving Kernel Ridge Regression with Gradient Descent for a Non-Constant Kernel [1.5229257192293204]
KRR is a generalization of linear ridge regression that is non-linear in the data, but linear in the parameters.
This paper investigates the effects of changing the kernel during training.
We show theoretically and empirically that, using a decreasing bandwidth, we can achieve zero training error together with good generalization, as well as double descent behavior (a minimal sketch of this idea follows the related-papers list below).
arXiv Detail & Related papers (2023-11-03T07:43:53Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - Structural Kernel Search via Bayesian Optimization and Symbolical Optimal Transport [5.1672267755831705]
For Gaussian processes, selecting the kernel is a crucial task, often done manually by the expert.
We propose a novel, efficient search method through a general, structured kernel space.
arXiv Detail & Related papers (2022-10-21T09:30:21Z) - Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics [0.4588028371034406]
We introduce algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques.
A first class of algorithms is kernel flow, which was introduced in the context of classification in machine learning.
A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal.
arXiv Detail & Related papers (2022-06-03T07:50:54Z) - Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z) - Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Implicit differentiation for fast hyperparameter selection in non-smooth convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian.
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - Sparse Spectrum Warped Input Measures for Nonstationary Kernel Learning [29.221457769884648]
We propose a general form of explicit, input-dependent, measure-valued warpings for learning nonstationary kernels.
The proposed learning algorithm warps inputs as conditional Gaussian measures that control the smoothness of a standard stationary kernel.
We demonstrate remarkable efficiency in the number of parameters of the warping functions, in learning problems with both small and large data regimes.
arXiv Detail & Related papers (2020-10-09T01:10:08Z) - Scaling up Kernel Ridge Regression via Locality Sensitive Hashing [6.704115928005158]
We introduce a weighted version of random binning features and show that the corresponding kernel function generates smooth Gaussian processes.
We show that our weighted random binning features provide a spectral approximation to the corresponding kernel matrix, leading to efficient algorithms for kernel ridge regression.
arXiv Detail & Related papers (2020-03-21T21:41:16Z)
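The decreasing-bandwidth idea mentioned in the gradient-descent entry above is sketched below as a hedged illustration only: generic kernel (RKHS) gradient descent on the squared loss while geometrically shrinking the Gaussian bandwidth. The decay schedule, step-size rule, and data are assumptions for illustration, not the paper's exact algorithm.

    # Hedged sketch: kernel gradient descent with a shrinking bandwidth.
    # The decay schedule and step size are assumed, not taken from the paper.
    import numpy as np

    def gaussian_kernel(X, Z, sigma):
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    rng = np.random.default_rng(1)
    X = rng.uniform(-3, 3, size=(80, 1))
    y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(80)

    alpha = np.zeros(len(X))
    for t in range(300):
        sigma = 2.0 * 0.99 ** t                    # hypothetical geometric bandwidth decay
        K = gaussian_kernel(X, X, sigma)
        resid = y - K @ alpha
        lr = 1.0 / np.linalg.eigvalsh(K).max()     # conservative step size for stability
        alpha += lr * resid                        # functional gradient step: alpha <- alpha + lr*(y - K alpha)
        if t % 50 == 0:
            print(f"iter {t:3d}  sigma={sigma:.3f}  train MSE={np.mean(resid ** 2):.4f}")

As the bandwidth shrinks, the model can interpolate the training data while the early, large-bandwidth iterations have already captured the smooth trend; this conveys the flavor of the zero-training-error-with-good-generalization behavior described above, and is not a reproduction of the paper's results.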