Kernel Identification Through Transformers
- URL: http://arxiv.org/abs/2106.08185v1
- Date: Tue, 15 Jun 2021 14:32:38 GMT
- Title: Kernel Identification Through Transformers
- Authors: Fergus Simpson, Ian Davies, Vidhi Lalchand, Alessandro Vullo, Nicolas Durrande, Carl Rasmussen
- Abstract summary: Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
- Score: 54.3795894579111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernel selection plays a central role in determining the performance of
Gaussian Process (GP) models, as the chosen kernel determines both the
inductive biases and prior support of functions under the GP prior. This work
addresses the challenge of constructing custom kernel functions for
high-dimensional GP regression models. Drawing inspiration from recent progress
in deep learning, we introduce a novel approach named KITT: Kernel
Identification Through Transformers. KITT exploits a transformer-based
architecture to generate kernel recommendations in under 0.1 seconds, which is
several orders of magnitude faster than conventional kernel search algorithms.
We train our model using synthetic data generated from priors over a vocabulary
of known kernels. By exploiting the nature of the self-attention mechanism,
KITT is able to process datasets with inputs of arbitrary dimension. We
demonstrate that kernels chosen by KITT yield strong performance over a diverse
collection of regression benchmarks.
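As an illustration of the idea, here is a minimal, hypothetical sketch of a KITT-style pipeline: datasets are sampled from GP priors over a small kernel vocabulary, and a transformer encoder is trained to classify which kernel generated each dataset. All names (`rbf`, `KernelClassifier`, the vocabulary, and all hyperparameters) are illustrative assumptions, not the authors' implementation; the sketch also fixes the input dimension for simplicity, whereas KITT handles inputs of arbitrary dimension.

```python
# Hypothetical KITT-style sketch (not the authors' code): train a
# transformer to recognise which kernel a synthetic dataset was drawn from.
import numpy as np
import torch
import torch.nn as nn

def rbf(X, ls=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def matern12(X, ls=1.0):
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return np.exp(-d / ls)

def periodic(X, ls=1.0, p=1.0):
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return np.exp(-2.0 * np.sin(np.pi * d / p) ** 2 / ls ** 2)

VOCAB = [rbf, matern12, periodic]  # vocabulary of known kernels

def sample_task(n=64, dim=1, rng=np.random):
    """Draw (X, y, label): a synthetic dataset from a random GP prior."""
    label = rng.randint(len(VOCAB))
    X = rng.uniform(-2, 2, size=(n, dim))
    K = VOCAB[label](X) + 1e-6 * np.eye(n)   # jitter for stability
    y = rng.multivariate_normal(np.zeros(n), K)
    return X, y, label

class KernelClassifier(nn.Module):
    """Self-attention over (x_i, y_i) tokens; invariant to the order of the
    data points, since no positional encoding is used and the output is
    mean-pooled before classification."""
    def __init__(self, dim=1, d_model=64, n_classes=len(VOCAB)):
        super().__init__()
        self.embed = nn.Linear(dim + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, X, y):
        tokens = torch.cat([X, y.unsqueeze(-1)], dim=-1)  # (B, N, dim+1)
        h = self.encoder(self.embed(tokens))
        return self.head(h.mean(dim=1))  # logits over the kernel vocabulary

model = KernelClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):  # train on freshly sampled synthetic tasks
    X, y, label = sample_task()
    logits = model(torch.tensor(X, dtype=torch.float32).unsqueeze(0),
                   torch.tensor(y, dtype=torch.float32).unsqueeze(0))
    loss = nn.functional.cross_entropy(logits, torch.tensor([label]))
    opt.zero_grad(); loss.backward(); opt.step()
```

At recommendation time, a single forward pass over a new dataset yields the kernel logits, which is consistent with the sub-0.1-second claim in the abstract.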
Related papers
- Optimal Kernel Choice for Score Function-based Causal Discovery [92.65034439889872]
We propose a kernel selection method within the generalized score function that automatically selects the optimal kernel that best fits the data.
We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms existing kernel selection methods.
arXiv Detail & Related papers (2024-07-14T09:32:20Z)
- Spectral Truncation Kernels: Noncommutativity in $C^*$-algebraic Kernel Machines [12.11705128358537]
We propose a new class of positive definite kernels based on the spectral truncation.
We show that the noncommutativity induced by spectral truncation is a governing factor leading to performance enhancement.
We also propose a deep learning perspective to increase the representation capacity of spectral truncation kernels.
arXiv Detail & Related papers (2024-05-28T04:47:12Z)
- Kernel-U-Net: Multivariate Time Series Forecasting using Custom Kernels [1.8816077341295625]
We introduce Kernel-U-Net, a flexible and kernel-customizable U-shape neural network architecture.
Specifically, Kernel-U-Net separates the procedure of partitioning input time series into patches from kernel manipulation.
Our method offers two primary advantages: 1) Flexibility in kernel customization to adapt to specific datasets; and 2) Enhanced computational efficiency, with the complexity of the Transformer layer reduced to linear.
arXiv Detail & Related papers (2024-01-03T00:49:51Z)
- Structural Kernel Search via Bayesian Optimization and Symbolical Optimal Transport [5.1672267755831705]
For Gaussian processes, selecting the kernel is a crucial task, often done manually by an expert.
We propose a novel, efficient search method through a general, structured kernel space.
arXiv Detail & Related papers (2022-10-21T09:30:21Z)
- Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics [0.4588028371034406]
We introduce algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques.
A first class of algorithms is kernel flow, which was introduced in the context of classification in machine learning; a minimal code sketch of its criterion follows this entry.
A second class of algorithms, called spectral kernel ridge regression, aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal.
arXiv Detail & Related papers (2022-06-03T07:50:54Z)
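For background, the kernel-flow criterion (from Owhadi and Yoo's Kernel Flows, which this abstract's first class of algorithms builds on) measures the relative loss of RKHS norm when the interpolant is refit on a random half of the data. A minimal numpy sketch under that assumption; the RBF kernel, lengthscale, and toy data are illustrative, not this paper's setup.

```python
import numpy as np

def rbf(A, B, ls):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def kernel_flow_rho(ls, X, y, rng):
    """Kernel-flow criterion rho in [0, 1]: relative drop in RKHS norm when
    half the data is discarded. Smaller is better; minimize over the kernel
    parameters (here just the lengthscale)."""
    n = len(y)
    idx = rng.choice(n, size=n // 2, replace=False)
    K = rbf(X, X, ls) + 1e-8 * np.eye(n)      # jitter for stability
    Ks = K[np.ix_(idx, idx)]
    full = y @ np.linalg.solve(K, y)          # ||u||^2 = y^T K^{-1} y
    half = y[idx] @ np.linalg.solve(Ks, y[idx])  # ||v||^2 on the subsample
    return 1.0 - half / full

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=40)
print(kernel_flow_rho(0.5, X, y, rng))
```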
- S-Rocket: Selective Random Convolution Kernels for Time Series Classification [36.9596657353794]
Random convolution kernel transform (Rocket) is a fast, efficient, and novel approach for time series feature extraction.
Selecting the most important kernels and pruning the redundant and less important ones is necessary to reduce computational complexity and accelerate inference with Rocket.
A population-based approach is proposed for selecting the most important kernels; a minimal sketch of the underlying Rocket transform follows this entry.
arXiv Detail & Related papers (2022-03-07T15:02:12Z)
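For context, Rocket itself maps each series through many random dilated 1-D convolutions and keeps two features per kernel: the maximum response and the proportion of positive values (PPV). A simplified numpy sketch of that transform follows; it approximates the published Rocket recipe (kernel lengths, dilation range, and padding are simplified) and is not the S-Rocket selection procedure.

```python
import numpy as np

def random_kernels(n_kernels, rng):
    """Rocket-style random 1-D convolution kernels (weights, bias, dilation)."""
    kernels = []
    for _ in range(n_kernels):
        length = rng.choice([7, 9, 11])
        w = rng.normal(0.0, 1.0, length)
        w -= w.mean()                    # mean-centred weights, as in Rocket
        b = rng.uniform(-1.0, 1.0)
        d = int(2 ** rng.uniform(0, 5))  # random dilation
        kernels.append((w, b, d))
    return kernels

def transform(x, kernels):
    """Two features per kernel: max response and PPV of the dilated convolution."""
    feats = []
    for w, b, d in kernels:
        taps = np.arange(len(w)) * d
        n_valid = len(x) - taps[-1]
        if n_valid <= 0:                 # kernel wider than the series
            feats += [0.0, 0.0]
            continue
        conv = np.array([x[i + taps] @ w + b for i in range(n_valid)])
        feats += [conv.max(), float((conv > 0).mean())]
    return np.array(feats)

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 10, 200))
features = transform(x, random_kernels(100, rng))  # 200 features per series
```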
- Fast Sketching of Polynomial Kernels of Polynomial Degree [61.83993156683605]
The polynomial kernel is especially important, as other kernels can often be approximated by polynomial kernels via a Taylor series expansion.
Recent techniques in oblivious sketching reduce the dependence of the running time on the degree $q$ of the kernel.
We give a new sketch which greatly improves upon this running time by removing the dependence on $q$ in the leading-order term; a short code example of the classical TensorSketch baseline follows this entry.
arXiv Detail & Related papers (2021-08-21T02:14:55Z)
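For background, the classical TensorSketch of Pham and Pagh approximates the feature map of the degree-$q$ polynomial kernel by count-sketching $q$ independent copies of the input and convolving them via FFT; its cost grows with $q$, which is the dependence this abstract targets. A minimal numpy sketch of that baseline (not this paper's improved sketch):

```python
import numpy as np

def make_hashes(d, m, q, rng):
    """q independent count-sketch (hash, sign) pairs, shared across inputs."""
    return [(rng.integers(0, m, size=d), rng.choice([-1.0, 1.0], size=d))
            for _ in range(q)]

def tensor_sketch(x, m, hashes):
    """TensorSketch: E[<ts(x), ts(y)>] = (x @ y) ** q."""
    fft_prod = np.ones(m, dtype=complex)
    for h, s in hashes:
        cs = np.zeros(m)
        np.add.at(cs, h, s * x)      # count sketch of x
        fft_prod *= np.fft.fft(cs)   # convolution in the Fourier domain
    return np.real(np.fft.ifft(fft_prod))

rng = np.random.default_rng(0)
d, m, q = 20, 256, 3
hashes = make_hashes(d, m, q, rng)   # reuse the same hashes for x and y
x, y = rng.normal(size=d), rng.normal(size=d)
approx = tensor_sketch(x, m, hashes) @ tensor_sketch(y, m, hashes)
exact = (x @ y) ** q                 # approx matches exact in expectation
```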
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Learning Deep Kernels for Non-Parametric Two-Sample Tests [50.92621794426821]
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution.
Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power; a minimal sketch of such a deep-kernel MMD statistic follows this entry.
arXiv Detail & Related papers (2020-02-21T03:54:23Z)
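A minimal, hypothetical sketch of the ingredients: a Gaussian kernel on learned features and an unbiased MMD^2 estimator whose negation is minimized. The featurizer, widths, and kernel form are illustrative assumptions, not the paper's exact parameterization, and raw MMD^2 is used here as a crude proxy for the test-power criterion the paper actually optimizes.

```python
import torch
import torch.nn as nn

def mmd2_unbiased(Kxx, Kyy, Kxy):
    """Unbiased estimator of squared MMD from the three kernel blocks."""
    n, m = Kxx.shape[0], Kyy.shape[0]
    xx = (Kxx.sum() - Kxx.diagonal().sum()) / (n * (n - 1))
    yy = (Kyy.sum() - Kyy.diagonal().sum()) / (m * (m - 1))
    return xx + yy - 2.0 * Kxy.mean()

class DeepKernel(nn.Module):
    """Gaussian kernel on learned features:
    k(a, b) = exp(-||f(a) - f(b)||^2 / (2 * sigma^2))."""
    def __init__(self, dim, width=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, width), nn.ReLU(),
                               nn.Linear(width, width))
        self.log_sigma = nn.Parameter(torch.zeros(()))

    def forward(self, a, b):
        fa, fb = self.f(a), self.f(b)
        # squared distances computed directly (stable to differentiate at 0)
        d2 = ((fa * fa).sum(-1, keepdim=True) + (fb * fb).sum(-1)
              - 2.0 * fa @ fb.T).clamp_min(0.0)
        return torch.exp(-d2 / (2.0 * torch.exp(2.0 * self.log_sigma)))

# Gradient ascent on the MMD^2 estimate over the kernel parameters:
k = DeepKernel(dim=2)
opt = torch.optim.Adam(k.parameters(), lr=1e-3)
X, Y = torch.randn(128, 2), torch.randn(128, 2) + 0.5  # toy samples
for _ in range(200):
    loss = -mmd2_unbiased(k(X, X), k(Y, Y), k(X, Y))
    opt.zero_grad(); loss.backward(); opt.step()
```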
- PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives [55.79741270235602]
We develop a hybrid solution to the development of deep learning kernels.
We use advanced polyhedral technology to automatically tune the outer loops for performance.
arXiv Detail & Related papers (2020-02-06T08:02:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.