Kernel Identification Through Transformers
- URL: http://arxiv.org/abs/2106.08185v1
- Date: Tue, 15 Jun 2021 14:32:38 GMT
- Title: Kernel Identification Through Transformers
- Authors: Fergus Simpson, Ian Davies, Vidhi Lalchand, Alessandro Vullo, Nicolas Durrande, Carl Rasmussen
- Abstract summary: Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
- Score: 54.3795894579111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernel selection plays a central role in determining the performance of
Gaussian Process (GP) models, as the chosen kernel determines both the
inductive biases and prior support of functions under the GP prior. This work
addresses the challenge of constructing custom kernel functions for
high-dimensional GP regression models. Drawing inspiration from recent progress
in deep learning, we introduce a novel approach named KITT: Kernel
Identification Through Transformers. KITT exploits a transformer-based
architecture to generate kernel recommendations in under 0.1 seconds, which is
several orders of magnitude faster than conventional kernel search algorithms.
We train our model using synthetic data generated from priors over a vocabulary
of known kernels. By exploiting the nature of the self-attention mechanism,
KITT is able to process datasets with inputs of arbitrary dimension. We
demonstrate that kernels chosen by KITT yield strong performance over a diverse
collection of regression benchmarks.
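As an illustration of the idea, here is a minimal, hypothetical sketch of a KITT-style pipeline: datasets are sampled from GP priors over a small kernel vocabulary, and a transformer encoder is trained to classify which kernel generated each dataset. All names (`rbf`, `KernelClassifier`, the vocabulary, and all hyperparameters) are illustrative assumptions, not the authors' implementation; the sketch also fixes the input dimension for simplicity, whereas KITT handles inputs of arbitrary dimension.

```python
# Hypothetical KITT-style sketch (not the authors' code): train a
# transformer to recognise which kernel a synthetic dataset was drawn from.
import numpy as np
import torch
import torch.nn as nn

def rbf(X, ls=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def matern12(X, ls=1.0):
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return np.exp(-d / ls)

def periodic(X, ls=1.0, p=1.0):
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return np.exp(-2.0 * np.sin(np.pi * d / p) ** 2 / ls ** 2)

VOCAB = [rbf, matern12, periodic]  # vocabulary of known kernels

def sample_task(n=64, dim=1, rng=np.random):
    """Draw (X, y, label): a synthetic dataset from a random GP prior."""
    label = rng.randint(len(VOCAB))
    X = rng.uniform(-2, 2, size=(n, dim))
    K = VOCAB[label](X) + 1e-6 * np.eye(n)   # jitter for stability
    y = rng.multivariate_normal(np.zeros(n), K)
    return X, y, label

class KernelClassifier(nn.Module):
    """Self-attention over (x_i, y_i) tokens; invariant to the order of the
    data points, since no positional encoding is used and the output is
    mean-pooled before classification."""
    def __init__(self, dim=1, d_model=64, n_classes=len(VOCAB)):
        super().__init__()
        self.embed = nn.Linear(dim + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, X, y):
        tokens = torch.cat([X, y.unsqueeze(-1)], dim=-1)  # (B, N, dim+1)
        h = self.encoder(self.embed(tokens))
        return self.head(h.mean(dim=1))  # logits over the kernel vocabulary

model = KernelClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):  # train on freshly sampled synthetic tasks
    X, y, label = sample_task()
    logits = model(torch.tensor(X, dtype=torch.float32).unsqueeze(0),
                   torch.tensor(y, dtype=torch.float32).unsqueeze(0))
    loss = nn.functional.cross_entropy(logits, torch.tensor([label]))
    opt.zero_grad(); loss.backward(); opt.step()
```

At recommendation time, a single forward pass over a new dataset yields the kernel logits, which is consistent with the sub-0.1-second claim in the abstract.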
Related papers
- Optimal Kernel Choice for Score Function-based Causal Discovery [92.65034439889872]
We propose a kernel selection method within the generalized score function that automatically selects the optimal kernel that best fits the data.
We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms existing kernel selection methods.
arXiv Detail & Related papers (2024-07-14T09:32:20Z)
- Spectral Truncation Kernels: Noncommutativity in $C^*$-algebraic Kernel Machines [12.11705128358537]
We propose a new class of positive definite kernels based on the spectral truncation.
We show that the noncommutativity induced by spectral truncation is a governing factor leading to performance enhancement.
We also propose a deep learning perspective to increase the representation capacity of spectral truncation kernels.
arXiv Detail & Related papers (2024-05-28T04:47:12Z)
- Kernel-U-Net: Multivariate Time Series Forecasting using Custom Kernels [1.8816077341295625]
We introduce Kernel-U-Net, a flexible and kernel-customizable U-shape neural network architecture.
Specifically, Kernel-U-Net separates the procedure of partitioning input time series into patches from kernel manipulation.
Our method offers two primary advantages: 1) Flexibility in kernel customization to adapt to specific datasets; and 2) Enhanced computational efficiency, with the complexity of the Transformer layer reduced to linear.
arXiv Detail & Related papers (2024-01-03T00:49:51Z)
- Structural Kernel Search via Bayesian Optimization and Symbolical Optimal Transport [5.1672267755831705]
For Gaussian processes, selecting the kernel is a crucial task, often done manually by an expert.
We propose a novel, efficient search method through a general, structured kernel space.
arXiv Detail & Related papers (2022-10-21T09:30:21Z)
- Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics [0.4588028371034406]
We introduce algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques.
A first class of algorithms is kernel flow, which was introduced in the context of classification in machine learning; a minimal code sketch of its criterion follows this entry.
A second class of algorithms, called spectral kernel ridge regression, aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal.
arXiv Detail & Related papers (2022-06-03T07:50:54Z)
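For background, the kernel-flow criterion (from Owhadi and Yoo's Kernel Flows, which this abstract's first class of algorithms builds on) measures the relative loss of RKHS norm when the interpolant is refit on a random half of the data. A minimal numpy sketch under that assumption; the RBF kernel, lengthscale, and toy data are illustrative, not this paper's setup.

```python
import numpy as np

def rbf(A, B, ls):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def kernel_flow_rho(ls, X, y, rng):
    """Kernel-flow criterion rho in [0, 1]: relative drop in RKHS norm when
    half the data is discarded. Smaller is better; minimize over the kernel
    parameters (here just the lengthscale)."""
    n = len(y)
    idx = rng.choice(n, size=n // 2, replace=False)
    K = rbf(X, X, ls) + 1e-8 * np.eye(n)      # jitter for stability
    Ks = K[np.ix_(idx, idx)]
    full = y @ np.linalg.solve(K, y)          # ||u||^2 = y^T K^{-1} y
    half = y[idx] @ np.linalg.solve(Ks, y[idx])  # ||v||^2 on the subsample
    return 1.0 - half / full

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=40)
print(kernel_flow_rho(0.5, X, y, rng))
```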
- S-Rocket: Selective Random Convolution Kernels for Time Series Classification [36.9596657353794]
Random convolution kernel transform (Rocket) is a fast, efficient, and novel approach for time series feature extraction.
Selecting the most important kernels and pruning the redundant and less important ones is necessary to reduce computational complexity and accelerate inference with Rocket.
A population-based approach is proposed for selecting the most important kernels; a minimal sketch of the underlying Rocket transform follows this entry.
arXiv Detail & Related papers (2022-03-07T15:02:12Z)
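For context, Rocket itself maps each series through many random dilated 1-D convolutions and keeps two features per kernel: the maximum response and the proportion of positive values (PPV). A simplified numpy sketch of that transform follows; it approximates the published Rocket recipe (kernel lengths, dilation range, and padding are simplified) and is not the S-Rocket selection procedure.

```python
import numpy as np

def random_kernels(n_kernels, rng):
    """Rocket-style random 1-D convolution kernels (weights, bias, dilation)."""
    kernels = []
    for _ in range(n_kernels):
        length = rng.choice([7, 9, 11])
        w = rng.normal(0.0, 1.0, length)
        w -= w.mean()                    # mean-centred weights, as in Rocket
        b = rng.uniform(-1.0, 1.0)
        d = int(2 ** rng.uniform(0, 5))  # random dilation
        kernels.append((w, b, d))
    return kernels

def transform(x, kernels):
    """Two features per kernel: max response and PPV of the dilated convolution."""
    feats = []
    for w, b, d in kernels:
        taps = np.arange(len(w)) * d
        n_valid = len(x) - taps[-1]
        if n_valid <= 0:                 # kernel wider than the series
            feats += [0.0, 0.0]
            continue
        conv = np.array([x[i + taps] @ w + b for i in range(n_valid)])
        feats += [conv.max(), float((conv > 0).mean())]
    return np.array(feats)

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 10, 200))
features = transform(x, random_kernels(100, rng))  # 200 features per series
```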
- Fast Sketching of Polynomial Kernels of Polynomial Degree [61.83993156683605]
The polynomial kernel is especially important, as other kernels can often be approximated by polynomial kernels via a Taylor series expansion.
Recent techniques in oblivious sketching reduce the dependence of the running time on the degree $q$ of the kernel.
We give a new sketch which greatly improves upon this running time by removing the dependence on $q$ in the leading-order term; a short code example of the classical TensorSketch baseline follows this entry.
arXiv Detail & Related papers (2021-08-21T02:14:55Z)
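For background, the classical TensorSketch of Pham and Pagh approximates the feature map of the degree-$q$ polynomial kernel by count-sketching $q$ independent copies of the input and convolving them via FFT; its cost grows with $q$, which is the dependence this abstract targets. A minimal numpy sketch of that baseline (not this paper's improved sketch):

```python
import numpy as np

def make_hashes(d, m, q, rng):
    """q independent count-sketch (hash, sign) pairs, shared across inputs."""
    return [(rng.integers(0, m, size=d), rng.choice([-1.0, 1.0], size=d))
            for _ in range(q)]

def tensor_sketch(x, m, hashes):
    """TensorSketch: E[<ts(x), ts(y)>] = (x @ y) ** q."""
    fft_prod = np.ones(m, dtype=complex)
    for h, s in hashes:
        cs = np.zeros(m)
        np.add.at(cs, h, s * x)      # count sketch of x
        fft_prod *= np.fft.fft(cs)   # convolution in the Fourier domain
    return np.real(np.fft.ifft(fft_prod))

rng = np.random.default_rng(0)
d, m, q = 20, 256, 3
hashes = make_hashes(d, m, q, rng)   # reuse the same hashes for x and y
x, y = rng.normal(size=d), rng.normal(size=d)
approx = tensor_sketch(x, m, hashes) @ tensor_sketch(y, m, hashes)
exact = (x @ y) ** q                 # approx matches exact in expectation
```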
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Learning Deep Kernels for Non-Parametric Two-Sample Tests [50.92621794426821]
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution.
Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power; a minimal sketch of such a deep-kernel MMD statistic follows this entry.
arXiv Detail & Related papers (2020-02-21T03:54:23Z)
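A minimal, hypothetical sketch of the ingredients: a Gaussian kernel on learned features and an unbiased MMD^2 estimator whose negation is minimized. The featurizer, widths, and kernel form are illustrative assumptions, not the paper's exact parameterization, and raw MMD^2 is used here as a crude proxy for the test-power criterion the paper actually optimizes.

```python
import torch
import torch.nn as nn

def mmd2_unbiased(Kxx, Kyy, Kxy):
    """Unbiased estimator of squared MMD from the three kernel blocks."""
    n, m = Kxx.shape[0], Kyy.shape[0]
    xx = (Kxx.sum() - Kxx.diagonal().sum()) / (n * (n - 1))
    yy = (Kyy.sum() - Kyy.diagonal().sum()) / (m * (m - 1))
    return xx + yy - 2.0 * Kxy.mean()

class DeepKernel(nn.Module):
    """Gaussian kernel on learned features:
    k(a, b) = exp(-||f(a) - f(b)||^2 / (2 * sigma^2))."""
    def __init__(self, dim, width=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, width), nn.ReLU(),
                               nn.Linear(width, width))
        self.log_sigma = nn.Parameter(torch.zeros(()))

    def forward(self, a, b):
        fa, fb = self.f(a), self.f(b)
        # squared distances computed directly (stable to differentiate at 0)
        d2 = ((fa * fa).sum(-1, keepdim=True) + (fb * fb).sum(-1)
              - 2.0 * fa @ fb.T).clamp_min(0.0)
        return torch.exp(-d2 / (2.0 * torch.exp(2.0 * self.log_sigma)))

# Gradient ascent on the MMD^2 estimate over the kernel parameters:
k = DeepKernel(dim=2)
opt = torch.optim.Adam(k.parameters(), lr=1e-3)
X, Y = torch.randn(128, 2), torch.randn(128, 2) + 0.5  # toy samples
for _ in range(200):
    loss = -mmd2_unbiased(k(X, X), k(Y, Y), k(X, Y))
    opt.zero_grad(); loss.backward(); opt.step()
```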
- PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives [55.79741270235602]
We develop a hybrid solution to the development of deep learning kernels.
We use advanced polyhedral technology to automatically tune the outer loops for performance.
arXiv Detail & Related papers (2020-02-06T08:02:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.