Generalized vec trick for fast learning of pairwise kernel models
- URL: http://arxiv.org/abs/2009.01054v2
- Date: Fri, 4 Feb 2022 12:30:27 GMT
- Title: Generalized vec trick for fast learning of pairwise kernel models
- Authors: Markus Viljanen, Antti Airola, Tapio Pahikkala
- Abstract summary: We present a comprehensive review of pairwise kernels that have been proposed for incorporating prior knowledge about the relationship between the objects.
We show how all the reviewed kernels can be expressed as sums of Kronecker products, allowing the use of generalized vec trick for speeding up their computation.
- Score: 3.867363075280544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pairwise learning corresponds to the supervised learning setting where the
goal is to make predictions for pairs of objects. Prominent applications
include predicting drug-target or protein-protein interactions, or
customer-product preferences. In this work, we present a comprehensive review
of pairwise kernels that have been proposed for incorporating prior knowledge
about the relationship between the objects. Specifically, we consider the
standard, symmetric and anti-symmetric Kronecker product kernels,
metric-learning, Cartesian, ranking, as well as linear, polynomial and Gaussian
kernels. Recently, an O(nm + nq) time generalized vec trick algorithm, where n,
m, and q denote the number of pairs, drugs and targets, was introduced for
training kernel methods with the Kronecker product kernel. This was a
significant improvement over previous O(n^2) training methods, since in most
real-world applications m, q << n. In this work we show how all the reviewed
kernels can be expressed as sums of Kronecker products, allowing the use of
generalized vec trick for speeding up their computation. In the experiments, we
demonstrate how the introduced approach allows scaling pairwise kernels to much
larger data sets than previously feasible, and provide an extensive comparison
of the kernels on a number of biological interaction prediction tasks.
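To make the complexity argument concrete: the classical vec trick rewrites (B ⊗ A) vec(X) = vec(A X B^T), and the generalized version extends this to the rows and columns indexed by the n observed pairs. Below is a minimal NumPy sketch of the resulting O(nm + nq) matrix-vector product for the plain Kronecker product kernel; the function and variable names are illustrative rather than taken from the authors' implementation, and a kernel expressed as a sum of Kronecker products is handled by summing one such call per term.

    import numpy as np

    def kron_kernel_matvec(Kd, Kt, drug_idx, targ_idx, v):
        """Compute u_i = sum_j Kd[d_i, d_j] * Kt[t_i, t_j] * v_j over the n
        observed pairs in O(nm + nq), never forming the n x n pairwise kernel.

        Kd: (m, m) drug kernel matrix; Kt: (q, q) target kernel matrix;
        drug_idx, targ_idx: length-n integer arrays identifying each pair.
        """
        m, q, n = Kd.shape[0], Kt.shape[0], len(v)
        # Stage 1, O(nq): accumulate C[t, d] = sum over pairs j with drug d
        # of Kt[t, t_j] * v_j.
        C = np.zeros((q, m))
        for j in range(n):
            C[:, drug_idx[j]] += v[j] * Kt[:, targ_idx[j]]
        # Stage 2, O(nm): read off u_i = sum over drugs d of Kd[d_i, d] * C[t_i, d].
        u = np.empty(n)
        for i in range(n):
            u[i] = Kd[drug_idx[i], :] @ C[targ_idx[i], :]
        return u

Plugged into an iterative solver such as conjugate gradient, a matvec of this form is all that is needed to train, for example, kernel ridge regression on the observed pairs, avoiding the O(n^2) cost of materializing the pairwise kernel matrix.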
Related papers
- On the Approximation of Kernel functions [0.0]
The paper addresses approximations of the kernel itself.
For the Hilbert Gauss kernel on the unit cube, the paper establishes an upper bound on the associated eigenfunctions.
This bound supports low-rank approximation methods such as the Nyström method.
arXiv Detail & Related papers (2024-03-11T13:50:07Z)
- An Exact Kernel Equivalence for Finite Classification Models [1.4777718769290527]
We compare our exact representation to the well-known Neural Tangent Kernel (NTK) and discuss approximation error relative to the NTK.
We use this exact kernel to show that our theoretical contribution can provide useful insights into the predictions made by neural networks.
arXiv Detail & Related papers (2023-08-01T20:22:53Z)
- FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels [82.53569355337586]
This work offers an efficient solution to temporal point processes inference using general parametric kernels with finite support.
The method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG).
Results show that the proposed approach leads to better estimation of pattern latency than the state of the art.
arXiv Detail & Related papers (2022-10-10T12:35:02Z)
- Efficient Approximate Kernel Based Spike Sequence Classification [56.2938724367661]
Machine learning models, such as SVM, require a definition of distance/similarity between pairs of sequences.
Exact methods yield better classification performance, but they pose high computational costs.
We propose a series of ways to improve the approximate kernel in order to enhance its predictive performance.
arXiv Detail & Related papers (2022-09-11T22:44:19Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that this phenomenon can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Revisiting Memory Efficient Kernel Approximation: An Indefinite Learning Perspective [0.8594140167290097]
Matrix approximations are a key element in large-scale machine learning approaches.
We extend MEKA to be applicable not only to shift-invariant kernels but also to non-stationary kernels.
We present a Lanczos-based estimation of a spectrum shift to develop a stable positive semi-definite MEKA approximation.
arXiv Detail & Related papers (2021-12-18T10:01:34Z)
- Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of fully-connected ReLU networks.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Tractable Computation of Expected Kernels by Circuits [35.059091080947205]
We describe the conditions under which we can compute expected kernels exactly and efficiently.
We then demonstrate possible advancements for kernel embedding frameworks by exploiting tractable expected kernels.
We empirically evaluate both algorithms and show that they outperform standard baselines on a variety of datasets.
arXiv Detail & Related papers (2021-02-21T08:59:06Z)
- Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior [26.52863547394537]
We present a novel probabilistic algorithm to learn a kernel composition by handling the sparsity in kernel selection with a horseshoe prior.
Our model captures characteristics of time series with significant reductions in computational time and achieves competitive regression performance on real-world data sets.
arXiv Detail & Related papers (2020-12-21T13:41:15Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
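The positive semi-definiteness observation above is easy to verify: a PSM is the average, over MCMC samples, of co-clustering indicator matrices, and each such matrix is a sum of outer products of cluster membership vectors. Here is a minimal NumPy sketch under that reading; the function name and array layout are illustrative, not taken from the paper.

    import numpy as np

    def posterior_similarity_matrix(allocations):
        """allocations: (S, n) array of cluster labels, one row per MCMC sample.
        Returns the n x n PSM with entries psm[i, j] = fraction of samples in
        which points i and j are assigned to the same cluster.
        """
        S, n = allocations.shape
        psm = np.zeros((n, n))
        for labels in allocations:
            # Co-clustering indicator for this sample: equals sum_k z_k z_k^T
            # over cluster indicator vectors z_k, hence positive semi-definite.
            psm += (labels[:, None] == labels[None, :])
        return psm / S

Since each summand is positive semi-definite and the PSM is their average, the PSM itself is a valid kernel matrix.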