DKL-KAN: Scalable Deep Kernel Learning using Kolmogorov-Arnold Networks
- URL: http://arxiv.org/abs/2407.21176v1
- Date: Tue, 30 Jul 2024 20:30:44 GMT
- Title: DKL-KAN: Scalable Deep Kernel Learning using Kolmogorov-Arnold Networks
- Authors: Shrenik Zinage, Sudeepta Mondal, Soumalya Sarkar
- Abstract summary: We introduce a scalable deep kernel using KAN (DKL-KAN) as an effective alternative to DKL using MLP (DKL-MLP).
We analyze two variants of DKL-KAN for a fair comparison with DKL-MLP.
The efficacy of DKL-KAN is evaluated in terms of computational training time and test prediction accuracy across a wide range of applications.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The need for scalable and expressive models in machine learning is paramount, particularly in applications requiring both structural depth and flexibility. Traditional deep learning methods, such as multilayer perceptrons (MLP), offer depth but lack the ability to integrate the structural characteristics of deep learning architectures with the non-parametric flexibility of kernel methods. To address this, deep kernel learning (DKL) was introduced, where inputs to a base kernel are transformed using a deep learning architecture. These kernels can replace standard kernels, allowing both expressive power and scalability. The advent of Kolmogorov-Arnold Networks (KAN) has generated considerable attention and discussion among researchers in the scientific domain. In this paper, we introduce a scalable deep kernel using KAN (DKL-KAN) as an effective alternative to DKL using MLP (DKL-MLP). Our approach involves simultaneously optimizing these kernel attributes using the marginal likelihood within a Gaussian process framework. We analyze two variants of DKL-KAN for a fair comparison with DKL-MLP: one with the same number of neurons and layers as DKL-MLP, and another with approximately the same number of trainable parameters. To handle large datasets, we use kernel interpolation for scalable structured Gaussian processes (KISS-GP) for low-dimensional inputs and KISS-GP with product kernels for high-dimensional inputs. The efficacy of DKL-KAN is evaluated in terms of computational training time and test prediction accuracy across a wide range of applications. Additionally, the effectiveness of DKL-KAN in modeling discontinuities and accurately estimating prediction uncertainty is also examined. The results indicate that DKL-KAN outperforms DKL-MLP on datasets with a low number of observations. Conversely, DKL-MLP exhibits better scalability and higher test prediction accuracy on datasets with a large number of observations.
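The abstract describes the core DKL recipe: a deep feature extractor feeds a base GP kernel, all parameters are trained jointly by maximizing the marginal likelihood, and KISS-GP grid interpolation keeps inference scalable. The sketch below illustrates that pattern in GPyTorch with a toy KAN-style layer (learnable univariate edge functions built from a small Fourier basis). It is a minimal sketch, not the paper's implementation: the KANLayer parameterization, layer widths, grid size, and the tanh squashing of features are illustrative assumptions.

```python
# Minimal DKL-KAN-style sketch in GPyTorch (illustrative; not the paper's exact model).
import torch
import gpytorch

class KANLayer(torch.nn.Module):
    """Toy Kolmogorov-Arnold layer: each (input, output) edge applies a
    learnable univariate function, here a small Fourier basis expansion
    plus a linear term; outputs sum the edge functions over inputs."""
    def __init__(self, in_dim, out_dim, n_basis=5):
        super().__init__()
        self.register_buffer("freqs", torch.arange(1, n_basis + 1).float())
        self.coef = torch.nn.Parameter(0.1 * torch.randn(in_dim, out_dim, n_basis))
        self.lin = torch.nn.Parameter(0.1 * torch.randn(in_dim, out_dim))

    def forward(self, x):                                   # x: (batch, in_dim)
        basis = torch.sin(x.unsqueeze(-1) * self.freqs)     # (batch, in, n_basis)
        edge = torch.einsum("bik,iok->bio", basis, self.coef)
        edge = edge + x.unsqueeze(-1) * self.lin            # residual linear term
        return edge.sum(dim=1)                              # (batch, out_dim)

class DKLKAN(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, feat_dim=2):
        super().__init__(train_x, train_y, likelihood)
        self.extractor = torch.nn.Sequential(
            KANLayer(train_x.size(-1), 8), KANLayer(8, feat_dim))
        self.mean_module = gpytorch.means.ConstantMean()
        # KISS-GP: grid interpolation over the low-dimensional learned features.
        self.covar_module = gpytorch.kernels.GridInterpolationKernel(
            gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()),
            grid_size=64, num_dims=feat_dim)

    def forward(self, x):
        # tanh keeps features in a bounded range for the interpolation grid.
        z = torch.tanh(self.extractor(x))
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z))

# Joint training of kernel hyperparameters and KAN weights by marginal likelihood.
train_x, train_y = torch.randn(100, 4), torch.randn(100)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = DKLKAN(train_x, train_y, likelihood)
model.train(); likelihood.train()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
for _ in range(50):
    opt.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward(); opt.step()
```

For high-dimensional inputs the paper uses KISS-GP with product kernels instead of a single interpolation grid; in GPyTorch that would correspond to combining per-dimension grid kernels rather than the single GridInterpolationKernel used above.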
Related papers
- Kernel-based retrieval models for hyperspectral image data optimized with Kernel Flows [0.0]
Kernel-based statistical methods are efficient, but their performance depends heavily on the selection of kernel parameters.
We propose a new KF-type approach to optimize Kernel Principal Component Regression (K-PCR) and test it alongside KF-PLS.
Both methods are benchmarked against non-linear regression techniques using two hyperspectral remote sensing datasets.
arXiv Detail & Related papers (2024-11-12T13:54:13Z) - Combining Primal and Dual Representations in Deep Restricted Kernel Machines Classifiers [17.031744210104556]
We propose a new method for DRKM classification coupling the objectives of KPCA and classification levels.
The classification level can be formulated as an LSSVM or as a primal feature map, combining depth in terms of levels and layers.
We show that our developed algorithm can effectively learn from small datasets, while using less memory than the convolutional neural network (CNN) with high-dimensional data.
arXiv Detail & Related papers (2023-06-12T10:39:57Z) - Local Sample-weighted Multiple Kernel Clustering with Consensus Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) is committed to achieving optimal information fusion from a set of base kernels.
This paper proposes a novel local sample-weighted multiple kernel clustering model.
Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel- or graph-based clustering algorithms.
arXiv Detail & Related papers (2022-07-05T05:00:38Z) - Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z) - Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Graph-Aided Online Multi-Kernel Learning [12.805267089186533]
This paper studies data-driven selection of kernels from a dictionary that provide satisfactory function approximations.
Based on the similarities among kernels, the novel framework constructs and refines a graph to assist choosing a subset of kernels.
Our proposed algorithms enjoy a tighter sub-linear regret bound compared with state-of-the-art graph-based online MKL alternatives.
arXiv Detail & Related papers (2021-02-09T07:43:29Z) - Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data [93.76907759950608]
We propose a federated doubly stochastic kernel learning algorithm (FDSKL) for vertically partitioned data.
We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels.
arXiv Detail & Related papers (2020-08-14T05:46:56Z) - Learning to Learn Kernels with Variational Random Features [118.09565227041844]
We introduce kernels with random Fourier features in the meta-learning framework to leverage their strong few-shot learning ability.
We formulate the optimization of MetaVRF as a variational inference problem.
We show that MetaVRF delivers much better, or at least competitive, performance compared to existing meta-learning alternatives.
arXiv Detail & Related papers (2020-06-11T18:05:29Z) - Longitudinal Deep Kernel Gaussian Process Regression [16.618767289437905]
We introduce longitudinal deep kernel Gaussian process regression (L-DKGPR).
L-DKGPR automates the discovery of complex multilevel correlation structure from longitudinal data.
We derive an efficient algorithm to train L-DKGPR using latent space inducing points and variational inference.
arXiv Detail & Related papers (2020-05-24T15:10:48Z) - Deep Latent-Variable Kernel Learning [25.356503463916816]
We present a complete deep latent-variable kernel learning (DLVKL) model wherein the latent variables perform encoding for regularized representation.
Experiments imply that the DLVKL-NSDE performs similarly to the well-calibrated GP on small datasets, and outperforms existing deep GPs on large datasets.
arXiv Detail & Related papers (2020-05-18T05:55:08Z)