RFFNet: Large-Scale Interpretable Kernel Methods via Random Fourier Features
- URL: http://arxiv.org/abs/2211.06410v2
- Date: Fri, 12 Apr 2024 14:51:32 GMT
- Title: RFFNet: Large-Scale Interpretable Kernel Methods via Random Fourier Features
- Authors: Mateus P. Otto, Rafael Izbicki,
- Abstract summary: We introduce RFFNet, a scalable method that learns the kernel relevances' on the fly via first-order optimization.
We show that our approach has a small memory footprint and run-time, low prediction error, and effectively identifies relevant features.
We supply users with an efficient, PyTorch-based library, that adheres to the scikit-learn standard API and code for fully reproducing our results.
- Score: 3.0079490585515347
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kernel methods provide a flexible and theoretically grounded approach to nonlinear and nonparametric learning. While memory and run-time requirements hinder their applicability to large datasets, many low-rank kernel approximations, such as random Fourier features, were recently developed to scale up such kernel methods. However, these scalable approaches are based on approximations of isotropic kernels, which cannot remove the influence of irrelevant features. In this work, we design random Fourier features for a family of automatic relevance determination (ARD) kernels, and introduce RFFNet, a new large-scale kernel method that learns the kernel relevances' on the fly via first-order stochastic optimization. We present an effective initialization scheme for the method's non-convex objective function, evaluate if hard-thresholding RFFNet's learned relevances yield a sensible rule for variable selection, and perform an extensive ablation study of RFFNet's components. Numerical validation on simulated and real-world data shows that our approach has a small memory footprint and run-time, achieves low prediction error, and effectively identifies relevant features, thus leading to more interpretable solutions. We supply users with an efficient, PyTorch-based library, that adheres to the scikit-learn standard API and code for fully reproducing our results.
Related papers
- Optimal Kernel Choice for Score Function-based Causal Discovery [92.65034439889872]
We propose a kernel selection method within the generalized score function that automatically selects the optimal kernel that best fits the data.
We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms kernel selection methods.
arXiv Detail & Related papers (2024-07-14T09:32:20Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS)
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - Enhancing Kernel Flexibility via Learning Asymmetric Locally-Adaptive
Kernels [35.76925787221499]
This paper introduces the concept of Locally-Adaptive-Bandwidths (LAB) as trainable parameters to enhance the Radial Basis Function (RBF) kernel.
The parameters in LAB RBF kernels are data-dependent, and its number can increase with the dataset.
This paper for the first time establishes an asymmetric kernel ridge regression framework and introduces an iterative kernel learning algorithm.
arXiv Detail & Related papers (2023-10-08T17:08:15Z) - FaDIn: Fast Discretized Inference for Hawkes Processes with General
Parametric Kernels [82.53569355337586]
This work offers an efficient solution to temporal point processes inference using general parametric kernels with finite support.
The method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG)
Results show that the proposed approach leads to an improved estimation of pattern latency than the state-of-the-art.
arXiv Detail & Related papers (2022-10-10T12:35:02Z) - Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z) - Kernel Continual Learning [117.79080100313722]
kernel continual learning is a simple but effective variant of continual learning to tackle catastrophic forgetting.
episodic memory unit stores a subset of samples for each task to learn task-specific classifiers based on kernel ridge regression.
variational random features to learn a data-driven kernel for each task.
arXiv Detail & Related papers (2021-07-12T22:09:30Z) - Towards Unbiased Random Features with Lower Variance For Stationary
Indefinite Kernels [26.57122949130266]
Our algorithm achieves lower variance and approximation error compared with the existing kernel approximation methods.
With better approximation to the originally selected kernels, improved classification accuracy and regression ability is obtained.
arXiv Detail & Related papers (2021-04-13T13:56:50Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of fully-connected ReLU network.
We show that dimension of the resulting features is much smaller than other baseline feature map constructions to achieve comparable error bounds both in theory and practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior [26.52863547394537]
We present a novel probabilistic algorithm to learn a kernel composition by handling the sparsity in the kernel selection with Horseshoe prior.
Our model can capture characteristics of time series with significant reductions in computational time and have competitive regression performance on real-world data sets.
arXiv Detail & Related papers (2020-12-21T13:41:15Z) - Kernel k-Means, By All Means: Algorithms and Strong Consistency [21.013169939337583]
Kernel $k$ clustering is a powerful tool for unsupervised learning of non-linear data.
In this paper, we generalize results leveraging a general family of means to combat sub-optimal local solutions.
Our algorithm makes use of majorization-minimization (MM) to better solve this non-linear separation problem.
arXiv Detail & Related papers (2020-11-12T16:07:18Z) - End-to-end Kernel Learning via Generative Random Fourier Features [31.57596752889935]
Random Fourier features (RFFs) provide a promising way for kernel learning in a spectral case.
In this paper, we consider a one-stage process that incorporates the kernel learning and linear learner into a unifying framework.
arXiv Detail & Related papers (2020-09-10T00:27:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.