Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior
- URL: http://arxiv.org/abs/2012.11339v2
- Date: Wed, 24 Feb 2021 07:11:56 GMT
- Title: Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior
- Authors: Anh Tong, Toan Tran, Hung Bui, Jaesik Choi
- Abstract summary: We present a novel probabilistic algorithm to learn a kernel composition by handling sparsity in kernel selection with a Horseshoe prior.
Our model captures the characteristics of time series with significant reductions in computational time and achieves competitive regression performance on real-world data sets.
- Score: 26.52863547394537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Choosing a proper set of kernel functions is an important problem in learning
Gaussian Process (GP) models, since each kernel structure has different model
complexity and data fitness. Recent automatic kernel composition methods
provide not only accurate prediction but also attractive interpretability
through their search-based procedures. However, existing methods suffer from
slow kernel composition learning. To tackle large-scale data, we propose a new
sparse approximate posterior for GPs, MultiSVGP, constructed from groups of
inducing points associated with the individual additive kernels in a
compositional kernel. We demonstrate that this approximation provides a better
fit for learning compositional kernels from empirical observations. We also
provide a theoretical justification of the error bound compared to the
traditional sparse GP. In contrast to search-based approaches, we present a
novel probabilistic algorithm that learns a kernel composition by handling
sparsity in kernel selection with a Horseshoe prior. We demonstrate that our
model captures the characteristics of time series with significant reductions
in computational time and achieves competitive regression performance on
real-world data sets.
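The sketch below illustrates the shrinkage idea in its simplest form: weight a few additive kernels and draw the weights from a Horseshoe prior, so that most components are shrunk toward zero while a few survive. The kernel choices, hyperparameters, and squared-weight combination are illustrative assumptions, not the paper's MultiSVGP construction.

```python
# Minimal sketch of Horseshoe shrinkage over additive kernel components.
import numpy as np

rng = np.random.default_rng(0)

def rbf(x1, x2, ls=1.0):
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def periodic(x1, x2, ls=1.0, period=1.0):
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ls ** 2)

def linear(x1, x2):
    return np.outer(x1, x2)

def sample_horseshoe(n, tau0=0.1):
    # Horseshoe: w_k = beta_k * lambda_k * tau with half-Cauchy scales,
    # so most weights land near zero while a few remain large.
    tau = np.abs(tau0 * rng.standard_cauchy())   # global shrinkage
    lam = np.abs(rng.standard_cauchy(n))         # local shrinkage
    return rng.standard_normal(n) * lam * tau

x = np.linspace(0.0, 5.0, 50)
parts = [rbf(x, x), periodic(x, x), linear(x, x)]
w = sample_horseshoe(len(parts))
# Squaring the weights keeps the combined Gram matrix positive semidefinite.
K = sum(wk ** 2 * Kk for wk, Kk in zip(w, parts))
```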
Related papers
- Optimal Kernel Choice for Score Function-based Causal Discovery [92.65034439889872]
We propose a kernel selection method within the generalized score function that automatically selects the optimal kernel that best fits the data.
We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms existing kernel selection methods.
arXiv Detail & Related papers (2024-07-14T09:32:20Z)
- Fast and Scalable Multi-Kernel Encoder Classifier [4.178980693837599]
The proposed method facilitates fast and scalable kernel matrix embedding, and seamlessly integrates multiple kernels to enhance the learning process.
Our theoretical analysis offers a population-level characterization of this approach using random variables.
arXiv Detail & Related papers (2024-06-04T10:34:40Z)
- An Exact Kernel Equivalence for Finite Classification Models [1.4777718769290527]
We compare our exact representation to the well-known Neural Tangent Kernel (NTK) and discuss approximation error relative to the NTK.
We use this exact kernel to show that our theoretical contribution can provide useful insights into the predictions made by neural networks.
arXiv Detail & Related papers (2023-08-01T20:22:53Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
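As background for the inducing-point idea, here is a minimal Nyström-style sketch in which a small set of inducing inputs stands in for the full Gram matrix; the sizes and kernel are illustrative and this is not the IGN feature-space construction.

```python
# Nystrom-style low-rank approximation via m << n inducing inputs.
import numpy as np

rng = np.random.default_rng(0)

def rbf(x1, x2, ls=0.5):
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

x = rng.uniform(0.0, 5.0, size=500)        # n = 500 training inputs
z = np.linspace(0.0, 5.0, 20)              # m = 20 inducing inputs

Knm = rbf(x, z)
Kmm = rbf(z, z) + 1e-6 * np.eye(len(z))    # jitter for numerical stability
# K ~ Knm Kmm^{-1} Kmn, costing O(n m^2) instead of O(n^2) storage/compute.
K_approx = Knm @ np.linalg.solve(Kmm, Knm.T)
```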
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Kernel Continual Learning [117.79080100313722]
Kernel continual learning is a simple but effective variant of continual learning for tackling catastrophic forgetting.
An episodic memory unit stores a subset of samples for each task, from which task-specific classifiers are learned with kernel ridge regression.
Variational random features are used to learn a data-driven kernel for each task.
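A hypothetical sketch of that per-task recipe: keep a small memory buffer per task and fit a kernel ridge regression classifier on it. Names and sizes are illustrative, not the paper's implementation.

```python
# Per-task kernel ridge regression classifiers from episodic memory buffers.
import numpy as np

def rbf(X1, X2, ls=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def fit_krr(X_mem, y_mem, reg=1e-2):
    # Solve (K + reg * I) alpha = y on the task's memory buffer.
    K = rbf(X_mem, X_mem)
    alpha = np.linalg.solve(K + reg * np.eye(len(X_mem)), y_mem)
    return lambda X_new: rbf(X_new, X_mem) @ alpha

rng = np.random.default_rng(0)
task_classifiers = {}
for task_id in range(3):
    X_mem = rng.normal(size=(50, 4))        # episodic memory for this task
    y_mem = np.sign(rng.normal(size=50))    # +/-1 labels
    task_classifiers[task_id] = fit_krr(X_mem, y_mem)

pred = np.sign(task_classifiers[0](rng.normal(size=(5, 4))))
```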
arXiv Detail & Related papers (2021-07-12T22:09:30Z)
- Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finite-width neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on CIFAR-10 dataset while achieving 150x speedup.
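As a hedged illustration of random-feature expansions of arc-cosine kernels, the sketch below approximates the order-1 arc-cosine kernel with random ReLU features; the paper's sketching algorithm itself is not reproduced here.

```python
# Random ReLU features for the order-1 arc-cosine kernel (Cho & Saul).
import numpy as np

rng = np.random.default_rng(0)

def arccos1(x, y):
    # k1(x, y) = ||x|| ||y|| (sin t + (pi - t) cos t) / pi, t = angle(x, y).
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    t = np.arccos(np.clip(x @ y / (nx * ny), -1.0, 1.0))
    return nx * ny * (np.sin(t) + (np.pi - t) * np.cos(t)) / np.pi

def relu_features(X, W):
    # k1(x, y) = 2 E[relu(w.x) relu(w.y)], so sqrt(2/D) relu(X W^T) works.
    return np.sqrt(2.0 / W.shape[0]) * np.maximum(X @ W.T, 0.0)

d, D = 8, 20000                  # input dim, number of random features
x, y = rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(D, d))
phi = relu_features(np.stack([x, y]), W)
print(arccos1(x, y), phi[0] @ phi[1])   # the two values should be close
```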
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
- MetaKernel: Learning Variational Random Features with Limited Labels [120.90737681252594]
Few-shot learning addresses the fundamental and challenging problem of learning from a few annotated samples while generalizing well to new tasks.
We propose meta-learning kernels with random Fourier features for few-shot learning, which we call MetaKernel.
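For context, the classical random Fourier feature construction of Rahimi and Recht is sketched below as a stand-in for the variational random features used by MetaKernel; all names and sizes are illustrative.

```python
# Random Fourier features approximating the RBF kernel.
import numpy as np

rng = np.random.default_rng(0)

def rff(X, W, b):
    # z(x) = sqrt(2/D) cos(W x + b) gives z(x).z(y) ~ exp(-||x-y||^2 / 2).
    return np.sqrt(2.0 / W.shape[0]) * np.cos(X @ W.T + b)

d, D = 5, 4096
W = rng.normal(size=(D, d))            # spectral samples of the RBF kernel
b = rng.uniform(0.0, 2.0 * np.pi, D)
x, y = rng.normal(size=d), rng.normal(size=d)
z = rff(np.stack([x, y]), W, b)
print(np.exp(-0.5 * np.sum((x - y) ** 2)), z[0] @ z[1])   # should be close
```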
arXiv Detail & Related papers (2021-05-08T21:24:09Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Graph-Aided Online Multi-Kernel Learning [12.805267089186533]
This paper studies data-driven selection of kernels from the dictionary that provide satisfactory function approximations.
Based on the similarities among kernels, the novel framework constructs and refines a graph to assist choosing a subset of kernels.
Our proposed algorithms enjoy tighter sub-linear regret bounds compared with state-of-the-art graph-based online MKL alternatives.
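As a rough illustration of online multi-kernel learning, the sketch below combines per-kernel predictions with plain multiplicative weights (Hedge); the paper's graph-based kernel selection is not reproduced here.

```python
# Hedge-style online weighting of a dictionary of kernel learners.
import numpy as np

def hedge_weights(cum_losses, eta=0.5):
    # Exponentially down-weight kernels with large cumulative loss.
    w = np.exp(-eta * cum_losses)
    return w / w.sum()

rng = np.random.default_rng(0)
n_kernels, T = 4, 100
cum_loss = np.zeros(n_kernels)
for _ in range(T):
    w = hedge_weights(cum_loss)
    per_kernel_pred = rng.normal(size=n_kernels)   # stand-in predictions
    y = 0.0                                        # stand-in target
    combined = w @ per_kernel_pred                 # weighted ensemble output
    cum_loss += (per_kernel_pred - y) ** 2         # square loss per kernel
```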
arXiv Detail & Related papers (2021-02-09T07:43:29Z)
- Generalized vec trick for fast learning of pairwise kernel models [3.867363075280544]
We present a comprehensive review of pairwise kernels that have been proposed for incorporating prior knowledge about the relationship between the objects.
We show how all the reviewed kernels can be expressed as sums of Kronecker products, allowing the use of generalized vec trick for speeding up their computation.
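The underlying identity is vec(B X A) = (A^T ⊗ B) vec(X), which lets one apply a Kronecker-structured pairwise kernel matrix without ever materializing it; a minimal check:

```python
# The vec trick: multiply by a Kronecker product without forming it.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(5, 6))
X = rng.normal(size=(6, 3))

# vec() is column-major in the identity, hence order="F".
lhs = np.kron(A.T, B) @ X.ravel(order="F")   # naive: O(large) Kronecker matrix
rhs = (B @ X @ A).ravel(order="F")           # vec trick: two small matmuls
assert np.allclose(lhs, rhs)
```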
arXiv Detail & Related papers (2020-09-02T13:27:51Z)
- Sparse Gaussian Processes via Parametric Families of Compactly-supported Kernels [0.6091702876917279]
We propose a method for deriving parametric families of kernel functions with compact support.
The parameters of this family of kernels can be learned from data using maximum likelihood estimation.
We show that these approximations incur minimal error over the exact models when modeling data drawn directly from a target GP.
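One classical example of such a family is the Wendland class of compactly supported kernels; the sketch below implements the C^2 member (positive definite for inputs of dimension at most 3) as an illustration, not the paper's construction.

```python
# Wendland C^2 kernel: exactly zero beyond its support, so Gram matrices
# are sparse and admit sparse linear algebra.
import numpy as np

def wendland_c2(r, support=1.0):
    # phi(r) = (1 - r)_+^4 (4 r + 1), with r scaled by the support radius.
    s = r / support
    return np.clip(1.0 - s, 0.0, None) ** 4 * (4.0 * s + 1.0)

x = np.linspace(0.0, 3.0, 7)
r = np.abs(x[:, None] - x[None, :])
K = wendland_c2(r, support=1.5)   # many entries are exact zeros
```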
arXiv Detail & Related papers (2020-06-05T20:44:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.