Amortized Inference for Gaussian Process Hyperparameters of Structured
Kernels
- URL: http://arxiv.org/abs/2306.09819v1
- Date: Fri, 16 Jun 2023 13:02:57 GMT
- Title: Amortized Inference for Gaussian Process Hyperparameters of Structured
Kernels
- Authors: Matthias Bitzer, Mona Meister, Christoph Zimmer
- Abstract summary: Amortizing parameter inference over different datasets is a promising approach to dramatically speed up training time.
We propose amortizing kernel parameter inference over a complete kernel-structure-family rather than a fixed kernel structure.
We show drastically reduced inference time combined with competitive test performance for a large set of kernels and datasets.
- Score: 5.1672267755831705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning the kernel parameters for Gaussian processes is often the
computational bottleneck in applications such as online learning, Bayesian
optimization, or active learning. Amortizing parameter inference over different
datasets is a promising approach to dramatically speed up training time.
However, existing methods restrict the amortized inference procedure to a fixed
kernel structure. The amortization network must be redesigned manually and
trained again in case a different kernel is employed, which leads to a large
overhead in design time and training time. We propose amortizing kernel
parameter inference over a complete kernel-structure-family rather than a fixed
kernel structure. We do this by defining an amortization network over pairs of
datasets and kernel structures. This enables fast kernel inference for each
element in the kernel family without retraining the amortization network. As a
by-product, our amortization network is able to perform fast ensembling over kernel
structures. In our experiments, we show drastically reduced inference time
combined with competitive test performance for a large set of kernels and
datasets.
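To make the approach concrete, the sketch below shows one way an amortization network conditioned on a kernel structure could be laid out. It is a minimal illustration under assumed design choices (a DeepSets-style dataset encoder, a learned structure embedding, a fixed hyperparameter budget per structure), not the architecture from the paper.

```python
import torch
import torch.nn as nn

class AmortizedKernelInference(nn.Module):
    """Hypothetical sketch: map (dataset, kernel structure) -> kernel hyperparameters.

    Illustrates conditioning the amortization network on a kernel-structure id;
    it is not the architecture proposed in the paper.
    """

    def __init__(self, x_dim, n_structures, max_n_params, d_model=64):
        super().__init__()
        # Permutation-invariant dataset encoder (DeepSets-style mean pooling).
        self.point_encoder = nn.Sequential(
            nn.Linear(x_dim + 1, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        # One learned embedding per kernel structure in the family.
        self.structure_embedding = nn.Embedding(n_structures, d_model)
        # Head that predicts positive hyperparameters for the given structure.
        self.head = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, max_n_params)
        )

    def forward(self, X, y, structure_id):
        # X: (n, x_dim), y: (n,), structure_id: long tensor holding one structure index.
        points = torch.cat([X, y.unsqueeze(-1)], dim=-1)       # (n, x_dim + 1)
        data_summary = self.point_encoder(points).mean(dim=0)  # (d_model,)
        struct = self.structure_embedding(structure_id.reshape(1)).squeeze(0)
        raw = self.head(torch.cat([data_summary, struct], dim=-1))
        return raw.exp()  # positive hyperparameters, padded to max_n_params


# A single forward pass replaces iterative marginal-likelihood optimization;
# evaluating the same network with different structure ids gives the per-structure
# hyperparameters that a kernel-structure ensemble would need.
net = AmortizedKernelInference(x_dim=1, n_structures=4, max_n_params=3)
X, y = torch.randn(50, 1), torch.randn(50)
theta_0 = net(X, y, torch.tensor(0))  # hyperparameters for kernel structure 0
theta_1 = net(X, y, torch.tensor(1))  # same data, different kernel structure
```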
Related papers
- Reconstructing Kernel-based Machine Learning Force Fields with
Super-linear Convergence [0.18416014644193063]
We consider the broad class of Nyström-type methods to construct preconditioners.
All considered methods aim to identify a representative subset of inducing (kernel) columns to approximate the dominant kernel spectrum (a generic sketch of this construction appears after this list).
arXiv Detail & Related papers (2022-12-24T13:45:50Z) - Lifelong Bandit Optimization: No Prior and No Regret [70.94238868711952]
We develop LIBO, an algorithm which adapts to the environment by learning from past experience.
We assume a kernelized structure where the kernel is unknown but shared across all tasks.
Our algorithm can be paired with any kernelized or linear bandit algorithm and guarantees optimal performance.
arXiv Detail & Related papers (2022-10-27T14:48:49Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - Generative Kernel Continual Learning [117.79080100313722]
We introduce generative kernel continual learning, which exploits the synergies between generative models and kernels for continual learning.
The generative model is able to produce representative samples for kernel learning, which removes the dependence on memory in kernel continual learning.
We conduct extensive experiments on three widely-used continual learning benchmarks that demonstrate the abilities and benefits of our contributions.
arXiv Detail & Related papers (2021-12-26T16:02:10Z) - Neural Networks as Kernel Learners: The Silent Alignment Effect [86.44610122423994]
Neural networks in the lazy training regime converge to kernel machines.
We show that networks in the rich feature-learning regime can also learn a kernel machine with a data-dependent kernel, due to a phenomenon we term silent alignment.
We also demonstrate that non-whitened data can weaken the silent alignment effect.
arXiv Detail & Related papers (2021-10-29T18:22:46Z) - FlexConv: Continuous Kernel Convolutions with Differentiable Kernel
Sizes [34.90912459206022]
Recent works show CNNs benefit from different kernel sizes at different layers, but exploring all possible combinations is infeasible in practice.
We propose FlexConv, a novel convolutional operation with which high-bandwidth convolutional kernels of learnable kernel size can be learned at a fixed parameter cost.
arXiv Detail & Related papers (2021-10-15T12:35:49Z) - Kernel Continual Learning [117.79080100313722]
Kernel continual learning is a simple but effective variant of continual learning that tackles catastrophic forgetting.
An episodic memory unit stores a subset of samples for each task, from which task-specific classifiers are learned via kernel ridge regression.
Variational random features are used to learn a data-driven kernel for each task.
arXiv Detail & Related papers (2021-07-12T22:09:30Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions while achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior [26.52863547394537]
We present a novel probabilistic algorithm to learn a kernel composition by handling the sparsity in kernel selection with a horseshoe prior.
Our model can capture characteristics of time series with significant reductions in computational time and achieves competitive regression performance on real-world data sets.
arXiv Detail & Related papers (2020-12-21T13:41:15Z) - End-to-end Kernel Learning via Generative Random Fourier Features [31.57596752889935]
Random Fourier features (RFFs) provide a promising way for kernel learning in the spectral domain.
In this paper, we consider a one-stage process that incorporates kernel learning and the linear learner into a unifying framework (a minimal sketch of the standard RFF construction appears after this list).
arXiv Detail & Related papers (2020-09-10T00:27:39Z)
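For context on the Nyström-type preconditioners mentioned in the force-field entry at the top of this list, here is a generic sketch of the construction: select m inducing (kernel) columns, form the low-rank approximation K_nm K_mm^{-1} K_nm^T, and apply its regularized inverse cheaply via the Woodbury identity. The RBF kernel, uniformly random column selection, and noise level are illustrative assumptions; the methods in that paper select a representative subset more carefully, and this is not their specific algorithm.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def nystrom_preconditioner(X, m, noise=1e-1, seed=0):
    """Generic Nystrom-type approximation of (K + noise*I)^{-1}.

    A subset of m kernel columns gives K_hat = K_nm K_mm^{-1} K_nm^T; its
    regularized inverse is applied via the Woodbury identity, so no full
    n x n kernel matrix is ever formed or inverted.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)            # inducing columns
    K_nm = rbf_kernel(X, X[idx])                          # (n, m)
    K_mm = rbf_kernel(X[idx], X[idx]) + 1e-8 * np.eye(m)  # jitter for stability
    inner = K_mm + K_nm.T @ K_nm / noise                  # (m, m)

    def apply_inverse(v):
        # Woodbury: (K_hat + noise*I)^{-1} v
        return v / noise - K_nm @ np.linalg.solve(inner, K_nm.T @ v) / noise**2

    return apply_inverse

# Usage: feed `apply_inverse` to an iterative solver (e.g. CG) as a preconditioner.
X = np.random.default_rng(1).normal(size=(500, 3))
P_inv = nystrom_preconditioner(X, m=50)
print(P_inv(np.ones(500)).shape)  # (500,)
```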
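Several entries above (the NNGP random-feature approximation, the NTK feature maps, and the generative random Fourier features paper) build on the classic random Fourier feature construction of Rahimi and Recht. As a point of reference, the sketch below shows that standard fixed-spectrum version for an RBF kernel; it is not the generative, end-to-end variant proposed in the last entry.

```python
import numpy as np

def random_fourier_features(X, n_features=500, lengthscale=1.0, seed=0):
    """Standard random Fourier features for the RBF kernel (Rahimi & Recht).

    z(x) = sqrt(2 / D) * cos(W^T x + b) with W ~ N(0, 1/lengthscale^2) and
    b ~ Uniform(0, 2*pi), so that z(x)^T z(x') approximates k(x, x').
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Sanity check: feature inner products approximate the exact RBF kernel,
# with the error shrinking as the number of random features grows.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))
Z = random_fourier_features(X, n_features=20000)
approx = Z @ Z.T
exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
print(np.max(np.abs(approx - exact)))  # small
```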