Kernel Selection for Stein Variational Gradient Descent
- URL: http://arxiv.org/abs/2107.09338v1
- Date: Tue, 20 Jul 2021 08:48:42 GMT
- Title: Kernel Selection for Stein Variational Gradient Descent
- Authors: Qingzhong Ai, Shiyu Liu, Zenglin Xu
- Abstract summary: We propose approximating the optimal kernel with a combination of multiple kernels rather than a single one.
The proposed method not only removes the dependence on a single optimal kernel but also remains computationally efficient.
- Score: 19.16800190883095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stein variational gradient descent (SVGD) and its variants have shown
promising success in approximate inference for complex distributions.
However, their empirical performance depends crucially on the choice of kernel.
Unfortunately, the RBF kernel with the median heuristic, the common choice in
previous approaches, has been proved sub-optimal. Inspired by the paradigm of
multiple kernel learning, our solution to this issue is to approximate the
optimal kernel with a combination of multiple kernels rather than a single one,
which may limit performance and flexibility. To do so, we extend Kernelized
Stein Discrepancy (KSD) to its multiple-kernel view, called Multiple Kernelized
Stein Discrepancy (MKSD). Further, we leverage MKSD to construct a general
algorithm based on SVGD, which we call Multiple Kernel SVGD (MK-SVGD).
Moreover, our method automatically assigns a weight to each kernel without
introducing any additional parameters. The proposed method not only removes
the dependence on a single optimal kernel but also remains computationally
efficient. Experiments on various tasks and models show the effectiveness of
our method.
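For concreteness, here is a minimal sketch of an SVGD update that mixes several RBF kernels, in the spirit of MK-SVGD. Each kernel k_m contributes the standard SVGD direction phi_m(x) = (1/n) * sum_j [ k_m(x_j, x) * grad log p(x_j) + grad_{x_j} k_m(x_j, x) ], and the per-kernel directions are combined into a convex combination. The paper derives its kernel weights from MKSD; the magnitude-based weighting below, the function names, and the bandwidth grid are hypothetical stand-ins for illustration only.

```python
# Minimal MK-SVGD-style sketch: SVGD with a convex combination of RBF kernels.
# The weighting rule is a hypothetical stand-in, NOT the MKSD-derived weights.
import numpy as np

def rbf_kernel_and_grad(X, h):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2h)) and sum_j grad_{x_j} k(x_j, x_i)."""
    diff = X[:, None, :] - X[None, :, :]               # (n, n, d): x_i - x_j
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * h))
    grad_sum = (K[:, :, None] * diff).sum(axis=1) / h  # repulsive term, row i
    return K, grad_sum

def mk_svgd_step(X, grad_log_p, bandwidths, eps=0.1):
    """One particle update using a weighted mixture of RBF kernels."""
    n = X.shape[0]
    scores = grad_log_p(X)                             # (n, d): grad log p at particles
    phis, weights = [], []
    for h in bandwidths:                               # h is the squared bandwidth
        K, grad_sum = rbf_kernel_and_grad(X, h)
        phi = (K @ scores + grad_sum) / n              # SVGD direction under this kernel
        phis.append(phi)
        # Hypothetical importance score: RMS magnitude of this kernel's direction.
        weights.append(np.sqrt(np.mean(phi ** 2)))
    w = np.asarray(weights)
    w /= w.sum()                                       # normalize: convex combination
    phi_mix = sum(wi * p for wi, p in zip(w, phis))
    return X + eps * phi_mix

# Usage: approximate a standard 2-D Gaussian, so grad log p(x) = -x.
rng = np.random.default_rng(0)
X = 3.0 * rng.normal(size=(100, 2)) + 5.0              # start far from the target
for _ in range(500):
    X = mk_svgd_step(X, lambda x: -x, bandwidths=[0.1, 1.0, 10.0])
print(X.mean(axis=0), X.var(axis=0))                   # should approach 0 and 1
```

Spanning several orders of magnitude with the bandwidth grid is a cheap way to avoid committing to a single median-heuristic bandwidth, which is exactly the failure mode the abstract points at.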
Related papers
- Snacks: a fast large-scale kernel SVM solver [0.8602553195689513]
Snacks is a new large-scale solver for Kernel Support Vector Machines.
Snacks relies on a Nyström approximation of the kernel matrix and an accelerated variant of the subgradient method.
arXiv Detail & Related papers (2023-04-17T04:19:20Z)
- Efficient Convex Algorithms for Universal Kernel Learning [50.877957471649395]
An ideal set of kernels should admit a linear parameterization (for tractability) and be dense in the set of all kernels (for accuracy).
Previous algorithms for optimization of kernels were limited to classification and relied on computationally complex Semidefinite Programming (SDP) algorithms.
We propose an SVD-QCQPQP algorithm which dramatically reduces the computational complexity compared with previous SDP-based approaches.
arXiv Detail & Related papers (2023-04-15T04:57:37Z)
- Structural Kernel Search via Bayesian Optimization and Symbolical Optimal Transport [5.1672267755831705]
For Gaussian processes, selecting the kernel is a crucial task, often done manually by an expert.
We propose a novel, efficient search method through a general, structured kernel space.
arXiv Detail & Related papers (2022-10-21T09:30:21Z)
- Variational Autoencoder Kernel Interpretation and Selection for Classification [59.30734371401315]
This work proposes kernel selection approaches for probabilistic classifiers based on features produced by the convolutional encoder of a variational autoencoder.
In the proposed implementation, each latent variable is sampled from the distribution associated with a single kernel of the encoder's last convolution layer, as an individual distribution is created for each kernel.
Choosing relevant features from the sampled latent variables makes it possible to perform kernel selection, filtering out uninformative features and kernels.
arXiv Detail & Related papers (2022-09-10T17:22:53Z)
- S-Rocket: Selective Random Convolution Kernels for Time Series Classification [36.9596657353794]
Random convolution kernel transform (Rocket) is a fast, efficient, and novel approach for time series feature extraction.
Selecting the most important kernels and pruning redundant, less important ones is necessary to reduce computational complexity and accelerate Rocket's inference.
A population-based approach is proposed for selecting the most important kernels.
arXiv Detail & Related papers (2022-03-07T15:02:12Z)
- Taming Nonconvexity in Kernel Feature Selection---Favorable Properties of the Laplace Kernel [77.73399781313893]
A challenge is to establish the objective function of kernel-based feature selection.
Gradient-based algorithms for this nonconvex objective are only able to guarantee convergence to local minima.
arXiv Detail & Related papers (2021-06-17T11:05:48Z)
- Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z)
- Flow-based Kernel Prior with Application to Blind Super-Resolution [143.21527713002354]
Kernel estimation is generally one of the key problems for blind image super-resolution (SR).
This paper proposes a normalizing flow-based kernel prior (FKP) for kernel modeling.
Experiments on synthetic and real-world images demonstrate that the proposed FKP can significantly improve the kernel estimation accuracy.
arXiv Detail & Related papers (2021-03-29T22:37:06Z)
- The Signature Kernel is the solution of a Goursat PDE [11.107838656561766]
We show that for continuously differentiable paths, the signature kernel solves a hyperbolic PDE and recognize the connection with a class of differential equations known in the literature as Goursat problems.
This Goursat PDE only depends on the increments of the input sequences, does not require the explicit computation of signatures, and can be solved efficiently using state-of-the-art hyperbolic PDE numerical solvers (a minimal finite-difference sketch appears after this list).
We empirically demonstrate the effectiveness of our PDE kernel as a machine learning tool in various machine learning applications dealing with sequential data.
arXiv Detail & Related papers (2020-06-26T04:36:50Z)
- Learning Deep Kernels for Non-Parametric Two-Sample Tests [50.92621794426821]
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution.
Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power.
arXiv Detail & Related papers (2020-02-21T03:54:23Z)
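To make the Goursat-PDE entry above concrete: for continuously differentiable paths x and y, the signature kernel k(s, t) solves d^2 k / ds dt = <x'(s), y'(t)> k(s, t) with boundary conditions k(0, .) = k(., 0) = 1, so it can be evaluated on a grid directly from path increments. The sketch below uses a deliberately simple first-order finite-difference scheme and an illustrative function name; the paper itself relies on higher-order hyperbolic PDE solvers.

```python
# First-order finite-difference sketch of the Goursat PDE for the signature
# kernel. A deliberately simple scheme; the paper uses more accurate solvers.
import numpy as np

def signature_kernel(x, y):
    """x: (m, d) path, y: (n, d) path; returns k evaluated at the endpoints."""
    dx = np.diff(x, axis=0)                        # path increments, (m-1, d)
    dy = np.diff(y, axis=0)                        # (n-1, d)
    inc = dx @ dy.T                                # <dx_i, dy_j> for every grid cell
    m, n = inc.shape
    K = np.ones((m + 1, n + 1))                    # boundaries: k(0, .) = k(., 0) = 1
    for i in range(m):
        for j in range(n):
            # integrate the PDE over one grid cell (first-order approximation)
            K[i + 1, j + 1] = K[i + 1, j] + K[i, j + 1] + K[i, j] * (inc[i, j] - 1.0)
    return K[-1, -1]

# Usage: kernel of a unit circle path with itself.
t = np.linspace(0.0, 1.0, 50)[:, None]
circle = np.hstack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
print(signature_kernel(circle, circle))
```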
This list is automatically generated from the titles and abstracts of the papers on this site.