Kernel Selection for Stein Variational Gradient Descent
- URL: http://arxiv.org/abs/2107.09338v1
- Date: Tue, 20 Jul 2021 08:48:42 GMT
- Title: Kernel Selection for Stein Variational Gradient Descent
- Authors: Qingzhong Ai, Shiyu Liu, Zenglin Xu
- Abstract summary: We propose approximating the optimal kernel with a combination of multiple kernels rather than a single one.
The proposed method not only removes the dependence on a single optimal kernel but also remains computationally efficient.
- Score: 19.16800190883095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stein variational gradient descent (SVGD) and its variants have shown
promising success in approximate inference for complex distributions.
However, their empirical performance depends crucially on the choice of kernel.
Unfortunately, the RBF kernel with the median heuristic, the common choice in
previous approaches, has been proved sub-optimal. Inspired by the paradigm of
multiple kernel learning, our solution to this issue is to approximate the
optimal kernel with a combination of multiple kernels rather than a single one,
which may limit performance and flexibility. To do so, we extend Kernelized
Stein Discrepancy (KSD) to its multiple-kernel view, called Multiple Kernelized
Stein Discrepancy (MKSD). Further, we leverage MKSD to construct a general
algorithm based on SVGD, which we call Multiple Kernel SVGD (MK-SVGD).
Moreover, our method automatically assigns a weight to each kernel without
introducing any additional parameters. The proposed method not only removes
the dependence on a single optimal kernel but also remains computationally
efficient. Experiments on various tasks and models show the effectiveness of
our method.
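For concreteness, here is a minimal sketch of an SVGD update that mixes several RBF kernels, in the spirit of MK-SVGD. Each kernel k_m contributes the standard SVGD direction phi_m(x) = (1/n) * sum_j [ k_m(x_j, x) * grad log p(x_j) + grad_{x_j} k_m(x_j, x) ], and the per-kernel directions are combined into a convex combination. The paper derives its kernel weights from MKSD; the magnitude-based weighting below, the function names, and the bandwidth grid are hypothetical stand-ins for illustration only.

```python
# Minimal MK-SVGD-style sketch: SVGD with a convex combination of RBF kernels.
# The weighting rule is a hypothetical stand-in, NOT the MKSD-derived weights.
import numpy as np

def rbf_kernel_and_grad(X, h):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2h)) and sum_j grad_{x_j} k(x_j, x_i)."""
    diff = X[:, None, :] - X[None, :, :]               # (n, n, d): x_i - x_j
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * h))
    grad_sum = (K[:, :, None] * diff).sum(axis=1) / h  # repulsive term, row i
    return K, grad_sum

def mk_svgd_step(X, grad_log_p, bandwidths, eps=0.1):
    """One particle update using a weighted mixture of RBF kernels."""
    n = X.shape[0]
    scores = grad_log_p(X)                             # (n, d): grad log p at particles
    phis, weights = [], []
    for h in bandwidths:                               # h is the squared bandwidth
        K, grad_sum = rbf_kernel_and_grad(X, h)
        phi = (K @ scores + grad_sum) / n              # SVGD direction under this kernel
        phis.append(phi)
        # Hypothetical importance score: RMS magnitude of this kernel's direction.
        weights.append(np.sqrt(np.mean(phi ** 2)))
    w = np.asarray(weights)
    w /= w.sum()                                       # normalize: convex combination
    phi_mix = sum(wi * p for wi, p in zip(w, phis))
    return X + eps * phi_mix

# Usage: approximate a standard 2-D Gaussian, so grad log p(x) = -x.
rng = np.random.default_rng(0)
X = 3.0 * rng.normal(size=(100, 2)) + 5.0              # start far from the target
for _ in range(500):
    X = mk_svgd_step(X, lambda x: -x, bandwidths=[0.1, 1.0, 10.0])
print(X.mean(axis=0), X.var(axis=0))                   # should approach 0 and 1
```

Spanning several orders of magnitude with the bandwidth grid is a cheap way to avoid committing to a single median-heuristic bandwidth, which is exactly the failure mode the abstract points at.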
Related papers
- Snacks: a fast large-scale kernel SVM solver [0.8602553195689513]
Snacks is a new large-scale solver for Kernel Support Vector Machines.
Snacks relies on a Nyström approximation of the kernel matrix and an accelerated variant of the subgradient method.
arXiv Detail & Related papers (2023-04-17T04:19:20Z)
- Efficient Convex Algorithms for Universal Kernel Learning [50.877957471649395]
An ideal set of kernels should admit a linear parameterization (for tractability) and be dense in the set of all kernels (for accuracy).
Previous algorithms for optimization of kernels were limited to classification and relied on computationally complex Semidefinite Programming (SDP) algorithms.
We propose an SVD-QCQPQP algorithm which dramatically reduces the computational complexity compared with previous SDP-based approaches.
arXiv Detail & Related papers (2023-04-15T04:57:37Z)
- Structural Kernel Search via Bayesian Optimization and Symbolical Optimal Transport [5.1672267755831705]
For Gaussian processes, selecting the kernel is a crucial task, often done manually by an expert.
We propose a novel, efficient search method through a general, structured kernel space.
arXiv Detail & Related papers (2022-10-21T09:30:21Z)
- Variational Autoencoder Kernel Interpretation and Selection for Classification [59.30734371401315]
This work proposes kernel selection approaches for probabilistic classifiers based on features produced by the convolutional encoder of a variational autoencoder.
In the proposed implementation, each latent variable is sampled from the distribution associated with a single kernel of the encoder's last convolution layer, as an individual distribution is created for each kernel.
Choosing relevant features from the sampled latent variables makes it possible to perform kernel selection, filtering out uninformative features and kernels.
arXiv Detail & Related papers (2022-09-10T17:22:53Z)
- S-Rocket: Selective Random Convolution Kernels for Time Series Classification [36.9596657353794]
Random convolution kernel transform (Rocket) is a fast, efficient, and novel approach for time series feature extraction.
Selecting the most important kernels and pruning redundant, less important ones is necessary to reduce computational complexity and accelerate Rocket's inference.
A population-based approach is proposed for selecting the most important kernels.
arXiv Detail & Related papers (2022-03-07T15:02:12Z)
- Taming Nonconvexity in Kernel Feature Selection---Favorable Properties of the Laplace Kernel [77.73399781313893]
A challenge is to establish the objective function of kernel-based feature selection.
Gradient-based algorithms for this nonconvex objective are only able to guarantee convergence to local minima.
arXiv Detail & Related papers (2021-06-17T11:05:48Z)
- Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z)
- Flow-based Kernel Prior with Application to Blind Super-Resolution [143.21527713002354]
Kernel estimation is generally one of the key problems for blind image super-resolution (SR).
This paper proposes a normalizing flow-based kernel prior (FKP) for kernel modeling.
Experiments on synthetic and real-world images demonstrate that the proposed FKP can significantly improve the kernel estimation accuracy.
arXiv Detail & Related papers (2021-03-29T22:37:06Z)
- The Signature Kernel is the solution of a Goursat PDE [11.107838656561766]
We show that for continuously differentiable paths, the signature kernel solves a hyperbolic PDE and recognize the connection with a class of differential equations known in the literature as Goursat problems.
This Goursat PDE only depends on the increments of the input sequences, does not require the explicit computation of signatures, and can be solved efficiently using state-of-the-art hyperbolic PDE numerical solvers (a minimal finite-difference sketch appears after this list).
We empirically demonstrate the effectiveness of our PDE kernel as a machine learning tool in various machine learning applications dealing with sequential data.
arXiv Detail & Related papers (2020-06-26T04:36:50Z)
- Learning Deep Kernels for Non-Parametric Two-Sample Tests [50.92621794426821]
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution.
Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power.
arXiv Detail & Related papers (2020-02-21T03:54:23Z)
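To make the Goursat-PDE entry above concrete: for continuously differentiable paths x and y, the signature kernel k(s, t) solves d^2 k / ds dt = <x'(s), y'(t)> k(s, t) with boundary conditions k(0, .) = k(., 0) = 1, so it can be evaluated on a grid directly from path increments. The sketch below uses a deliberately simple first-order finite-difference scheme and an illustrative function name; the paper itself relies on higher-order hyperbolic PDE solvers.

```python
# First-order finite-difference sketch of the Goursat PDE for the signature
# kernel. A deliberately simple scheme; the paper uses more accurate solvers.
import numpy as np

def signature_kernel(x, y):
    """x: (m, d) path, y: (n, d) path; returns k evaluated at the endpoints."""
    dx = np.diff(x, axis=0)                        # path increments, (m-1, d)
    dy = np.diff(y, axis=0)                        # (n-1, d)
    inc = dx @ dy.T                                # <dx_i, dy_j> for every grid cell
    m, n = inc.shape
    K = np.ones((m + 1, n + 1))                    # boundaries: k(0, .) = k(., 0) = 1
    for i in range(m):
        for j in range(n):
            # integrate the PDE over one grid cell (first-order approximation)
            K[i + 1, j + 1] = K[i + 1, j] + K[i, j + 1] + K[i, j] * (inc[i, j] - 1.0)
    return K[-1, -1]

# Usage: kernel of a unit circle path with itself.
t = np.linspace(0.0, 1.0, 50)[:, None]
circle = np.hstack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
print(signature_kernel(circle, circle))
```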
This list is automatically generated from the titles and abstracts of the papers on this site.