Adaptive Kernel Selection for Stein Variational Gradient Descent
- URL: http://arxiv.org/abs/2510.02067v1
- Date: Thu, 02 Oct 2025 14:33:57 GMT
- Title: Adaptive Kernel Selection for Stein Variational Gradient Descent
- Authors: Moritz Melcher, Simon Weissmann, Ashia C. Wilson, Jakob Zech
- Abstract summary: Stein Variational Gradient Descent (SVGD) is a popular variational inference method. We introduce Adaptive SVGD (Ad-SVGD), a method that alternates between updating the particles via SVGD and adaptively tuning kernel bandwidths.
- Score: 5.278971176776929
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A central challenge in Bayesian inference is efficiently approximating posterior distributions. Stein Variational Gradient Descent (SVGD) is a popular variational inference method which transports a set of particles to approximate a target distribution. The SVGD dynamics are governed by a reproducing kernel Hilbert space (RKHS) and are highly sensitive to the choice of the kernel function, which directly influences both convergence and approximation quality. The commonly used median heuristic offers a simple approach for setting kernel bandwidths but lacks flexibility and often performs poorly, particularly in high-dimensional settings. In this work, we propose an alternative strategy for adaptively choosing kernel parameters over an abstract family of kernels. Recent convergence analyses based on the kernelized Stein discrepancy (KSD) suggest that optimizing the kernel parameters by maximizing the KSD can improve performance. Building on this insight, we introduce Adaptive SVGD (Ad-SVGD), a method that alternates between updating the particles via SVGD and adaptively tuning kernel bandwidths through gradient ascent on the KSD. We provide a simplified theoretical analysis that extends existing results on minimizing the KSD for fixed kernels to our adaptive setting, showing convergence properties for the maximal KSD over our kernel class. Our empirical results further support this intuition: Ad-SVGD consistently outperforms standard heuristics in a variety of tasks.
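To make the alternating scheme concrete, below is a minimal NumPy sketch, not the authors' implementation: it assumes an RBF kernel, an isotropic Gaussian toy target, a median-heuristic initialization, and a central finite-difference gradient ascent on the empirical (U-statistic) KSD in log-bandwidth. All names, step sizes, and the target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x, mean=2.0, var=0.5):
    # Score \nabla_x log p(x) of an isotropic Gaussian toy target
    # (an assumption for this sketch; any differentiable log-density works).
    return -(x - mean) / var

def pairwise(X):
    diff = X[:, None, :] - X[None, :, :]          # (n, n, d) differences
    return diff, np.sum(diff ** 2, axis=-1)       # squared distances (n, n)

def ksd2(X, S, h):
    # Empirical (U-statistic) KSD^2 with k(x, y) = exp(-||x-y||^2 / (2 h^2)):
    # u(x,y) = k [ s(x).s(y) + (s(x)-s(y)).(x-y)/h^2 + d/h^2 - ||x-y||^2/h^4 ]
    n, d = X.shape
    diff, sq = pairwise(X)
    K = np.exp(-sq / (2 * h ** 2))
    cross = np.einsum('ijk,ijk->ij', S[:, None, :] - S[None, :, :], diff)
    U = K * (S @ S.T + cross / h ** 2 + d / h ** 2 - sq / h ** 4)
    np.fill_diagonal(U, 0.0)                      # U-statistic: drop i == j
    return U.sum() / (n * (n - 1))

def svgd_step(X, h, eps=0.05):
    # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) s(x_j) + \nabla_{x_j} k(x_j, x_i) ]
    n, _ = X.shape
    _, sq = pairwise(X)
    K = np.exp(-sq / (2 * h ** 2))
    S = score(X)
    repulsion = (X * K.sum(axis=1, keepdims=True) - K @ X) / h ** 2
    return X + eps * (K @ S + repulsion) / n

# Median-heuristic initialization of the bandwidth (one common variant).
X = rng.normal(size=(100, 2))
_, sq0 = pairwise(X)
log_h = 0.5 * np.log(np.median(sq0[sq0 > 0]) / 2.0)

for t in range(500):
    X = svgd_step(X, np.exp(log_h))
    # Gradient ascent on KSD^2 in log-bandwidth; a central finite difference
    # stands in for the analytic/autodiff gradient (illustrative choice).
    S, delta, lr = score(X), 1e-3, 0.1
    g = (ksd2(X, S, np.exp(log_h + delta))
         - ksd2(X, S, np.exp(log_h - delta))) / (2 * delta)
    log_h += lr * g

print("adapted bandwidth:", np.exp(log_h))
print("particle mean (target mean 2.0):", X.mean(axis=0))
```

Working in log h keeps the bandwidth positive during the ascent; everything beyond the alternation itself (kernel family, estimator, step sizes) is one reasonable instantiation, not a claim about the paper's exact choices.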
Related papers
- Towards understanding Accelerated Stein Variational Gradient Flow -- Analysis of Generalized Bilinear Kernels for Gaussian target distributions [0.5528896840956629]
Stein variational gradient descent (SVGD) is a kernel-based and non-parametric particle method for sampling from a target distribution. We introduce accelerated SVGD (ASVGD), based on an accelerated gradient flow in a metric space of probability densities. In the setting of Bayesian neural networks, ASVGD significantly outperforms SVGD in terms of log-likelihood and total runtime.
arXiv Detail & Related papers (2025-09-04T08:39:47Z)
- DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization [6.303144414273044]
Large language models (LLMs) have unlocked many applications but also underscore the challenge of aligning them with diverse values and preferences. Direct Preference Optimization (DPO) is central to alignment but is constrained by fixed divergences and limited feature transformations.
arXiv Detail & Related papers (2025-01-05T00:08:52Z)
- Towards Understanding the Dynamics of Gaussian-Stein Variational Gradient Descent [16.16051064618816]
Stein Variational Gradient Descent (SVGD) is a nonparametric particle-based deterministic sampling algorithm.
We study the dynamics of the Gaussian-SVGD projected to the family of Gaussian distributions via the bilinear kernel.
We propose a density-based and a particle-based implementation of the Gaussian-SVGD and show that several recent algorithms for Gaussian variational inference (GVI), proposed from different perspectives, emerge as special cases of our unified framework.
arXiv Detail & Related papers (2023-05-23T13:55:47Z)
- Augmented Message Passing Stein Variational Gradient Descent [3.5788754401889014]
We study the isotropy property of finite particles during the convergence process.
All particles tend to cluster around the particle center within a certain range.
Our algorithm achieves satisfactory accuracy and overcomes the variance collapse problem in various benchmark problems.
arXiv Detail & Related papers (2023-05-18T01:13:04Z)
- Kernel Selection for Stein Variational Gradient Descent [19.16800190883095]
We propose approximating the optimal kernel with a combination of multiple kernels rather than a single one.
The proposed method removes the dependence on a single optimal kernel while remaining computationally efficient.
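As a rough illustration of this kernel-combination idea (not the paper's method), the sketch below drives the SVGD update with a fixed, equally weighted mixture of RBF kernels on a hand-picked bandwidth grid; the grid, weights, and toy target are assumptions.

```python
import numpy as np

def mixture_svgd_step(X, score, bandwidths, weights, eps=0.05):
    # SVGD update driven by the kernel combination
    #   k(x, y) = sum_m w_m exp(-||x - y||^2 / (2 h_m^2)),
    # a hypothetical stand-in for the paper's multiple-kernel construction.
    n, _ = X.shape
    diff = X[:, None, :] - X[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)
    S = score(X)
    phi = np.zeros_like(X)
    for h, w in zip(bandwidths, weights):
        K = np.exp(-sq / (2 * h ** 2))
        # Driving term K @ S plus the repulsive term from \nabla k.
        phi += w * (K @ S + (X * K.sum(axis=1, keepdims=True) - K @ X) / h ** 2)
    return X + eps * phi / n

# Example usage: equal weights on a geometric bandwidth grid (assumptions).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
grid = [0.25, 0.5, 1.0, 2.0]
for _ in range(300):
    X = mixture_svgd_step(X, lambda x: -(x - 1.0), grid, [0.25] * 4)
print(X.mean(axis=0))   # should approach the toy target mean (1.0)
```

Adapting the weights rather than fixing them would bring this closer to the paper's proposal, but the abstract alone does not pin that scheme down.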
arXiv Detail & Related papers (2021-07-20T08:48:42Z)
- A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z)
- Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Flow-based Kernel Prior with Application to Blind Super-Resolution [143.21527713002354]
Kernel estimation is generally one of the key problems for blind image super-resolution (SR).
This paper proposes a normalizing flow-based kernel prior (FKP) for kernel modeling.
Experiments on synthetic and real-world images demonstrate that the proposed FKP can significantly improve the kernel estimation accuracy.
arXiv Detail & Related papers (2021-03-29T22:37:06Z)
- DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution [136.7261709896713]
We propose a data-driven approach that generates the appropriate convolution kernels to apply in response to the nature of the instances.
The proposed method achieves promising results on both ScanNetV2 and S3DIS.
It also improves inference speed by more than 25% over the current state-of-the-art.
arXiv Detail & Related papers (2020-11-26T14:56:57Z)
- Kernel Stein Generative Modeling [68.03537693810972]
Stochastic Gradient Langevin Dynamics (SGLD) demonstrates impressive results with energy-based models on high-dimensional and complex data distributions.
Stein Variational Gradient Descent (SVGD) is a deterministic sampling algorithm that iteratively transports a set of particles to approximate a given distribution.
We propose noise conditional kernel SVGD (NCK-SVGD), that works in tandem with the recently introduced Noise Conditional Score Network estimator.
arXiv Detail & Related papers (2020-07-06T21:26:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.