Kernel Stein Discrepancy thinning: a theoretical perspective of
pathologies and a practical fix with regularization
- URL: http://arxiv.org/abs/2301.13528v3
- Date: Thu, 26 Oct 2023 12:03:13 GMT
- Title: Kernel Stein Discrepancy thinning: a theoretical perspective of
pathologies and a practical fix with regularization
- Authors: Clément Bénard, Brian Staber, Sébastien Da Veiga (CREST)
- Abstract summary: Stein thinning is a promising algorithm proposed by Riabiz et al. (2022) for post-processing outputs of Markov chain Monte Carlo.
In this article, we conduct a theoretical analysis of these pathologies, to clearly identify the mechanisms at stake, and suggest improved strategies.
We then introduce the regularized Stein thinning algorithm to alleviate the identified pathologies.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stein thinning is a promising algorithm proposed by Riabiz et al. (2022) for post-processing outputs of Markov chain Monte Carlo (MCMC). The main principle
is to greedily minimize the kernelized Stein discrepancy (KSD), which only
requires the gradient of the log-target distribution, and is thus well-suited
for Bayesian inference. The main advantages of Stein thinning are the automatic removal of the burn-in period, the correction of the bias introduced by recent MCMC algorithms, and the asymptotic convergence towards the
target distribution. Nevertheless, Stein thinning suffers from several
empirical pathologies, which may result in poor approximations, as observed in
the literature. In this article, we conduct a theoretical analysis of these
pathologies, to clearly identify the mechanisms at stake, and suggest improved
strategies. Then, we introduce the regularized Stein thinning algorithm to
alleviate the identified pathologies. Finally, theoretical guarantees and
extensive experiments show the high efficiency of the proposed algorithm. An
implementation of regularized Stein thinning, the kernax library in Python and JAX, is available at https://gitlab.com/drti/kernax.
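The greedy KSD-minimization principle described in the abstract is easy to sketch. Below is a minimal, illustrative JAX implementation of vanilla (unregularized) Stein thinning with an IMQ base kernel, a common choice in this line of work; all function names are ours, this is not the kernax API, and the regularized variant adds correction terms detailed in the paper.

```python
import jax
import jax.numpy as jnp

def imq(x, y, c=1.0):
    # inverse multiquadric base kernel: k(x, y) = (c^2 + ||x - y||^2)^(-1/2)
    return (c**2 + jnp.sum((x - y) ** 2)) ** -0.5

def stein_kernel(x, y, sx, sy):
    # Langevin Stein kernel built from the base kernel and the score s = grad log target:
    # k_p(x,y) = div_x div_y k + grad_x k . s(y) + grad_y k . s(x) + k(x,y) s(x).s(y)
    gx = jax.grad(imq, argnums=0)(x, y)
    gy = jax.grad(imq, argnums=1)(x, y)
    hxy = jax.jacfwd(jax.grad(imq, argnums=0), argnums=1)(x, y)
    return jnp.trace(hxy) + gx @ sy + gy @ sx + imq(x, y) * (sx @ sy)

def stein_thinning(samples, scores, m):
    # greedily select m points keeping the KSD of the selected subset small
    kp = jax.vmap(lambda x, sx: jax.vmap(
        lambda y, sy: stein_kernel(x, y, sx, sy))(samples, scores)
    )(samples, scores)                      # full n x n Stein kernel matrix
    diag = jnp.diag(kp)
    acc = jnp.zeros(samples.shape[0])       # running sum of kernel rows of selected points
    idx = []
    for _ in range(m):
        i = int(jnp.argmin(diag + 2.0 * acc))   # marginal KSD cost of each candidate
        idx.append(i)
        acc = acc + kp[i]
    return samples[jnp.array(idx)]

# toy usage: thin a sample targeting a standard Gaussian
logp = lambda x: -0.5 * jnp.sum(x**2)
X = jax.random.normal(jax.random.PRNGKey(0), (500, 2))
S = jax.vmap(jax.grad(logp))(X)
thinned = stein_thinning(X, S, 50)
```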
Related papers
- Low Stein Discrepancy via Message-Passing Monte Carlo
Message-Passing Monte Carlo (MPMC) was recently introduced as a novel low-discrepancy sampling approach leveraging tools from geometric deep learning.
We extend this framework to sample from general multivariate probability distributions with known probability density function.
Our proposed method, Stein-Message-Passing Monte Carlo (Stein-MPMC), minimizes a kernelized Stein discrepancy, ensuring improved sample quality.
arXiv Detail & Related papers (2025-03-27T02:49:31Z)
- Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
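The entry above only names the spectral-normalization fix. As a rough illustration (our sketch, not the paper's code), spectral normalization rescales a weight matrix by its largest singular value, estimated cheaply by power iteration; how it is wired into the dynamics-model layers is an assumption here.

```python
import jax.numpy as jnp

def spectral_normalize(W, u, n_iter=1):
    # power iteration: estimate the leading singular value sigma of W,
    # then rescale W so its spectral norm is approximately 1
    for _ in range(n_iter):
        v = W.T @ u
        v = v / (jnp.linalg.norm(v) + 1e-12)
        u = W @ v
        u = u / (jnp.linalg.norm(u) + 1e-12)
    sigma = u @ (W @ v)
    return W / sigma, u  # carry u across training steps to warm-start the iteration
```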
- Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation
We propose a Monte Carlo PDE solver for training unsupervised neural solvers.
We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency.
arXiv Detail & Related papers (2023-02-10T08:05:19Z)
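As a minimal example of the probabilistic representation such solvers build on (the underlying formula, not the paper's neural method): for the heat equation u_t = kappa * Laplacian(u) with initial condition u0, the solution is u(x, t) = E[u0(x + sqrt(2 kappa t) Z)] with Z standard normal, so it can be estimated by averaging over random particles.

```python
import jax
import jax.numpy as jnp

def heat_mc(u0, x, t, kappa, key, n_particles=10_000):
    # Monte Carlo estimate of u(x, t) for u_t = kappa * Laplacian(u):
    # u(x, t) = E[u0(x + sqrt(2 kappa t) Z)], Z ~ N(0, I)
    z = jax.random.normal(key, (n_particles,) + x.shape)
    return jnp.mean(jax.vmap(u0)(x + jnp.sqrt(2 * kappa * t) * z))

# toy usage: Gaussian bump initial condition in one dimension
u0 = lambda x: jnp.exp(-jnp.sum(x**2))
val = heat_mc(u0, jnp.array([0.5]), t=0.1, kappa=1.0, key=jax.random.PRNGKey(0))
```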
- Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning
We develop a novel model-based approach to reinforcement learning (MBRL).
It relaxes the assumptions on the target transition model, requiring only that it belong to a generic family of mixture models.
It achieves up to a 50% reduction in wall-clock time in some continuous control environments.
arXiv Detail & Related papers (2022-06-02T17:27:49Z)
- Optimal policy evaluation using kernel-based temporal difference methods
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
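To make "abandoning Metropolis-Hastings steps" concrete, here is a toy sketch (our simplification, not the paper's exact algorithm) of annealed importance sampling with unadjusted Langevin transitions: every operation is differentiable, at the price of a small bias from skipping the accept/reject correction.

```python
import jax
import jax.numpy as jnp

def dais_log_weight(key, log_q, log_p, x0, betas, step=1e-2):
    # AIS from q to p along tempered densities log_gamma_b = (1-b) log q + b log p,
    # with unadjusted Langevin (no MH accept/reject) transitions
    log_gamma = lambda x, b: (1 - b) * log_q(x) + b * log_p(x)
    x, log_w = x0, 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += log_gamma(x, b) - log_gamma(x, b_prev)  # AIS weight increment
        key, sub = jax.random.split(key)
        noise = jax.random.normal(sub, x.shape)
        x = x + step * jax.grad(log_gamma)(x, b) + jnp.sqrt(2 * step) * noise
    return log_w, x

# usage sketch: betas = jnp.linspace(0.0, 1.0, 50), x0 drawn from q
```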
- Kernel Stein Discrepancy Descent
Kernel Stein Discrepancy (KSD) has received much interest recently.
We investigate the properties of its Wasserstein gradient flow to approximate a target probability distribution $\pi$ on $\mathbb{R}^d$.
This leads to a straightforwardly implementable, deterministic score-based method to sample from $\pi$, named KSD Descent.
arXiv Detail & Related papers (2021-05-20T19:05:23Z)
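KSD Descent can be sketched directly on top of the stein_kernel helper from the Stein thinning block above (assumed in scope): treat the squared KSD of the particle ensemble as a loss and run gradient descent on the particle positions; names and step sizes are ours.

```python
# assumes jax, jnp and stein_kernel from the Stein thinning sketch above
def ksd2(particles, score_fn):
    # squared KSD of the empirical measure of the particles against the
    # target distribution whose score (grad log density) is score_fn
    scores = jax.vmap(score_fn)(particles)
    K = jax.vmap(lambda x, sx: jax.vmap(
        lambda y, sy: stein_kernel(x, y, sx, sy))(particles, scores)
    )(particles, scores)
    return jnp.mean(K)

def ksd_descent(particles, score_fn, lr=1e-1, n_steps=200):
    # deterministic, score-based sampling: gradient descent on KSD^2
    for _ in range(n_steps):
        particles = particles - lr * jax.grad(ksd2)(particles, score_fn)
    return particles
```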
- Stein Variational Gradient Descent: many-particle and long-time asymptotics
Stein variational gradient descent (SVGD) refers to a class of methods for Bayesian inference based on interacting particle systems.
We develop the cotangent space construction for the Stein geometry, prove its basic properties, and determine the large-deviation functional governing the many-particle limit.
We identify the Stein-Fisher information as its leading order contribution in the long-time and many-particle regime.
arXiv Detail & Related papers (2021-02-25T16:03:04Z)
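For reference, one common form of the Stein-Fisher information (the squared kernel Stein discrepancy of $\mu$ relative to $\pi$), written for a scalar base kernel $k$; conventions vary across papers, so treat this as a sketch:

```latex
I_{\mathrm{Stein}}(\mu \mid \pi)
  = \iint \nabla \log\frac{\mathrm{d}\mu}{\mathrm{d}\pi}(x)^{\top}
    \, k(x, y) \,
    \nabla \log\frac{\mathrm{d}\mu}{\mathrm{d}\pi}(y)
    \,\mathrm{d}\mu(x) \,\mathrm{d}\mu(y)
```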
- Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization
Wasserstein barycenters provide a geometric notion of the weighted average of probability measures based on optimal transport.
We present a scalable algorithm to compute Wasserstein-2 barycenters given sample access to the input measures.
arXiv Detail & Related papers (2021-02-02T21:01:13Z)
- Annealed Stein Variational Gradient Descent
Stein variational gradient descent has gained attention in the approximate inference literature for its flexibility and accuracy.
We empirically explore the ability of this method to sample from multi-modal distributions and focus on two important issues: (i) the inability of the particles to escape from local modes and (ii) the inefficacy in reproducing the density of the different regions.
arXiv Detail & Related papers (2021-01-24T22:18:30Z)
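A minimal sketch of one standard way to anneal SVGD, tempering the target score by a factor beta that grows from near 0 to 1 over iterations (the paper's exact schedule and update are not specified in this summary; names are ours):

```python
import jax
import jax.numpy as jnp

def svgd_step(particles, score_fn, beta, eps=1e-1, h=1.0):
    # one SVGD update with a tempered score beta * grad log pi:
    # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) beta s(x_j) + grad_{x_j} k(x_j, x_i) ]
    n = particles.shape[0]
    scores = beta * jax.vmap(score_fn)(particles)
    diffs = particles[:, None, :] - particles[None, :, :]  # diffs[j, i] = x_j - x_i
    K = jnp.exp(-jnp.sum(diffs**2, -1) / (2 * h**2))       # RBF kernel matrix
    drift = K.T @ scores                                   # pulls particles to high density
    repulsion = jnp.sum(-diffs / h**2 * K[..., None], 0)   # keeps particles spread out
    return particles + eps * (drift + repulsion) / n
```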
- Sliced Kernelized Stein Discrepancy
Kernelized Stein discrepancy (KSD) is extensively used in goodness-of-fit tests and model learning.
We propose the sliced Stein discrepancy and its scalable and kernelized variants, which employ kernel-based test functions defined on the optimal one-dimensional projections.
For model learning, we show its advantages over existing Stein discrepancy baselines by training independent component analysis models with different discrepancies.
arXiv Detail & Related papers (2020-06-30T04:58:55Z)
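As a rough illustration of the slicing idea only (not the paper's estimator, which uses kernel test functions on optimal projections): project samples and scores onto random unit directions and average one-dimensional Stein kernel evaluations; reuses stein_kernel from the Stein thinning sketch above.

```python
# assumes jax, jnp and stein_kernel from the Stein thinning sketch above
def sliced_ksd2(X, S, directions):
    # X: (n, d) samples; S: (n, d) scores at X; directions: (m, d) unit vectors
    def one_direction(r):
        x1 = (X @ r)[:, None]  # projected samples, shape (n, 1)
        s1 = (S @ r)[:, None]  # projected scores
        K = jax.vmap(lambda x, sx: jax.vmap(
            lambda y, sy: stein_kernel(x, y, sx, sy))(x1, s1)
        )(x1, s1)
        return jnp.mean(K)
    return jnp.mean(jax.vmap(one_direction)(directions))
```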
- Optimal Thinning of MCMC Output
We consider the problem of retrospectively selecting a subset of states, of fixed cardinality, from the sample path.
A novel method is proposed, based on greedy minimisation of a kernel discrepancy, that is suitable for problems where heavy compression is required.
arXiv Detail & Related papers (2020-05-08T10:54:25Z)
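The greedy rule behind this method, and behind Stein thinning above, selects at each step the candidate that most reduces the squared kernel discrepancy of the selected set; for the KSD with Stein kernel $k_p$ and candidate pool $\{y_1, \dots, y_n\}$ it reads (a sketch in the notation of the first code block):

```latex
x_{m+1} \in \operatorname*{arg\,min}_{x \in \{y_1, \dots, y_n\}}
  \; k_p(x, x) + 2 \sum_{j=1}^{m} k_p(x_j, x)
```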