Kernel Stein Discrepancy Descent
- URL: http://arxiv.org/abs/2105.09994v1
- Date: Thu, 20 May 2021 19:05:23 GMT
- Title: Kernel Stein Discrepancy Descent
- Authors: Anna Korba, Pierre-Cyril Aubin-Frankowski, Szymon Majewski, Pierre Ablin
- Abstract summary: Kernel Stein Discrepancy (KSD) has received much interest recently.
We investigate the properties of its Wasserstein gradient flow to approximate a target probability distribution $\pi$ on $\mathbb{R}^d$.
This leads to a straightforwardly implementable, deterministic score-based method to sample from $pi$, named KSD Descent.
- Score: 16.47373844775953
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Among dissimilarities between probability distributions, the Kernel Stein
Discrepancy (KSD) has received much interest recently. We investigate the
properties of its Wasserstein gradient flow to approximate a target probability
distribution $\pi$ on $\mathbb{R}^d$, known up to a normalization constant.
This leads to a straightforwardly implementable, deterministic score-based
method to sample from $\pi$, named KSD Descent, which uses a set of particles
to approximate $\pi$. Remarkably, owing to a tractable loss function, KSD
Descent can leverage robust parameter-free optimization schemes such as L-BFGS;
this contrasts with other popular particle-based schemes such as the Stein
Variational Gradient Descent algorithm. We study the convergence properties of
KSD Descent and demonstrate its practical relevance. However, we also highlight
failure cases by showing that the algorithm can get stuck in spurious local
minima.
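Below is a minimal sketch of the idea: a KSD objective for a standard Gaussian target, built from a fixed-bandwidth Gaussian kernel, is minimized over the particle positions with L-BFGS. The target, kernel, bandwidth `h`, and particle count are illustrative assumptions rather than the paper's settings, and finite-difference gradients stand in for the exact gradients one would use in practice.

```python
# A minimal sketch of KSD Descent, assuming a standard Gaussian target and a
# Gaussian kernel with fixed bandwidth h; not the authors' implementation.
import numpy as np
from scipy.optimize import minimize

def score(x):
    # Score of the target pi = N(0, I): grad log pi(x) = -x.
    return -x

def ksd_squared(flat_particles, d, h=1.0):
    """Squared KSD of the particle measure against pi, using the Stein kernel
    built from the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 h^2))."""
    X = flat_particles.reshape(-1, d)
    n = X.shape[0]
    S = score(X)                                  # scores at the particles, (n, d)
    diff = X[:, None, :] - X[None, :, :]          # pairwise differences x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sqdist / (2 * h ** 2))            # base kernel matrix
    term1 = (S @ S.T) * K                                     # s(x_i)^T s(x_j) k
    term2 = np.einsum("id,ijd->ij", S, diff) / h ** 2 * K     # s(x_i)^T grad_y k
    term3 = np.einsum("jd,ijd->ij", S, -diff) / h ** 2 * K    # s(x_j)^T grad_x k
    term4 = (d / h ** 2 - sqdist / h ** 4) * K                # trace of grad_x grad_y k
    return np.sum(term1 + term2 + term3 + term4) / n ** 2

d, n = 2, 30
rng = np.random.default_rng(0)
x0 = rng.normal(size=(n, d)) + 3.0                # particles start far from the target
# L-BFGS over the particle positions; finite-difference gradients are used here
# for brevity, whereas exact gradients of the loss would normally be supplied.
res = minimize(ksd_squared, x0.ravel(), args=(d,), method="L-BFGS-B",
               options={"maxiter": 200})
print("final KSD^2:", res.fun)
print("particle mean (should be near 0):", res.x.reshape(n, d).mean(axis=0))
```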
Related papers
- Nyström Kernel Stein Discrepancy [4.551160285910023]
We propose a Nyström-based KSD acceleration -- with runtime $\mathcal{O}\left(mn+m^3\right)$ for $n$ samples and $m \ll n$ Nyström points.
We show its $\sqrt{n}$-consistency with a classical sub-Gaussian assumption, and demonstrate its applicability for goodness-of-fit testing on a suite of benchmarks.
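To make the quoted cost concrete, here is a generic Nyström sketch on a plain Gaussian kernel standing in for the Stein kernel: the quadratic form $(1/n^2)\,\mathbf{1}^\top K \mathbf{1}$ is approximated through $m$ landmark points at $\mathcal{O}(mn+m^3)$ cost. This illustrates the mechanism only; it is not the paper's exact estimator, and the kernel and sizes are assumptions for illustration.

```python
# A generic Nystroem sketch on a plain Gaussian kernel, standing in for the
# Stein kernel: the quadratic form (1/n^2) 1^T K 1 is approximated through m
# landmark points at O(m n + m^3) cost; the kernel and sizes are illustrative.
import numpy as np

def gauss_kernel(A, B, h=1.0):
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * h ** 2))

rng = np.random.default_rng(0)
n, m, d = 1000, 50, 3
X = rng.normal(size=(n, d))
landmarks = X[rng.choice(n, size=m, replace=False)]   # m << n Nystroem points

K_nm = gauss_kernel(X, landmarks)           # O(n m) kernel evaluations
K_mm = gauss_kernel(landmarks, landmarks)   # O(m^2)
w = K_nm.mean(axis=0)                       # (1/n) K_nm^T 1
approx = w @ np.linalg.pinv(K_mm) @ w       # low-rank surrogate; pinv is O(m^3)
exact = gauss_kernel(X, X).mean()           # O(n^2), computed only for comparison
print(approx, exact)
```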
arXiv Detail & Related papers (2024-06-12T16:50:12Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- A Finite-Particle Convergence Rate for Stein Variational Gradient Descent [47.6818454221125]
We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD).
Our explicit, non-asymptotic proof strategy will serve as a template for future refinements.
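For reference, here is a minimal sketch of the SVGD update being analyzed, assuming a Gaussian kernel with fixed bandwidth and a standard Gaussian target; the step size, bandwidth, and particle count are illustrative choices.

```python
# A minimal sketch of the SVGD update, assuming a Gaussian kernel with fixed
# bandwidth h and a standard Gaussian target; step size and particle count are
# illustrative.
import numpy as np

def svgd_step(X, score, h=1.0, step=0.1):
    """x_i <- x_i + step * phi(x_i), with
    phi(x) = (1/n) sum_j [ k(x_j, x) score(x_j) + grad_{x_j} k(x_j, x) ]."""
    n, d = X.shape
    diff = X[:, None, :] - X[None, :, :]                   # x_i - x_j
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h ** 2))
    drive = K @ score(X) / n                               # kernel-weighted scores
    repulse = np.sum(K[:, :, None] * diff, axis=1) / (n * h ** 2)  # kernel gradients
    return X + step * (drive + repulse)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) + 5.0                        # particles start off-target
for _ in range(500):
    X = svgd_step(X, score=lambda x: -x)                   # score of N(0, I) is -x
print(X.mean(axis=0), X.std(axis=0))                       # roughly 0 and roughly 1
```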
arXiv Detail & Related papers (2022-11-17T17:50:39Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
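As one standard way to realize such a kernel-based estimator (not necessarily the exact one analyzed in the paper), the sketch below solves a regularized empirical Bellman equation in an RKHS for a toy Markov reward process; the chain dynamics, kernel bandwidth, and regularization `lam` are assumptions for illustration.

```python
# A hedged sketch of kernel-based policy evaluation: a regularized empirical
# Bellman equation is solved in an RKHS for a toy Markov reward process. The
# chain dynamics, kernel bandwidth, and regularization lam are illustrative.
import numpy as np

def gauss_kernel(A, B, h=0.5):
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * h ** 2))

rng = np.random.default_rng(0)
gamma, lam, T = 0.9, 1e-3, 500
# Toy MRP on the line: next state contracts toward 0.5 with noise, reward = state.
s = rng.uniform(size=T)
s_next = 0.5 + 0.8 * (s - 0.5) + 0.05 * rng.normal(size=T)
r = s
# Represent V(.) = sum_t alpha_t k(s_t, .) and impose the sampled Bellman
# residual V(s_t) - gamma * V(s_next_t) = r_t, with Tikhonov regularization.
K = gauss_kernel(s, s)            # K[t, u] = k(s_t, s_u)
K_next = gauss_kernel(s_next, s)  # K_next[t, u] = k(s_next_t, s_u)
alpha = np.linalg.solve(K - gamma * K_next + lam * T * np.eye(T), r)
V = lambda x: (gauss_kernel(np.atleast_1d(x), s) @ alpha)[0]
# At the fixed point 0.5 the value is about 0.5 / (1 - gamma) = 5,
# up to estimation error.
print(V(0.5), V(0.9))
```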
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
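A toy sketch in that spirit: AIS with unadjusted Langevin transitions in place of Metropolis-Hastings steps, so every operation stays differentiable. The initial and target densities, annealing schedule, and step size are illustrative assumptions, and skipping the accept/reject correction introduces a small bias.

```python
# A toy AIS sketch with unadjusted Langevin transitions in place of
# Metropolis-Hastings steps; densities, schedule, and step size are illustrative.
import numpy as np

rng = np.random.default_rng(0)
# Anneal from p0 = N(0, 1) (normalized) to the unnormalized target
# gamma(x) = exp(-(x - 2)^2 / (2 * 0.5^2)), whose true normalizer is
# sqrt(2 * pi) * 0.5, so the estimate can be checked.
log_p0 = lambda x: -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)
log_gamma = lambda x: -(x - 2.0) ** 2 / (2 * 0.5 ** 2)
log_f = lambda x, b: (1 - b) * log_p0(x) + b * log_gamma(x)
grad_log_f = lambda x, b: (1 - b) * (-x) + b * (-(x - 2.0) / 0.5 ** 2)

n_chains, n_steps, eps = 2000, 200, 1e-2
betas = np.linspace(0.0, 1.0, n_steps + 1)
x = rng.normal(size=n_chains)                     # exact samples from p0
log_w = np.zeros(n_chains)
for k in range(1, n_steps + 1):
    log_w += log_f(x, betas[k]) - log_f(x, betas[k - 1])   # AIS weight update
    # Unadjusted Langevin move targeting the k-th annealed density; dropping the
    # accept/reject step is what keeps the estimator differentiable, at the
    # price of a small bias.
    x = x + eps * grad_log_f(x, betas[k]) + np.sqrt(2 * eps) * rng.normal(size=n_chains)
log_Z = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()  # stable log-mean-exp
print(log_Z, np.log(np.sqrt(2 * np.pi) * 0.5))    # estimate vs. ground truth
```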
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Large-Scale Wasserstein Gradient Flows [84.73670288608025]
We introduce a scalable scheme to approximate Wasserstein gradient flows.
Our approach relies on input convex neural networks (ICNNs) to discretize the JKO steps.
As a result, we can sample from the measure at each step of the gradient diffusion and compute its density.
arXiv Detail & Related papers (2021-06-01T19:21:48Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize.
Our algorithm is based on a new concentration technique, and we analyze its sample complexity.
It is very practical: it improves upon the performance of all prior methods in the setting where no Byzantine machines are present.
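As a generic illustration of the setting (using coordinate-wise median aggregation, not the paper's concentration-based technique), the toy loop below keeps gradient descent on track despite adversarial workers; the quadratic objective and worker counts are illustrative.

```python
# A generic toy of Byzantine-resilient SGD using coordinate-wise median
# aggregation (not the paper's concentration-based filtering); the quadratic
# objective and worker counts are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, n_byz, lr = 10, 20, 4, 0.1
x_true = rng.normal(size=d)
x = np.zeros(d)

def honest_grad(x):
    # Stochastic gradient of f(x) = 0.5 * ||x - x_true||^2.
    return (x - x_true) + 0.1 * rng.normal(size=d)

for _ in range(200):
    grads = [honest_grad(x) for _ in range(n_workers - n_byz)]
    grads += [100.0 * rng.normal(size=d) for _ in range(n_byz)]  # adversarial replies
    g = np.median(np.stack(grads), axis=0)   # robust aggregation of worker gradients
    x -= lr * g
print(np.linalg.norm(x - x_true))            # small despite the Byzantine workers
```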
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- A Non-Asymptotic Analysis for Stein Variational Gradient Descent [44.30569261307296]
We provide a novel finite-time analysis for the Stein Variational Gradient Descent algorithm.
We provide a descent lemma establishing that the algorithm decreases the objective at each iteration.
We also provide a convergence result of the finite particle system corresponding to the practical implementation of SVGD to its population version.
arXiv Detail & Related papers (2020-06-17T12:01:33Z)
- Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data [10.965065178451104]
We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks.
Our algorithm can tolerate up to a $\frac{1}{4}$ fraction of Byzantine workers.
arXiv Detail & Related papers (2020-05-16T04:15:27Z)
- The Wasserstein Proximal Gradient Algorithm [23.143814848127295]
Wasserstein gradient flows are continuous time dynamics that define curves of steepest descent to minimize an objective function over the space of probability measures.
We propose a Forward Backward (FB) discretization scheme that can tackle the case where the objective function is the sum of a smooth and a nonsmooth geodesically convex term.
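For intuition, the Euclidean analogue of such a forward-backward splitting is proximal gradient descent on a smooth term plus a nonsmooth term; the sketch below uses a least-squares loss and an $\ell_1$ penalty as stand-ins, whereas the paper's scheme operates on the space of probability measures.

```python
# The Euclidean analogue of forward-backward splitting: proximal gradient
# descent (ISTA) on 0.5 * ||A x - b||^2 + lam * ||x||_1; the data and penalty
# are illustrative stand-ins for the smooth and nonsmooth terms.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 20, 0.1
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
step = 1.0 / np.linalg.norm(A, 2) ** 2       # 1 / L for the smooth term

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: the "backward" (implicit) step.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(d)
for _ in range(500):
    grad_smooth = A.T @ (A @ x - b)          # "forward" explicit gradient step
    x = soft_threshold(x - step * grad_smooth, step * lam)
obj = 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()
print(np.count_nonzero(x), obj)
```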
arXiv Detail & Related papers (2020-02-07T22:19:32Z)