Differentiable Annealed Importance Sampling Minimizes The Jensen-Shannon Divergence Between Initial and Target Distribution
- URL: http://arxiv.org/abs/2405.14840v1
- Date: Thu, 23 May 2024 17:55:09 GMT
- Title: Differentiable Annealed Importance Sampling Minimizes The Jensen-Shannon Divergence Between Initial and Target Distribution
- Authors: Johannes Zenn, Robert Bamler
- Abstract summary: We show that DAIS minimizes the symmetrized KL divergence between the initial and target distribution.
DAIS can be seen as a form of variational inference (VI) in that its initial distribution is a parametric fit to an intractable target distribution.
- Score: 10.067421338825545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentiable annealed importance sampling (DAIS), proposed by Geffner & Domke (2021) and Zhang et al. (2021), allows optimizing, among others, over the initial distribution of AIS. In this paper, we show that, in the limit of many transitions, DAIS minimizes the symmetrized KL divergence (Jensen-Shannon divergence) between the initial and target distribution. Thus, DAIS can be seen as a form of variational inference (VI) in that its initial distribution is a parametric fit to an intractable target distribution. We empirically evaluate the usefulness of the initial distribution as a variational distribution on synthetic and real-world data, observing that it often provides more accurate uncertainty estimates than standard VI (optimizing the reverse KL divergence), importance weighted VI, and Markovian score climbing (optimizing the forward KL divergence).
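The three objectives the abstract compares can be illustrated on a toy case. The following is a minimal sketch, not the paper's method: it assumes univariate Gaussians for both the initial distribution q and the target p (so every KL term has a closed form), and it assumes that "symmetrized KL" means the sum of the reverse and forward KL divergences. The function names `kl_gaussian` and `divergences` are hypothetical and chosen only for this illustration.

```python
import math


def kl_gaussian(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form KL(a || b) for a = N(mu_a, sigma_a^2), b = N(mu_b, sigma_b^2)."""
    return (
        math.log(sigma_b / sigma_a)
        + (sigma_a ** 2 + (mu_a - mu_b) ** 2) / (2.0 * sigma_b ** 2)
        - 0.5
    )


def divergences(mu_q, sigma_q, mu_p, sigma_p):
    """Compare the objectives named in the abstract for Gaussian q (initial) and p (target).

    Assumption: "symmetrized KL" is taken to be KL(q || p) + KL(p || q).
    """
    reverse = kl_gaussian(mu_q, sigma_q, mu_p, sigma_p)  # KL(q || p): standard VI
    forward = kl_gaussian(mu_p, sigma_p, mu_q, sigma_q)  # KL(p || q): Markovian score climbing
    return {
        "reverse_kl": reverse,
        "forward_kl": forward,
        "symmetrized_kl": reverse + forward,
    }


# Toy example: initial distribution q = N(0, 1), target p = N(1, 2^2).
result = divergences(mu_q=0.0, sigma_q=1.0, mu_p=1.0, sigma_p=2.0)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

In this toy setting the reverse KL is smaller than the forward KL because q is narrower than p. The symmetrized objective penalizes a mismatch in both directions, which is one way to read the abstract's claim about more accurate uncertainty estimates.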
Related papers
- Non-asymptotic Convergence of Discrete-time Diffusion Models: New Approach and Improved Rate [49.97755400231656]
We establish convergence guarantees for substantially larger classes of distributions under DT diffusion processes.
We then specialize our results to a number of interesting classes of distributions with explicit parameter dependencies.
We propose a novel accelerated sampler and show that it improves the convergence rates of the corresponding regular sampler by orders of magnitude with respect to all system parameters.
arXiv Detail & Related papers (2024-02-21T16:11:47Z)
- Understanding Contrastive Learning via Distributionally Robust Optimization [29.202594242468678]
This study reveals the inherent tolerance of contrastive learning (CL) towards sampling bias, wherein negative samples may encompass similar semantics (e.g., labels).
We bridge this research gap by analyzing CL through the lens of distributionally robust optimization (DRO), yielding several key insights.
We also identify CL's potential shortcomings, including over-conservatism and sensitivity to outliers, and introduce a novel Adjusted InfoNCE loss (ADNCE) to mitigate these issues.
arXiv Detail & Related papers (2023-10-17T07:32:59Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- The Representation Jensen-Shannon Divergence [0.0]
Statistical divergences quantify the difference between probability distributions.
In this work, we propose a divergence inspired by the Jensen-Shannon divergence which avoids the estimation of the probability density functions.
arXiv Detail & Related papers (2023-05-25T19:44:36Z)
- Score-Based Diffusion meets Annealed Importance Sampling [89.92133671626327]
Annealed Importance Sampling remains one of the most effective methods for marginal likelihood estimation.
We leverage recent progress in score-based generative modeling to approximate the optimal extended target distribution for AIS proposals.
arXiv Detail & Related papers (2022-08-16T12:13:29Z)
- Personalized Trajectory Prediction via Distribution Discrimination [78.69458579657189]
Trajectory prediction faces the dilemma of capturing the multi-modal nature of future dynamics.
We present a distribution discrimination (DisDis) method to predict personalized motion patterns.
Our method can be integrated with existing multi-modal predictive models as a plug-and-play module.
arXiv Detail & Related papers (2021-07-29T17:42:12Z)
- Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z)
- KL Guided Domain Adaptation [88.19298405363452]
Domain adaptation is an important problem and often needed for real-world applications.
A common approach in the domain adaptation literature is to learn a representation of the input that has the same distributions over the source and the target domain.
We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples.
arXiv Detail & Related papers (2021-06-14T22:24:23Z)
- Parametrization invariant interpretation of priors and posteriors [0.0]
We move away from the idea that "a prior distribution establishes a probability distribution over the parameters of our model" to the idea that "a prior distribution establishes a probability distribution over probability distributions".
Under this mindset, any distribution over probability distributions should be "intrinsic", that is, invariant to the specific parametrization which is selected for the manifold.
arXiv Detail & Related papers (2021-05-18T06:45:05Z)