All in the Exponential Family: Bregman Duality in Thermodynamic
Variational Inference
- URL: http://arxiv.org/abs/2007.00642v1
- Date: Wed, 1 Jul 2020 17:46:49 GMT
- Title: All in the Exponential Family: Bregman Duality in Thermodynamic
Variational Inference
- Authors: Rob Brekelmans, Vaden Masrani, Frank Wood, Greg Ver Steeg, Aram
Galstyan
- Abstract summary: We propose an exponential family interpretation of the geometric mixture curve underlying the Thermodynamic Variational Objective (TVO).
We propose to choose intermediate distributions using equal spacing in the moment parameters of our exponential family, which matches grid search performance and allows the schedule to adaptively update over the course of training.
- Score: 42.05882835476882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently proposed Thermodynamic Variational Objective (TVO) leverages
thermodynamic integration to provide a family of variational inference
objectives, which both tighten and generalize the ubiquitous Evidence Lower
Bound (ELBO). However, the tightness of TVO bounds was not previously known, an
expensive grid search was used to choose a "schedule" of intermediate
distributions, and model learning suffered with ostensibly tighter bounds. In
this work, we propose an exponential family interpretation of the geometric
mixture curve underlying the TVO and various path sampling methods, which
allows us to characterize the gap in TVO likelihood bounds as a sum of KL
divergences. We propose to choose intermediate distributions using equal
spacing in the moment parameters of our exponential family, which matches grid
search performance and allows the schedule to adaptively update over the course
of training. Finally, we derive a doubly reparameterized gradient estimator
which improves model learning and allows the TVO to benefit from more refined
bounds. To further contextualize our contributions, we provide a unified
framework for understanding thermodynamic integration and the TVO using Taylor
series remainders.
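To make the abstract's two central ideas concrete, here is a minimal numerical sketch, not the authors' code, of (a) the TVO lower bound as a left Riemann sum of E_{pi_beta}[log w] along the geometric mixture path, and (b) a schedule chosen by equal spacing in the moment parameter eta(beta) = E_{pi_beta}[log w]. The function names (`snis_moment`, `tvo_lower_bound`, `moment_schedule`), the self-normalized importance sampling (SNIS) moment estimate, and the toy data are illustrative assumptions; the paper's actual implementation may differ in details.

```python
import numpy as np

def snis_moment(log_w, beta):
    """Estimate eta(beta) = E_{pi_beta}[log w] by self-normalized importance
    sampling, where pi_beta is proportional to q * w^beta and q is the proposal."""
    logits = beta * log_w
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(np.sum(probs * log_w))

def tvo_lower_bound(log_w, betas):
    """Left-Riemann-sum TVO lower bound on log p(x).

    log_w: log importance weights log p(x, z_i) - log q(z_i | x), z_i ~ q.
    betas: increasing schedule with betas[0] == 0.0 and betas[-1] < 1.0;
           the right endpoint beta = 1 is appended internally.
    """
    grid = np.append(betas, 1.0)
    return sum(
        (grid[k + 1] - grid[k]) * snis_moment(log_w, grid[k])
        for k in range(len(grid) - 1)
    )

def moment_schedule(log_w, num_betas, grid_size=1000):
    """Choose betas equally spaced in the moment parameter eta(beta) by
    tabulating eta on a fine grid and inverting it.  For fixed samples the
    SNIS estimate of eta is exactly nondecreasing in beta (its derivative
    is a weighted variance), so a sorted search is valid."""
    grid = np.linspace(0.0, 1.0, grid_size)
    eta = np.array([snis_moment(log_w, b) for b in grid])
    targets = np.linspace(eta[0], eta[-1], num_betas + 1)[:-1]
    idx = np.searchsorted(eta, targets)
    return grid[np.clip(idx, 0, grid_size - 1)]

# Toy check with synthetic log weights standing in for a real model.
log_w = np.random.default_rng(0).normal(loc=-1.0, scale=1.0, size=5000)
betas = moment_schedule(log_w, num_betas=5)
print("schedule:", np.round(betas, 3))
print("TVO lower bound:", tvo_lower_bound(log_w, betas))
```

With beta_0 = 0, the first Riemann term is the familiar ELBO contribution, so the sum tightens the ELBO as points are added. The sketch omits gradients entirely; the paper's doubly reparameterized estimator concerns how to differentiate such bounds with low variance during training.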
Related papers
- Kernel-Based Differentiable Learning of Non-Parametric Directed Acyclic Graphical Models [17.52142371968811]
Causal discovery amounts to learning a directed acyclic graph (DAG) that encodes a causal model.
Recent research has sought to bypass the search by reformulating causal discovery as a continuous optimization problem.
arXiv Detail & Related papers (2024-08-20T16:09:40Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences (a generic AIS sketch along the geometric path appears after this list).
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations [57.15855198512551]
We propose a novel score-based generative model for graphs with a continuous-time framework.
We show that our method is able to generate molecules that lie close to the training distribution yet do not violate the chemical valency rule.
arXiv Detail & Related papers (2022-02-05T08:21:04Z)
- Variational Inference with Hölder Bounds [68.8008396694788]
We present a careful analysis of the thermodynamic variational objective (TVO).
We reveal how the pathological geometry of thermodynamic curves negatively affects TVO.
This motivates our new VI objectives, named the Hölder bounds, which flatten the thermodynamic curves and promise to achieve a one-step approximation of the exact marginal log-likelihood.
arXiv Detail & Related papers (2021-11-04T15:35:47Z)
- A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization [7.483040617090451]
We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization.
Our theoretical analysis shows that its Lagrangian dual function is smooth and the Lagrangian duality gap can be decomposed into the primal optimality gap and the constraint violation.
arXiv Detail & Related papers (2021-10-17T21:26:40Z)
- Distributional Sliced Embedding Discrepancy for Incomparable Distributions [22.615156512223766]
Gromov-Wasserstein (GW) distance is a key tool for manifold learning and cross-domain learning.
We propose a novel approach for comparing two incomparable distributions, which hinges on the idea of distributional slicing, embeddings, and computing the closed-form Wasserstein distance between the sliced distributions.
arXiv Detail & Related papers (2021-06-04T15:11:30Z)
- GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out-of-distribution samples and the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z)
- Likelihood Ratio Exponential Families [43.98796887171374]
We use the geometric mixture path as an exponential family of distributions to analyze the thermodynamic variational objective (TVO).
We extend these likelihood ratio exponential families to include solutions to rate-distortion (RD) optimization, the information bottleneck (IB) method, and recent rate-distortion-classification approaches.
arXiv Detail & Related papers (2020-12-31T07:13:58Z)
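As a companion to the AIS entry above and the path-sampling view in the main abstract, below is a minimal, generic AIS sketch along the same geometric path. This is the standard Neal-style construction, not the Constant Rate AIS algorithm from that paper; the toy target, step size, and all names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_q(z):
    """Log density of the tractable base distribution, a standard normal."""
    return -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)

def log_p_tilde(z):
    """Unnormalized toy target: 4 * Normal(z; 3, 0.5^2), so true log Z = log 4."""
    return (np.log(4.0)
            - 0.5 * ((z - 3.0) / 0.5) ** 2
            - np.log(0.5 * np.sqrt(2.0 * np.pi)))

def ais(num_chains=1000, betas=np.linspace(0.0, 1.0, 101), step=0.5):
    """Annealed importance sampling along the geometric path
    pi_beta(z) proportional to q(z)^(1 - beta) * p_tilde(z)^beta."""
    z = rng.standard_normal(num_chains)   # exact samples from pi_0 = q
    log_w = np.zeros(num_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # incremental weight: log pi_tilde_b(z) - log pi_tilde_{b_prev}(z)
        log_w += (b - b_prev) * (log_p_tilde(z) - log_q(z))
        # one random-walk Metropolis step leaves pi_b invariant
        log_pi = lambda x: (1.0 - b) * log_q(x) + b * log_p_tilde(x)
        prop = z + step * rng.standard_normal(num_chains)
        accept = np.log(rng.random(num_chains)) < log_pi(prop) - log_pi(z)
        z = np.where(accept, prop, z)
    return log_w

log_w = ais()
log_Z = np.logaddexp.reduce(log_w) - np.log(log_w.size)
print(f"AIS estimate of log Z: {log_Z:.3f} (true value: {np.log(4.0):.3f})")
```

Note the connection: the accumulated AIS log weight discretizes the same integral of E_{pi_beta}[log w] over beta that the left-Riemann TVO bound approximates, which is the thermodynamic-integration link the main paper formalizes.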