q-Paths: Generalizing the Geometric Annealing Path using Power Means
- URL: http://arxiv.org/abs/2107.00745v1
- Date: Thu, 1 Jul 2021 21:09:06 GMT
- Title: q-Paths: Generalizing the Geometric Annealing Path using Power Means
- Authors: Vaden Masrani, Rob Brekelmans, Thang Bui, Frank Nielsen, Aram
Galstyan, Greg Ver Steeg, Frank Wood
- Abstract summary: We introduce $q$-paths, a family of paths which includes the geometric and arithmetic mixtures as special cases.
We show that small deviations away from the geometric path yield empirical gains for Bayesian inference.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many common machine learning methods involve the geometric annealing path, a
sequence of intermediate densities between two distributions of interest
constructed using the geometric average. While alternatives such as the
moment-averaging path have demonstrated performance gains in some settings,
their practical applicability remains limited by exponential family endpoint
assumptions and the lack of a closed-form energy function. In this work, we
introduce $q$-paths, a family of paths which is derived from a generalized
notion of the mean, includes the geometric and arithmetic mixtures as special
cases, and admits a simple closed form involving the deformed logarithm
function from nonextensive thermodynamics. Following previous analysis of the
geometric path, we interpret our $q$-paths as corresponding to a
$q$-exponential family of distributions, and provide a variational
representation of intermediate densities as minimizing a mixture of
$\alpha$-divergences to the endpoints. We show that small deviations away from
the geometric path yield empirical gains for Bayesian inference using
Sequential Monte Carlo and generative model evaluation using Annealed
Importance Sampling.
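The closed form mentioned in the abstract can be sketched numerically with the deformed logarithm $\ln_q$ and its inverse $\exp_q$. Below is a minimal illustration (function names and the scalar toy inputs are our own, not from the paper): $q \to 1$ recovers the geometric mixture and $q = 0$ the arithmetic mixture.

```python
import numpy as np

def ln_q(x, q):
    # deformed logarithm from nonextensive thermodynamics;
    # recovers np.log in the limit q -> 1
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x, q):
    # deformed exponential, the inverse of ln_q on its domain
    if np.isclose(q, 1.0):
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - q) * x, 0.0) ** (1.0 / (1.0 - q))

def q_path_density(pi0, pi1, beta, q):
    # unnormalized intermediate density on the q-path:
    # a power-mean interpolation between endpoint densities pi0 and pi1
    return exp_q((1.0 - beta) * ln_q(pi0, q) + beta * ln_q(pi1, q), q)
```

For example, with endpoint values `pi0 = 0.5`, `pi1 = 2.0` and `beta = 0.3`, `q_path_density(..., q=1.0)` equals the geometric mixture `pi0**0.7 * pi1**0.3`, while `q=0.0` gives the arithmetic mixture `0.7*pi0 + 0.3*pi1`.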
Related papers
- A Stein Gradient Descent Approach for Doubly Intractable Distributions [5.63014864822787]
We propose a novel Monte Carlo Stein variational gradient descent (MC-SVGD) approach to inference for doubly intractable distributions.
The proposed method achieves substantial computational gains over existing algorithms, while providing comparable inferential performance for the posterior distributions.
arXiv Detail & Related papers (2024-10-28T13:42:27Z) - von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-19T01:57:21Z) - Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation [53.17668583030862]
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation.
We propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP).
We show that LOOP achieves a sublinear $\tilde{\mathcal{O}}(\mathrm{poly}(d, \mathrm{sp}(V^*))\sqrt{T\beta})$ regret, where $d$ and $\beta$ correspond to AGEC and the log-covering number of the hypothesis class, respectively.
arXiv Detail & Related papers (2024-04-19T06:24:22Z) - Provable benefits of annealing for estimating normalizing constants:
Importance Sampling, Noise-Contrastive Estimation, and beyond [24.86929310909572]
We show that using the geometric path brings the estimation error down from exponential to a function of the distance between the target and the proposal.
We propose a two-step estimator to approximate the optimal path in an efficient way.
arXiv Detail & Related papers (2023-10-05T21:16:55Z) - Stability of Entropic Wasserstein Barycenters and application to random
geometric graphs [8.7314407902481]
Wasserstein barycenters (WB) are a notion of barycenters derived from the theory of Optimal Transport.
We show how WBs on discretized meshes relate to the geometry of the underlying manifold.
arXiv Detail & Related papers (2022-10-19T13:17:03Z) - Variational Representations of Annealing Paths: Bregman Information
under Monotonic Embedding [12.020235141059992]
We show that the arithmetic mean over arguments minimizes the expected Bregman divergence to a single representative point.
Our analysis highlights the interplay between quasi-arithmetic means, parametric families, and divergence functionals.
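The mean-as-minimizer claim above can be checked numerically. The following toy example is our own (not from the paper): for the Bregman divergence generated by $F(t) = t \log t$ (generalized KL), the grid minimizer of the expected divergence coincides with the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.random(1000) + 0.5  # positive samples in [0.5, 1.5)

def bregman_kl(x, z):
    # Bregman divergence generated by F(t) = t*log(t)
    return x * np.log(x / z) - x + z

zs = np.linspace(0.6, 1.4, 4001)                    # candidate representatives
losses = np.array([bregman_kl(xs, z).mean() for z in zs])
z_star = zs[losses.argmin()]                        # empirical minimizer
# z_star agrees with xs.mean() up to the grid resolution
```

Setting the derivative of `E[x*log(x/z) - x + z]` in `z` to zero gives `z = E[x]`, which is what the grid search recovers.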
arXiv Detail & Related papers (2022-09-15T17:22:04Z) - An application of the splitting-up method for the computation of a
neural network representation for the solution for the filtering equations [68.8204255655161]
Filtering equations play a central role in many real-life applications, including numerical weather prediction, finance and engineering.
One of the classical approaches to approximate the solution of the filtering equations is to use a PDE inspired method, called the splitting-up method.
We combine this method with a neural network representation to produce an approximation of the unnormalised conditional distribution of the signal process.
arXiv Detail & Related papers (2022-01-10T11:01:36Z) - Annealed Importance Sampling with q-Paths [51.73925445218365]
Annealed importance sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods.
Existing literature has been primarily limited to the geometric mixture or moment-averaged paths associated with the exponential family and KL divergence.
We explore AIS using $q$-paths, which include the geometric path as a special case and are related to the homogeneous power mean, deformed exponential family, and $\alpha$-divergence.
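A minimal AIS sketch along the geometric path (the $q \to 1$ special case) may help make the estimator concrete. All densities, step counts, and proposal scales below are illustrative choices of ours: we estimate the normalizing constant of the unnormalized target $\exp(-x^2/2)$, whose true value is $\sqrt{2\pi}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p0(x):
    # normalized proposal: N(0, 4)
    return -0.5 * x**2 / 4.0 - 0.5 * np.log(2 * np.pi * 4.0)

def log_p1(x):
    # unnormalized target exp(-x^2/2); true Z = sqrt(2*pi)
    return -0.5 * x**2

betas = np.linspace(0.0, 1.0, 51)   # annealing schedule
n = 5000
x = rng.normal(0.0, 2.0, n)         # exact samples from p0
logw = np.zeros(n)

for b_prev, b in zip(betas[:-1], betas[1:]):
    # weight update along the geometric path:
    # log pi_b = (1-b) log p0 + b log p1
    logw += (b - b_prev) * (log_p1(x) - log_p0(x))
    # one Metropolis-Hastings step targeting pi_b
    prop = x + rng.normal(0.0, 0.5, n)
    log_ratio = ((1 - b) * log_p0(prop) + b * log_p1(prop)) \
              - ((1 - b) * log_p0(x) + b * log_p1(x))
    accept = np.log(rng.random(n)) < log_ratio
    x = np.where(accept, prop, x)

Z_hat = np.exp(logw).mean()  # unbiased estimate of Z = sqrt(2*pi)
```

Since the proposal is normalized, the mean of the importance weights is an unbiased estimator of the target's partition function.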
arXiv Detail & Related papers (2020-12-14T18:57:05Z) - Pathwise Conditioning of Gaussian Processes [72.61885354624604]
Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations.
This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector.
We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors.
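The pathwise view can be illustrated with Matheron's rule: a posterior sample is a prior sample plus a data-driven correction, with no need to factorize the posterior covariance. A self-contained sketch (the kernel, inputs, and jitter values are illustrative assumptions of ours, noiseless observations assumed):

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # squared-exponential kernel on 1-D inputs (illustrative choice)
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

rng = np.random.default_rng(1)
X = np.array([0.0, 1.0, 2.0])        # training inputs
y = np.array([1.0, -1.0, 0.5])       # noiseless observations
Xs = np.linspace(-1.0, 3.0, 8)       # test locations

# 1) draw one joint prior sample over [Xs, X] via a Cholesky factor
Z = np.concatenate([Xs, X])
L = np.linalg.cholesky(rbf(Z, Z) + 1e-6 * np.eye(len(Z)))
f = L @ rng.standard_normal(len(Z))
f_s, f_X = f[: len(Xs)], f[len(Xs):]

# 2) Matheron's rule: correct the prior path by the data residual
Kxx = rbf(X, X) + 1e-8 * np.eye(len(X))
update = np.linalg.solve(Kxx, y - f_X)
f_post_s = f_s + rbf(Xs, X) @ update   # posterior sample at Xs
f_post_X = f_X + rbf(X, X) @ update    # posterior sample at X
```

With noiseless conditioning, the corrected path interpolates the data: `f_post_X` matches `y` up to the jitter, while the expensive factorization involves only the training covariance rather than the full test set.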
arXiv Detail & Related papers (2020-11-08T17:09:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.