Provable benefits of annealing for estimating normalizing constants:
Importance Sampling, Noise-Contrastive Estimation, and beyond
- URL: http://arxiv.org/abs/2310.03902v2
- Date: Mon, 9 Oct 2023 11:00:47 GMT
- Title: Provable benefits of annealing for estimating normalizing constants:
Importance Sampling, Noise-Contrastive Estimation, and beyond
- Authors: Omar Chehab, Aapo Hyvarinen, Andrej Risteski
- Abstract summary: We show that using the geometric path brings down the estimation error from an exponential to a polynomial function of the parameter distance between the target and proposal.
We propose a two-step estimator to approximate the optimal path in an efficient way.
- Score: 24.86929310909572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research has developed several Monte Carlo methods for estimating the
normalization constant (partition function) based on the idea of annealing.
This means sampling successively from a path of distributions that interpolate
between a tractable "proposal" distribution and the unnormalized "target"
distribution. Prominent estimators in this family include annealed importance
sampling and annealed noise-contrastive estimation (NCE). Such methods hinge on
a number of design choices: which estimator to use, which path of distributions
to use and whether to use a path at all; so far, there is no definitive theory
on which choices are efficient. Here, we evaluate each design choice by the
asymptotic estimation error it produces. First, we show that using NCE is more
efficient than the importance sampling estimator, but in the limit of
infinitesimal path steps, the difference vanishes. Second, we find that using
the geometric path brings down the estimation error from an exponential to a
polynomial function of the parameter distance between the target and proposal
distributions. Third, we find that the arithmetic path, while rarely used, can
offer optimality properties over the universally-used geometric path. In fact,
in a particular limit, the optimal path is arithmetic. Based on this theory, we
finally propose a two-step estimator to approximate the optimal path in an
efficient way.
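
To make the annealing idea concrete, below is a minimal numpy sketch of annealed importance sampling along the geometric path, estimating the normalizing constant of a toy one-dimensional target. The proposal, target, schedule, particle count, and Metropolis step size are illustrative assumptions of ours, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tractable proposal p0 = N(0, 1): we can both sample from it and
# evaluate its normalized log-density, so its own Z is 1.
def log_p0(x):
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

# Illustrative unnormalized target f1 = exp(-(x - 3)^2 / (2 * 0.5^2));
# its true normalizing constant is Z = 0.5 * sqrt(2*pi) ~ 1.2533.
def log_f1(x):
    return -0.5 * ((x - 3.0) / 0.5) ** 2

def ais_log_z(n_particles=2000, n_steps=100, n_mcmc=2, step=0.5):
    """Estimate log Z of f1 with AIS along the geometric path
    log f_beta = (1 - beta) * log p0 + beta * log f1."""
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.standard_normal(n_particles)  # exact draws from p0
    log_w = np.zeros(n_particles)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Importance-weight increment for moving from f_{b_prev} to f_b.
        log_w += (b - b_prev) * (log_f1(x) - log_p0(x))
        # A few random-walk Metropolis moves leaving f_b invariant.
        for _ in range(n_mcmc):
            prop = x + step * rng.standard_normal(n_particles)
            log_acc = ((1.0 - b) * (log_p0(prop) - log_p0(x))
                       + b * (log_f1(prop) - log_f1(x)))
            accept = np.log(rng.random(n_particles)) < log_acc
            x = np.where(accept, prop, x)
    # log-mean-exp of the final weights estimates log Z (here Z_0 = 1).
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))

print("AIS estimate of Z:", np.exp(ais_log_z()))   # ~ 1.25
print("true Z:           ", 0.5 * np.sqrt(2.0 * np.pi))
```

Changing the path (e.g., to an arithmetic or $q$-path mixture) changes the weight increment and the Metropolis target, and setting n_steps = 1 collapses the scheme to plain importance sampling; these are exactly the design choices whose asymptotic errors the paper compares.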
Related papers
- A Practical Diffusion Path for Sampling [8.174664278172367]
Diffusion models are used in generative modeling to estimate score vectors guiding a Langevin process.
Previous approaches rely on Monte Carlo estimators that are either computationally heavy to implement or sample-inefficient.
We propose a computationally attractive alternative, relying on the so-called dilation path, which yields score vectors that are available in closed form.
arXiv Detail & Related papers (2024-06-20T07:00:56Z)
- Sliced Wasserstein with Random-Path Projecting Directions [49.802024788196434]
We propose an optimization-free slicing distribution that provides fast sampling for the Monte Carlo estimation of expectations.
We derive the random-path slicing distribution (RPSD) and two variants of sliced Wasserstein, i.e., the Random-Path Projection Sliced Wasserstein (RPSW) and the Importance Weighted Random-Path Projection Sliced Wasserstein (IWRPSW).
arXiv Detail & Related papers (2024-01-29T04:59:30Z)
- On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates [5.13323375365494]
We provide theoretical guarantees for the convergence behaviour of diffusion-based generative models under strongly log-concave data.
The class of functions we use for score estimation consists of Lipschitz continuous functions, avoiding any Lipschitzness assumption on the score function itself.
This approach yields the best known convergence rate for our sampling algorithm.
arXiv Detail & Related papers (2023-11-22T18:40:45Z)
- Positive definite nonparametric regression using an evolutionary algorithm with application to covariance function estimation [0.0]
We propose a novel nonparametric regression framework for estimating covariance functions of stationary processes.
Our method can impose positive definiteness, as well as isotropy and monotonicity, on the estimators.
Our method provides more reliable estimates for long-range dependence.
arXiv Detail & Related papers (2023-04-25T22:01:14Z)
- Near-optimal estimation of smooth transport maps with kernel sums-of-squares [81.02564078640275]
Under smoothness conditions, the squared Wasserstein distance between two distributions can be efficiently computed with appealing statistical error upper bounds.
The object of interest for applications such as generative modeling is the underlying optimal transport map.
We propose the first tractable algorithm for which the statistical $L^2$ error on the maps nearly matches the existing minimax lower bounds for smooth map estimation.
arXiv Detail & Related papers (2021-12-03T13:45:36Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the resulting bias-constrained estimator (BCE) arises in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Entropic estimation of optimal transport maps [15.685006881635209]
We develop a method for estimating the optimal map between two distributions over $\mathbb{R}^d$ with rigorous finite-sample guarantees.
We show that our estimator is easy to compute using Sinkhorn's algorithm.
arXiv Detail & Related papers (2021-09-24T14:57:26Z)
- q-Paths: Generalizing the Geometric Annealing Path using Power Means [51.73925445218366]
We introduce $q$-paths, a family of paths which includes the geometric and arithmetic mixtures as special cases.
We show that small deviations away from the geometric path yield empirical gains for Bayesian inference (the geometric, arithmetic, and $q$-path families are written out after this list).
arXiv Detail & Related papers (2021-07-01T21:09:06Z)
- Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z)
- Pathwise Conditioning of Gaussian Processes [72.61885354624604]
Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations.
This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector.
We show how a pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors.
arXiv Detail & Related papers (2020-11-08T17:09:37Z)
- The estimation error of general first order methods [12.472245917779754]
We consider two families of estimation problems: high-dimensional regression and low-dimensional matrix estimation.
We derive lower bounds on the estimation error that hold in the high-dimensional asymptotics in which both the number of observations and the number of parameters diverge.
These lower bounds are optimal in the sense that there exist algorithms whose estimation error matches them up to asymptotically negligible terms.
arXiv Detail & Related papers (2020-02-28T18:13:47Z)
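
As promised above, here are the interpolating path families referenced in the abstract and in the q-Paths entry, written explicitly. This is a reference sketch in our own notation, not quoted from either paper: $p_0$ denotes the tractable proposal, $p_1$ the unnormalized target, and $t \in [0, 1]$ the position along the path.

```latex
% Geometric path (the universally used default):
\tilde{p}_t(x) \;\propto\; p_0(x)^{1-t}\, p_1(x)^{t}

% Arithmetic path (mixture):
\tilde{p}_t(x) \;=\; (1-t)\, p_0(x) + t\, p_1(x)

% q-path (power mean): q = 0 recovers the arithmetic path,
% and the limit q -> 1 recovers the geometric path.
\tilde{p}_t(x) \;\propto\; \Big[ (1-t)\, p_0(x)^{1-q} + t\, p_1(x)^{1-q} \Big]^{\frac{1}{1-q}}
```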
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.