Optimal Scaling for Locally Balanced Proposals in Discrete Spaces
- URL: http://arxiv.org/abs/2209.08183v1
- Date: Fri, 16 Sep 2022 22:09:53 GMT
- Title: Optimal Scaling for Locally Balanced Proposals in Discrete Spaces
- Authors: Haoran Sun, Hanjun Dai, Dale Schuurmans
- Abstract summary: We show that the efficiency of Metropolis-Hastings (M-H) algorithms in discrete spaces can be characterized by an acceptance rate that is independent of the target distribution.
Knowledge of the optimal acceptance rate allows one to automatically tune the neighborhood size of a proposal distribution in a discrete space, directly analogous to step-size control in continuous spaces.
- Score: 65.14092237705476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimal scaling has been well studied for Metropolis-Hastings (M-H)
algorithms in continuous spaces, but a similar understanding has been lacking
in discrete spaces. Recently, a family of locally balanced proposals (LBP) for
discrete spaces has been proved to be asymptotically optimal, but the question
of optimal scaling has remained open. In this paper, we establish, for the
first time, that the efficiency of M-H in discrete spaces can also be
characterized by an asymptotic acceptance rate that is independent of the
target distribution. Moreover, we verify, both theoretically and empirically,
that the optimal acceptance rates for LBP and random walk Metropolis (RWM) are
$0.574$ and $0.234$ respectively. These results also help establish that LBP is
asymptotically $O(N^\frac{2}{3})$ more efficient than RWM with respect to model
dimension $N$. Knowledge of the optimal acceptance rate allows one to
automatically tune the neighborhood size of a proposal distribution in a
discrete space, directly analogous to step-size control in continuous spaces.
We demonstrate empirically that such adaptive M-H sampling can robustly improve
sampling in a variety of target distributions in discrete spaces, including
training deep energy based models.
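The tuning rule the abstract describes can be made concrete. Below is a minimal sketch (not the authors' implementation) of adaptive Metropolis-Hastings on $\{0,1\}^N$ with a uniform-flip (RWM) proposal, where the neighborhood size $k$ (the number of coordinates flipped per proposal) is adapted toward the optimal acceptance rate. The Ising-style target, the Robbins-Monro schedule, and all constants are illustrative assumptions; swapping in a locally balanced proposal would change the target rate from $0.234$ to $0.574$.

```python
# Minimal sketch (not the authors' code): acceptance-rate-targeted adaptation
# of the neighborhood size on {0,1}^N. The target, schedule, and constants
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N = 100

def log_prob(x):
    # Hypothetical target: a 1-D Ising chain on spins s = 2x - 1.
    s = 2.0 * x - 1.0
    return 0.5 * np.sum(s * np.roll(s, 1))

def rwm_step(x, k):
    # Flip k uniformly chosen coordinates; the proposal is symmetric,
    # so the M-H ratio reduces to the ratio of target probabilities.
    idx = rng.choice(N, size=k, replace=False)
    y = x.copy()
    y[idx] = 1.0 - y[idx]
    accept = np.log(rng.uniform()) < log_prob(y) - log_prob(x)
    return (y if accept else x), float(accept)

x = rng.integers(0, 2, size=N).astype(float)
k = 1.0          # neighborhood size, adapted on a continuous scale
target = 0.234   # optimal acceptance rate for RWM (0.574 for LBP)
for t in range(1, 5001):
    x, acc = rwm_step(x, int(max(1, min(N, round(k)))))
    # Stochastic approximation: enlarge the neighborhood when accepting
    # more often than the target rate, shrink it otherwise.
    k *= np.exp((acc - target) / t ** 0.6)
print("adapted neighborhood size:", int(round(k)))
```

This loop is the discrete analogue of step-size control: the acceptance rate is the only feedback signal needed, which is exactly what makes a distribution-independent optimal rate useful.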
Related papers
- Faster Sampling via Stochastic Gradient Proximal Sampler [28.422547264326468]
Stochastic Proximal Samplers (SPS) for sampling from non-log-concave distributions are studied.
We show that the convergence to the target distribution can be guaranteed as long as the algorithm trajectory is bounded.
We provide two implementable variants based on stochastic gradient Langevin dynamics (SGLD) and the Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA; a generic SGLD update is sketched below.
arXiv Detail & Related papers (2024-05-27T00:53:18Z) - Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach [49.97755400231656]
- Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach [49.97755400231656]
We show that a novel accelerated DDPM sampler achieves accelerated performance for three broad distribution classes not considered before.
Our results show an improved dependency on the data dimension $d$ among accelerated DDPM type samplers.
arXiv Detail & Related papers (2024-02-21T16:11:47Z) - Space-Time Diffusion Bridge [0.4527270266697462]
We introduce a novel method for generating synthetic samples that are independent and identically distributed according to real probability distributions.
We use space-time mixing strategies that extend across temporal and spatial dimensions.
We validate the efficacy of our space-time diffusion approach with numerical experiments.
arXiv Detail & Related papers (2024-02-13T23:26:11Z) - Symmetric Mean-field Langevin Dynamics for Distributional Minimax
Problems [78.96969465641024]
We extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates.
We also study time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result.
arXiv Detail & Related papers (2023-12-02T13:01:29Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences; a generic AIS loop is sketched below.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Optimal design of the Barker proposal and other locally-balanced
- Optimal design of the Barker proposal and other locally-balanced Metropolis-Hastings algorithms [0.2148535041822524]
We study the class of first-order locally-balanced Metropolis-Hastings algorithms introduced in Livingstone & Zanella (2021).
To choose a specific algorithm within the class, the user must select a balancing function $g:\mathbb{R} \to \mathbb{R}$ satisfying $g(t) = t\,g(1/t)$ (standard examples are checked below), and a noise distribution for the proposal.
We derive an optimal choice of noise distribution for the Barker proposal, an optimal choice of balancing function under a Gaussian noise distribution, and an optimal choice of first-order locally-balanced algorithm among the class.
arXiv Detail & Related papers (2022-01-04T13:06:50Z) - LSB: Local Self-Balancing MCMC in Discrete Spaces [2.385916960125935]
- LSB: Local Self-Balancing MCMC in Discrete Spaces [2.385916960125935]
This work considers using machine learning to adapt the proposal distribution to the target, in order to improve sampling efficiency in the purely discrete domain.
We call the resulting sampler the Locally Self-Balancing Sampler (LSB).
arXiv Detail & Related papers (2021-09-08T18:31:26Z) - Variational Refinement for Importance Sampling Using the Forward
Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z) - A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their (infinite-dimensional) means in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z) - Optimal dimension dependence of the Metropolis-Adjusted Langevin
Algorithm [22.19906823739798]
The mixing time of MALA on the class of log-smooth and strongly log-concave distributions is shown to be $\tilde{O}(d^{1/2})$ from a warm start.
A new technique based on a projection characterization of the Metropolis adjustment reduces the study of MALA to the well-studied discretization analysis of the Langevin SDE.
arXiv Detail & Related papers (2020-12-23T17:14:06Z)