A Langevin-like Sampler for Discrete Distributions
- URL: http://arxiv.org/abs/2206.09914v1
- Date: Mon, 20 Jun 2022 17:36:03 GMT
- Title: A Langevin-like Sampler for Discrete Distributions
- Authors: Ruqi Zhang, Xingchao Liu, Qiang Liu
- Abstract summary: The discrete Langevin proposal (DLP) is a simple and scalable gradient-based proposal for sampling complex high-dimensional discrete distributions.
DLP is able to update all coordinates in parallel in a single step, and the magnitude of changes is controlled by a stepsize.
We develop several variants of sampling algorithms, including unadjusted, Metropolis-adjusted, stochastic and preconditioned versions.
- Score: 15.260564469562542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose discrete Langevin proposal (DLP), a simple and scalable
gradient-based proposal for sampling complex high-dimensional discrete
distributions. In contrast to Gibbs sampling-based methods, DLP is able to
update all coordinates in parallel in a single step and the magnitude of
changes is controlled by a stepsize. This allows a cheap and efficient
exploration in the space of high-dimensional and strongly correlated variables.
We prove the efficiency of DLP by showing that the asymptotic bias of its
stationary distribution is zero for log-quadratic distributions, and is small
for distributions that are close to being log-quadratic. With DLP, we develop
several variants of sampling algorithms, including unadjusted,
Metropolis-adjusted, stochastic and preconditioned versions. DLP outperforms
many popular alternatives on a wide variety of tasks, including Ising models,
restricted Boltzmann machines, deep energy-based models, binary neural networks
and language generation.
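For concreteness, below is a minimal sketch of the binary-variable case of the factorized proposal described above, assuming a differentiable energy function defined on float-valued inputs in {0,1}^d (as in an Ising model or RBM). Each coordinate is flipped independently with a probability driven by the gradient of the log-probability and damped by the stepsize; the function and argument names (`dlp_step_binary`, `energy_fn`, `alpha`) are illustrative assumptions rather than the authors' code, and the Metropolis-adjusted variant would add an accept/reject correction on top.

```python
import torch

def dlp_step_binary(x, energy_fn, alpha=0.1):
    """One unadjusted step of a discrete-Langevin-style proposal for x in {0,1}^d.

    Sketch only: `energy_fn(x)` should return the log-probability f(x) up to a
    constant and be differentiable with respect to a float-valued x; `alpha` is
    the stepsize controlling how many coordinates change per step.
    """
    x = x.detach().float().requires_grad_(True)
    grad = torch.autograd.grad(energy_fn(x).sum(), x)[0]
    # Flip logit per coordinate: 0.5 * grad_i * (1 - 2 x_i) - 1 / (2 alpha);
    # the "stay" option has logit 0, so the flip probability is a sigmoid.
    flip_logit = 0.5 * grad * (1 - 2 * x) - 1.0 / (2 * alpha)
    flip = torch.bernoulli(torch.sigmoid(flip_logit))
    return (x + flip * (1 - 2 * x)).detach()  # flip the selected coordinates in parallel
```

Because all coordinates are sampled in parallel from this factorized proposal, one step costs a single gradient evaluation regardless of dimension, which is where the advantage over coordinate-wise Gibbs updates comes from.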
Related papers
- Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling [22.256068524699472]
In this work, we propose an Annealed Importance Sampling (AIS) approach to address the limitations of standard variational inference for Gaussian process latent variable models.
We combine the strengths of Sequential Monte Carlo samplers and VI to explore a wider range of posterior distributions and gradually approach the target distribution.
Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.
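As context for this entry, here is a generic sketch of plain annealed importance sampling along a geometric path; it is not the paper's SMC/VI combination for Gaussian process latent variable models, and the random-walk Metropolis inner kernel plus all names (`sample_p0`, `log_p0`, `log_ptarget`, `n_levels`) are illustrative assumptions.

```python
import numpy as np

def ais_log_weight(sample_p0, log_p0, log_ptarget, n_levels=100, mh_steps=5, step=0.5, rng=None):
    """Return one sample and its log importance weight from plain AIS.

    Intermediate targets follow the geometric path
    log p_k(x) = (1 - beta_k) * log p0(x) + beta_k * log p_target(x).
    """
    rng = rng or np.random.default_rng()
    betas = np.linspace(0.0, 1.0, n_levels + 1)
    x, log_w = sample_p0(rng), 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Weight update: ratio of consecutive annealed densities at the current point.
        log_w += (b - b_prev) * (log_ptarget(x) - log_p0(x))
        # Move with a few random-walk Metropolis steps targeting the current level.
        log_pi = lambda y, b=b: (1 - b) * log_p0(y) + b * log_ptarget(y)
        for _ in range(mh_steps):
            prop = x + step * rng.standard_normal(np.shape(x))
            if np.log(rng.uniform()) < log_pi(prop) - log_pi(x):
                x = prop
    return x, log_w
```

Averaging `exp(log_w)` over many independent runs gives an unbiased estimate of the ratio of normalizing constants between the target and the initial distribution.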
arXiv Detail & Related papers (2024-08-13T08:09:05Z)
- Faster Sampling via Stochastic Gradient Proximal Sampler [28.422547264326468]
Stochastic Proximal Samplers (SPS) for sampling from non-log-concave distributions are studied.
We show that the convergence to the target distribution can be guaranteed as long as the algorithm trajectory is bounded.
We provide two implementable variants based on stochastic gradient Langevin dynamics (SGLD) and the Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA.
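For reference, the SGLD update that serves as the inner kernel named here is a single noisy gradient step; the sketch below shows only that generic update, not the paper's proximal-sampler outer loop, and `stochastic_grad_log_p` is an assumed unbiased estimator of the score ∇ log p.

```python
import numpy as np

def sgld_step(x, stochastic_grad_log_p, eta, rng=None):
    """One stochastic gradient Langevin dynamics (SGLD) update with stepsize eta."""
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(np.shape(x))
    # Drift along an unbiased estimate of the score, plus Gaussian exploration noise.
    return x + 0.5 * eta * stochastic_grad_log_p(x) + np.sqrt(eta) * noise
```

Replacing the noisy step with a full-gradient proposal followed by a Metropolis accept/reject test gives the MALA variant mentioned above.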
arXiv Detail & Related papers (2024-05-27T00:53:18Z)
- Gaussian Cooling and Dikin Walks: The Interior-Point Method for Logconcave Sampling [8.655526882770742]
In the 1990s, Nesterov and Nemirovski developed the Interior-Point Method (IPM) for convex optimization based on self-concordant barriers.
In 2012, Kannan and Narayanan proposed the Dikin walk for uniformly sampling polytopes.
Here we generalize this approach, developing and adapting IPM machinery together with the Dikin walk to obtain polynomial-time sampling algorithms.
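To make the Dikin walk concrete, here is a sketch of one step of the classical (Gaussian) Dikin walk for uniform sampling over a polytope {x : Ax <= b}; it reflects the Kannan–Narayanan construction referenced above rather than the paper's generalized IPM machinery, and the radius parameter `r` is an illustrative choice.

```python
import numpy as np

def dikin_walk_step(x, A, b, r=0.5, rng=None):
    """One Metropolis-filtered Dikin walk step for uniform sampling on {x : Ax <= b}."""
    rng = rng or np.random.default_rng()
    n = x.size

    def barrier_hessian(z):
        s = b - A @ z                       # slacks of the linear constraints
        return (A / s[:, None] ** 2).T @ A  # Hessian of the log-barrier at z

    H = barrier_hessian(x)
    # Gaussian proposal shaped by the Dikin ellipsoid at x (covariance (r^2/n) H^{-1}).
    L = np.linalg.cholesky(np.linalg.inv(H))
    y = x + (r / np.sqrt(n)) * (L @ rng.standard_normal(n))
    if np.any(A @ y >= b):                  # outside the polytope: reject outright
        return x
    Hy = barrier_hessian(y)

    def log_q(H_from, frm, to):             # log proposal density (up to a constant)
        d = to - frm
        return 0.5 * np.linalg.slogdet(H_from)[1] - (n / (2 * r ** 2)) * d @ H_from @ d

    # Metropolis correction for the state-dependent proposal covariance.
    accept = np.log(rng.uniform()) < log_q(Hy, y, x) - log_q(H, x, y)
    return y if accept else x
```

The log-barrier Hessian is exactly the self-concordance structure supplied by the interior-point viewpoint, which is the connection this entry builds on.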
arXiv Detail & Related papers (2023-07-24T17:15:38Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions.
To reduce the number of samples in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts mirror descent and the other executes an online algorithm for non-oblivious multi-armed bandits.
In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of outlier distributions.
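The average top-$k$ risk mentioned here is straightforward to state in code: the sketch below (with the hypothetical helper name `average_topk_risk`) simply averages the $k$ largest per-distribution losses, interpolating between the worst-case objective ($k = 1$) and the uniform average ($k = m$).

```python
import torch

def average_topk_risk(per_distribution_losses: torch.Tensor, k: int) -> torch.Tensor:
    """Average of the k largest entries of a length-m vector of per-distribution risks."""
    topk_values, _ = torch.topk(per_distribution_losses, k)
    return topk_values.mean()
```

For example, with $m = 10$ distributions and $k = 3$, only the three hardest distributions contribute to the gradient at each step.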
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
- DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$.
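As a point of reference for the KL-regularized reformulation mentioned above, the sketch below shows the standard log-sum-exp ("tilted") objective that a Kullback-Leibler penalty induces over a finite set of client losses; this is a generic illustration, not the authors' decentralized DR-DSGD update, and `lam` is the assumed regularization weight.

```python
import math
import torch

def kl_regularized_worst_case_loss(per_client_losses: torch.Tensor, lam: float) -> torch.Tensor:
    """lam * log( mean_i exp(loss_i / lam) ): a KL-regularized surrogate for the max client loss.

    As lam -> 0 this approaches the maximum per-client loss (the robust objective);
    as lam -> infinity it approaches the plain average.
    """
    m = per_client_losses.numel()
    return lam * (torch.logsumexp(per_client_losses / lam, dim=0) - math.log(m))
```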
arXiv Detail & Related papers (2022-08-29T18:01:42Z)
- Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD [0.0]
Stochastic gradient descent (SGD) is a cornerstone of machine learning.
The default minibatch construction involves uniformly sampling a subset of the desired size.
We show how specific DPPs and a string of controlled approximations can lead to gradient estimators with a variance that decays faster with the batchsize than under uniform sampling.
arXiv Detail & Related papers (2021-12-11T15:09:19Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Oops I Took A Gradient: Scalable Sampling for Discrete Distributions [53.3142984019796]
We show that this gradient-based approach outperforms generic samplers in a number of difficult settings.
We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data.
arXiv Detail & Related papers (2021-02-08T20:08:50Z) - Probabilistic Circuits for Variational Inference in Discrete Graphical
Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPNs).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial the corresponding ELBO can be computed analytically.
arXiv Detail & Related papers (2020-10-22T05:04:38Z) - Ensemble Slice Sampling: Parallel, black-box and gradient-free inference
for correlated & multimodal distributions [0.0]
Slice Sampling has emerged as a powerful Markov Chain Monte Carlo algorithm that adapts to the characteristics of the target distribution with minimal hand-tuning.
However, its performance is sensitive to the user-specified initial length scale.
This paper introduces Ensemble Slice Sampling (ESS), a new class of algorithms that bypasses such difficulties by adaptively tuning the initial length scale.
These affine-invariant algorithms are trivial to construct, require no hand-tuning, and can easily be implemented in parallel computing environments.
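Since the key quantity in this entry is the slice sampler's length scale, the sketch below shows one standard univariate slice-sampling update with the stepping-out procedure; the width `w` is precisely the hyperparameter that Ensemble Slice Sampling tunes adaptively, and the ensemble / affine-invariant machinery of ESS itself is not reproduced here.

```python
import numpy as np

def slice_sample_1d(x0, log_p, w=1.0, max_expand=50, rng=None):
    """One univariate slice-sampling update (stepping-out plus shrinkage)."""
    rng = rng or np.random.default_rng()
    log_y = log_p(x0) + np.log(rng.uniform())   # height of the slice under log p
    # Stepping out: place an interval of width w around x0 and expand each end
    # until it falls below the slice level (or the expansion budget runs out).
    left = x0 - w * rng.uniform()
    right = left + w
    for _ in range(max_expand):
        if log_p(left) <= log_y:
            break
        left -= w
    for _ in range(max_expand):
        if log_p(right) <= log_y:
            break
        right += w
    # Shrinkage: sample uniformly in [left, right], shrinking toward x0 on rejection.
    while True:
        x1 = rng.uniform(left, right)
        if log_p(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1
```

A poorly chosen `w` forces either many expansion steps or many shrinkage rejections, which is the sensitivity that the adaptive tuning in ESS is designed to remove.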
arXiv Detail & Related papers (2020-02-14T19:00:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.