A Langevin-like Sampler for Discrete Distributions
- URL: http://arxiv.org/abs/2206.09914v1
- Date: Mon, 20 Jun 2022 17:36:03 GMT
- Title: A Langevin-like Sampler for Discrete Distributions
- Authors: Ruqi Zhang, Xingchao Liu, Qiang Liu
- Abstract summary: The discrete Langevin proposal (DLP) is a simple and scalable gradient-based proposal for sampling complex high-dimensional discrete distributions.
DLP is able to update all coordinates in parallel in a single step, and the magnitude of changes is controlled by a stepsize.
We develop several variants of sampling algorithms, including unadjusted, Metropolis-adjusted, stochastic and preconditioned versions.
- Score: 15.260564469562542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose discrete Langevin proposal (DLP), a simple and scalable
gradient-based proposal for sampling complex high-dimensional discrete
distributions. In contrast to Gibbs sampling-based methods, DLP is able to
update all coordinates in parallel in a single step and the magnitude of
changes is controlled by a stepsize. This allows a cheap and efficient
exploration in the space of high-dimensional and strongly correlated variables.
We prove the efficiency of DLP by showing that the asymptotic bias of its
stationary distribution is zero for log-quadratic distributions, and is small
for distributions that are close to being log-quadratic. With DLP, we develop
several variants of sampling algorithms, including unadjusted,
Metropolis-adjusted, stochastic and preconditioned versions. DLP outperforms
many popular alternatives on a wide variety of tasks, including Ising models,
restricted Boltzmann machines, deep energy-based models, binary neural networks
and language generation.
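For concreteness, below is a minimal sketch of the binary-variable case of the factorized proposal described above, assuming a differentiable energy function defined on float-valued inputs in {0,1}^d (as in an Ising model or RBM). Each coordinate is flipped independently with a probability driven by the gradient of the log-probability and damped by the stepsize; the function and argument names (`dlp_step_binary`, `energy_fn`, `alpha`) are illustrative assumptions rather than the authors' code, and the Metropolis-adjusted variant would add an accept/reject correction on top.

```python
import torch

def dlp_step_binary(x, energy_fn, alpha=0.1):
    """One unadjusted step of a discrete-Langevin-style proposal for x in {0,1}^d.

    Sketch only: `energy_fn(x)` should return the log-probability f(x) up to a
    constant and be differentiable with respect to a float-valued x; `alpha` is
    the stepsize controlling how many coordinates change per step.
    """
    x = x.detach().float().requires_grad_(True)
    grad = torch.autograd.grad(energy_fn(x).sum(), x)[0]
    # Flip logit per coordinate: 0.5 * grad_i * (1 - 2 x_i) - 1 / (2 alpha);
    # the "stay" option has logit 0, so the flip probability is a sigmoid.
    flip_logit = 0.5 * grad * (1 - 2 * x) - 1.0 / (2 * alpha)
    flip = torch.bernoulli(torch.sigmoid(flip_logit))
    return (x + flip * (1 - 2 * x)).detach()  # flip the selected coordinates in parallel
```

Because all coordinates are sampled in parallel from this factorized proposal, one step costs a single gradient evaluation regardless of dimension, which is where the advantage over coordinate-wise Gibbs updates comes from.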
Related papers
- Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling [22.256068524699472]
In this work, we propose an Annealed Importance Sampling (AIS) approach to address the limitations of standard variational inference for Gaussian process latent variable models.
We combine the strengths of Sequential Monte Carlo samplers and VI to explore a wider range of posterior distributions and gradually approach the target distribution.
Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.
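As context for this entry, here is a generic sketch of plain annealed importance sampling along a geometric path; it is not the paper's SMC/VI combination for Gaussian process latent variable models, and the random-walk Metropolis inner kernel plus all names (`sample_p0`, `log_p0`, `log_ptarget`, `n_levels`) are illustrative assumptions.

```python
import numpy as np

def ais_log_weight(sample_p0, log_p0, log_ptarget, n_levels=100, mh_steps=5, step=0.5, rng=None):
    """Return one sample and its log importance weight from plain AIS.

    Intermediate targets follow the geometric path
    log p_k(x) = (1 - beta_k) * log p0(x) + beta_k * log p_target(x).
    """
    rng = rng or np.random.default_rng()
    betas = np.linspace(0.0, 1.0, n_levels + 1)
    x, log_w = sample_p0(rng), 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Weight update: ratio of consecutive annealed densities at the current point.
        log_w += (b - b_prev) * (log_ptarget(x) - log_p0(x))
        # Move with a few random-walk Metropolis steps targeting the current level.
        log_pi = lambda y, b=b: (1 - b) * log_p0(y) + b * log_ptarget(y)
        for _ in range(mh_steps):
            prop = x + step * rng.standard_normal(np.shape(x))
            if np.log(rng.uniform()) < log_pi(prop) - log_pi(x):
                x = prop
    return x, log_w
```

Averaging `exp(log_w)` over many independent runs gives an unbiased estimate of the ratio of normalizing constants between the target and the initial distribution.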
arXiv Detail & Related papers (2024-08-13T08:09:05Z)
- Faster Sampling via Stochastic Gradient Proximal Sampler [28.422547264326468]
Stochastic Proximal Samplers (SPS) for sampling from non-log-concave distributions are studied.
We show that the convergence to the target distribution can be guaranteed as long as the algorithm trajectory is bounded.
We provide two implementable variants based on stochastic gradient Langevin dynamics (SGLD) and the Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA.
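For reference, the SGLD update that serves as the inner kernel named here is a single noisy gradient step; the sketch below shows only that generic update, not the paper's proximal-sampler outer loop, and `stochastic_grad_log_p` is an assumed unbiased estimator of the score ∇ log p.

```python
import numpy as np

def sgld_step(x, stochastic_grad_log_p, eta, rng=None):
    """One stochastic gradient Langevin dynamics (SGLD) update with stepsize eta."""
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(np.shape(x))
    # Drift along an unbiased estimate of the score, plus Gaussian exploration noise.
    return x + 0.5 * eta * stochastic_grad_log_p(x) + np.sqrt(eta) * noise
```

Replacing the noisy step with a full-gradient proposal followed by a Metropolis accept/reject test gives the MALA variant mentioned above.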
arXiv Detail & Related papers (2024-05-27T00:53:18Z)
- Gaussian Cooling and Dikin Walks: The Interior-Point Method for Logconcave Sampling [8.655526882770742]
In the 1990s, Nesterov and Nemirovski developed the Interior-Point Method (IPM) for convex optimization based on self-concordant barriers.
In 2012, Kannan and Narayanan proposed the Dikin walk for uniformly sampling polytopes.
Here we generalize this approach, developing and adapting IPM machinery together with the Dikin walk to obtain polynomial-time sampling algorithms.
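To make the Dikin walk concrete, here is a sketch of one step of the classical (Gaussian) Dikin walk for uniform sampling over a polytope {x : Ax <= b}; it reflects the Kannan–Narayanan construction referenced above rather than the paper's generalized IPM machinery, and the radius parameter `r` is an illustrative choice.

```python
import numpy as np

def dikin_walk_step(x, A, b, r=0.5, rng=None):
    """One Metropolis-filtered Dikin walk step for uniform sampling on {x : Ax <= b}."""
    rng = rng or np.random.default_rng()
    n = x.size

    def barrier_hessian(z):
        s = b - A @ z                       # slacks of the linear constraints
        return (A / s[:, None] ** 2).T @ A  # Hessian of the log-barrier at z

    H = barrier_hessian(x)
    # Gaussian proposal shaped by the Dikin ellipsoid at x (covariance (r^2/n) H^{-1}).
    L = np.linalg.cholesky(np.linalg.inv(H))
    y = x + (r / np.sqrt(n)) * (L @ rng.standard_normal(n))
    if np.any(A @ y >= b):                  # outside the polytope: reject outright
        return x
    Hy = barrier_hessian(y)

    def log_q(H_from, frm, to):             # log proposal density (up to a constant)
        d = to - frm
        return 0.5 * np.linalg.slogdet(H_from)[1] - (n / (2 * r ** 2)) * d @ H_from @ d

    # Metropolis correction for the state-dependent proposal covariance.
    accept = np.log(rng.uniform()) < log_q(Hy, y, x) - log_q(H, x, y)
    return y if accept else x
```

The log-barrier Hessian is exactly the self-concordance structure supplied by the interior-point viewpoint, which is the connection this entry builds on.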
arXiv Detail & Related papers (2023-07-24T17:15:38Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions.
To reduce the number of samples in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts mirror descent and the other executes an online algorithm for non-oblivious multi-armed bandits.
In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of outlier distributions.
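The average top-$k$ risk mentioned here is straightforward to state in code: the sketch below (with the hypothetical helper name `average_topk_risk`) simply averages the $k$ largest per-distribution losses, interpolating between the worst-case objective ($k = 1$) and the uniform average ($k = m$).

```python
import torch

def average_topk_risk(per_distribution_losses: torch.Tensor, k: int) -> torch.Tensor:
    """Average of the k largest entries of a length-m vector of per-distribution risks."""
    topk_values, _ = torch.topk(per_distribution_losses, k)
    return topk_values.mean()
```

For example, with $m = 10$ distributions and $k = 3$, only the three hardest distributions contribute to the gradient at each step.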
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
- DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$.
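As a point of reference for the KL-regularized reformulation mentioned above, the sketch below shows the standard log-sum-exp ("tilted") objective that a Kullback-Leibler penalty induces over a finite set of client losses; this is a generic illustration, not the authors' decentralized DR-DSGD update, and `lam` is the assumed regularization weight.

```python
import math
import torch

def kl_regularized_worst_case_loss(per_client_losses: torch.Tensor, lam: float) -> torch.Tensor:
    """lam * log( mean_i exp(loss_i / lam) ): a KL-regularized surrogate for the max client loss.

    As lam -> 0 this approaches the maximum per-client loss (the robust objective);
    as lam -> infinity it approaches the plain average.
    """
    m = per_client_losses.numel()
    return lam * (torch.logsumexp(per_client_losses / lam, dim=0) - math.log(m))
```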
arXiv Detail & Related papers (2022-08-29T18:01:42Z)
- Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD [0.0]
Stochastic gradient descent (SGD) is a cornerstone of machine learning.
The default minibatch construction involves uniformly sampling a subset of the desired size.
We show how specific DPPs and a string of controlled approximations can lead to gradient estimators with a variance that decays faster with the batchsize than under uniform sampling.
arXiv Detail & Related papers (2021-12-11T15:09:19Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Oops I Took A Gradient: Scalable Sampling for Discrete Distributions [53.3142984019796]
We show that this gradient-based approach outperforms generic samplers in a number of difficult settings.
We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data.
arXiv Detail & Related papers (2021-02-08T20:08:50Z) - Probabilistic Circuits for Variational Inference in Discrete Graphical
Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPNs).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial the corresponding ELBO can be computed analytically.
arXiv Detail & Related papers (2020-10-22T05:04:38Z) - Ensemble Slice Sampling: Parallel, black-box and gradient-free inference
for correlated & multimodal distributions [0.0]
Slice Sampling has emerged as a powerful Markov Chain Monte Carlo algorithm that adapts to the characteristics of the target distribution with minimal hand-tuning.
However, its performance is sensitive to the user-specified initial length scale.
This paper introduces Ensemble Slice Sampling (ESS), a new class of algorithms that bypasses such difficulties by adaptively tuning the initial length scale.
These affine-invariant algorithms are trivial to construct, require no hand-tuning, and can easily be implemented in parallel computing environments.
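Since the key quantity in this entry is the slice sampler's length scale, the sketch below shows one standard univariate slice-sampling update with the stepping-out procedure; the width `w` is precisely the hyperparameter that Ensemble Slice Sampling tunes adaptively, and the ensemble / affine-invariant machinery of ESS itself is not reproduced here.

```python
import numpy as np

def slice_sample_1d(x0, log_p, w=1.0, max_expand=50, rng=None):
    """One univariate slice-sampling update (stepping-out plus shrinkage)."""
    rng = rng or np.random.default_rng()
    log_y = log_p(x0) + np.log(rng.uniform())   # height of the slice under log p
    # Stepping out: place an interval of width w around x0 and expand each end
    # until it falls below the slice level (or the expansion budget runs out).
    left = x0 - w * rng.uniform()
    right = left + w
    for _ in range(max_expand):
        if log_p(left) <= log_y:
            break
        left -= w
    for _ in range(max_expand):
        if log_p(right) <= log_y:
            break
        right += w
    # Shrinkage: sample uniformly in [left, right], shrinking toward x0 on rejection.
    while True:
        x1 = rng.uniform(left, right)
        if log_p(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1
```

A poorly chosen `w` forces either many expansion steps or many shrinkage rejections, which is the sensitivity that the adaptive tuning in ESS is designed to remove.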
arXiv Detail & Related papers (2020-02-14T19:00:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.