Abstract: We propose a general and scalable approximate sampling strategy for
probabilistic models with discrete variables. Our approach uses gradients of
the likelihood function with respect to its discrete inputs to propose updates
in a Metropolis-Hastings sampler. We show empirically that this approach
outperforms generic samplers in a number of difficult settings including Ising
models, Potts models, restricted Boltzmann machines, and factorial hidden
Markov models. We also demonstrate the use of our improved sampler for training
deep energy-based models on high dimensional discrete data. This approach
outperforms variational auto-encoders and existing energy-based models.
Finally, we give bounds showing that our approach is near-optimal in the class
of samplers which propose local updates.