Perturb-and-max-product: Sampling and learning in discrete energy-based
models
- URL: http://arxiv.org/abs/2111.02458v2
- Date: Fri, 5 Nov 2021 06:25:27 GMT
- Title: Perturb-and-max-product: Sampling and learning in discrete energy-based
models
- Authors: Miguel Lazaro-Gredilla, Antoine Dedieu, Dileep George
- Abstract summary: We present perturb-and-max-product (PMP), a parallel and scalable mechanism for sampling and learning in discrete energy-based models.
We show that (a) for Ising models, PMP is orders of magnitude faster than Gibbs and Gibbs-with-Gradients at learning and generating samples of similar or better quality; (b) PMP is able to learn and sample from RBMs; (c) in a large, entangled graphical model in which Gibbs and GWG fail to mix, PMP succeeds.
- Score: 3.056751497358646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perturb-and-MAP offers an elegant approach to approximately sample from an
energy-based model (EBM) by computing the maximum-a-posteriori (MAP)
configuration of a perturbed version of the model. Sampling in turn enables
learning. However, this line of research has been hindered by the general
intractability of the MAP computation. Very few works venture outside tractable
models, and when they do, they use linear programming approaches, which as we
will show, have several limitations. In this work we present
perturb-and-max-product (PMP), a parallel and scalable mechanism for sampling
and learning in discrete EBMs. Models can be arbitrary as long as they are
built using tractable factors. We show that (a) for Ising models, PMP is orders
of magnitude faster than Gibbs and Gibbs-with-Gradients (GWG) at learning and
generating samples of similar or better quality; (b) PMP is able to learn and
sample from RBMs; (c) in a large, entangled graphical model in which Gibbs and
GWG fail to mix, PMP succeeds.
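The core perturb-and-MAP idea can be illustrated on a toy problem: inject independent Gumbel noise into the unary potentials, then take the MAP configuration of the perturbed model. The sketch below uses brute-force enumeration as a stand-in for the paper's parallel max-product solver, on a tiny Ising model with assumed random couplings; it is a minimal illustration of the principle, not the PMP implementation.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Tiny Ising model: x_i in {-1, +1}, E(x) = -sum_{i<j} J_ij x_i x_j - sum_i h_i x_i
n = 8
J = np.triu(rng.normal(scale=0.3, size=(n, n)), 1)  # couplings on upper triangle
h = rng.normal(scale=0.5, size=n)

def energy(x, h_eff):
    return -(x @ J @ x + h_eff @ x)

def map_state(h_eff):
    # Brute-force MAP over all 2^n states (a stand-in for max-product on large models)
    best, best_e = None, np.inf
    for bits in itertools.product([-1, 1], repeat=n):
        x = np.array(bits)
        e = energy(x, h_eff)
        if e < best_e:
            best, best_e = x, e
    return best

def perturb_and_map():
    # Perturb each unary potential with independent Gumbel noise, then take MAP.
    # Writing the noise g_i(x_i) in the +/-1 basis, only the odd part matters,
    # so it folds into the field as (g_plus - g_minus) / 2 per site.
    g_plus = rng.gumbel(size=n)
    g_minus = rng.gumbel(size=n)
    return map_state(h + 0.5 * (g_plus - g_minus))

samples = np.stack([perturb_and_map() for _ in range(100)])
print(samples.mean(axis=0))  # per-site magnetization estimates
```

Because only unary potentials are perturbed, the samples are approximate (exact Gumbel perturbation of the full joint is intractable); this "low-order" perturbation is the standard compromise in the perturb-and-MAP literature.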
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm combined with score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z) - On the Sample Complexity of Quantum Boltzmann Machine Learning [0.0]
We give an operational definition of QBM learning in terms of the difference in expectation values between the model and target.
We prove that a solution can be obtained with gradient descent using at most a polynomial number of Gibbs states.
In particular, we give pre-training strategies based on mean-field, Gaussian Fermionic, and geometrically local Hamiltonians.
arXiv Detail & Related papers (2023-06-26T18:00:50Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood
Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
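The "inverse-free" idea above, replacing an explicit matrix inversion with an iterative linear solver that only needs matrix-vector products, can be sketched with a hand-rolled conjugate gradient routine. The matrix and right-hand side below are assumed toy data, not the paper's latent Gaussian model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric positive-definite system A x = b, where A plays the role of a
# covariance-like matrix whose explicit inverse we want to avoid forming.
n = 50
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)      # well-conditioned SPD matrix
b = rng.normal(size=n)

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=200):
    # Standard CG: uses only matrix-vector products, never an inverse.
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

x_cg = conjugate_gradient(lambda v: A @ v, b)
x_exact = np.linalg.solve(A, b)
print(np.max(np.abs(x_cg - x_exact)))  # small residual
```

"Unrolling" then means treating the fixed number of solver iterations as a differentiable computation graph, so gradients can be backpropagated through them.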
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - Efficient Training of Energy-Based Models Using Jarzynski Equality [13.636994997309307]
Energy-based models (EBMs) are generative models inspired by statistical physics.
Computing the gradient of the training loss with respect to the model parameters requires sampling the model distribution.
Here we show how results for nonequilibrium thermodynamics based on Jarzynski equality can be used to perform this computation efficiently.
arXiv Detail & Related papers (2023-05-30T21:07:52Z) - Approximate Gibbs Sampler for Efficient Inference of Hierarchical Bayesian Models for Grouped Count Data [0.0]
This research develops an approximate Gibbs sampler (AGS) to efficiently learn the HBPRMs while maintaining the inference accuracy.
Numerical experiments using real and synthetic datasets with small and large counts demonstrate the superior performance of AGS.
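Gibbs sampling recurs throughout this list, so a reference point may help: a minimal single-site Gibbs sweep for an Ising model on a ring. This is the generic sampler, with assumed coupling and field values, not the paper's approximate Gibbs sampler for hierarchical Poisson regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Single-site Gibbs sampling for a small Ising model on a ring.
n = 20
J, h = 0.5, 0.1                      # uniform coupling and field (assumed values)
x = rng.choice([-1, 1], size=n)

def gibbs_sweep(x):
    for i in range(n):
        # Local field from the two ring neighbours plus the external field
        local = J * (x[(i - 1) % n] + x[(i + 1) % n]) + h
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * local))  # P(x_i = +1 | rest)
        x[i] = 1 if rng.random() < p_plus else -1
    return x

for _ in range(500):                 # burn-in sweeps
    x = gibbs_sweep(x)
print(x.mean())                      # magnetization of the final state
```

Each update is exact for its conditional, but the sweeps are inherently sequential; that serial structure is exactly what parallel approaches such as PMP above try to sidestep.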
arXiv Detail & Related papers (2022-11-28T21:00:55Z) - PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near optimal policy with polynomial sample complexity.
arXiv Detail & Related papers (2022-07-12T17:57:17Z) - Particle Dynamics for Learning EBMs [83.59335980576637]
Energy-based modeling is a promising approach to unsupervised learning, which yields many downstream applications from a single model.
The main difficulty in learning energy-based models with the "contrastive approaches" is the generation of samples from the current energy function at each iteration.
This paper proposes an alternative approach to getting these samples and avoiding crude MCMC sampling from the current model.
arXiv Detail & Related papers (2021-11-26T23:41:07Z) - Continual Learning with Fully Probabilistic Models [70.3497683558609]
We present an approach for continual learning based on fully probabilistic (or generative) models of machine learning.
We propose a pseudo-rehearsal approach using a Gaussian Mixture Model (GMM) instance for both generator and classifier functionalities.
We show that the resulting Gaussian Mixture Replay (GMR) approach achieves state-of-the-art performance on common class-incremental learning problems at very competitive time and memory complexity.
arXiv Detail & Related papers (2021-04-19T12:26:26Z) - Learning Energy-Based Model with Variational Auto-Encoder as Amortized
Sampler [35.80109055748496]
Training energy-based models (EBMs) by maximum likelihood requires Markov chain Monte Carlo sampling.
We learn a variational auto-encoder (VAE) to initialize the finite-step MCMC, such as Langevin dynamics that is derived from the energy function.
With these amortized MCMC samples, the EBM can be trained by maximum likelihood, which follows an "analysis by synthesis" scheme.
We call this joint training algorithm the variational MCMC teaching, in which the VAE chases the EBM toward data distribution.
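The finite-step Langevin dynamics mentioned above follows the negative energy gradient plus Gaussian noise. A minimal sketch on a toy quadratic energy (whose stationary distribution is a unit-variance Gaussian at `mu`), with a crude zero initialization standing in for the VAE's amortized proposal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Short-run Langevin dynamics on a toy quadratic energy E(x) = 0.5 * ||x - mu||^2.
mu = np.array([2.0, -1.0])

def grad_energy(x):
    return x - mu

def langevin(x0, steps=200, step_size=0.1):
    # Discretized Langevin update: x <- x - eps * grad E(x) + sqrt(2 eps) * noise
    x = x0.copy()
    for _ in range(steps):
        noise = rng.normal(size=x.shape)
        x = x - step_size * grad_energy(x) + np.sqrt(2.0 * step_size) * noise
    return x

# An amortized sampler would supply x0; here we simply start from zero.
samples = np.stack([langevin(np.zeros(2)) for _ in range(500)])
print(samples.mean(axis=0))          # should land near mu
```

The point of amortization is that a good initializer lets the chain be short; a poor initializer would need many more steps to forget its starting point.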
arXiv Detail & Related papers (2020-12-29T20:46:40Z) - Tensor Networks for Probabilistic Sequence Modeling [7.846449972735859]
We use a uniform matrix product state (u-MPS) model for probabilistic modeling of sequence data.
We then introduce a novel generative algorithm giving trained u-MPS the ability to efficiently sample from a wide variety of conditional distributions.
Experiments on sequence modeling with synthetic and real text data show u-MPS outperforming a variety of baselines.
arXiv Detail & Related papers (2020-03-02T17:16:05Z) - Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
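The multiplicative weight update at the heart of the last entry is the classic Hedge rule: exponentially downweight each expert by its loss and renormalize. The sketch below is that generic rule on an assumed toy loss sequence, not the specific Klivans-Meka structure-learning algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Classic multiplicative-weights (Hedge) update over p experts.
p, T, eta = 10, 200, 0.1
w = np.ones(p) / p                   # start from the uniform distribution

for _ in range(T):
    loss = rng.uniform(size=p)       # per-expert loss in [0, 1]
    loss[3] *= 0.1                   # expert 3 is reliably better (assumed setup)
    w *= np.exp(-eta * loss)         # multiplicative update
    w /= w.sum()                     # renormalize to a distribution

print(int(np.argmax(w)))             # weight concentrates on the lowest-loss expert
```

With learning rate $\eta$ and losses in $[0,1]$, the cumulative loss of the weighted strategy is within an additive $O(\sqrt{T \log p})$ of the best single expert, which is the regret guarantee these sample-complexity bounds build on.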
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.