Fast, accurate training and sampling of Restricted Boltzmann Machines
- URL: http://arxiv.org/abs/2405.15376v1
- Date: Fri, 24 May 2024 09:23:43 GMT
- Title: Fast, accurate training and sampling of Restricted Boltzmann Machines
- Authors: Nicolas Béreux, Aurélien Decelle, Cyril Furtlehner, Lorenzo Rosset, Beatriz Seoane,
- Abstract summary: We present an innovative method in which the principal directions of the dataset are integrated into a low-rank RBM.
This approach enables efficient sampling of the equilibrium measure via a static Monte Carlo process.
Our results show that this strategy successfully trains RBMs to capture the full diversity of data in datasets where previous methods fail.
- Score: 4.785158987724452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thanks to their simple architecture, Restricted Boltzmann Machines (RBMs) are powerful tools for modeling complex systems and extracting interpretable insights from data. However, training RBMs, as other energy-based models, on highly structured data poses a major challenge, as effective training relies on mixing the Markov chain Monte Carlo simulations used to estimate the gradient. This process is often hindered by multiple second-order phase transitions and the associated critical slowdown. In this paper, we present an innovative method in which the principal directions of the dataset are integrated into a low-rank RBM through a convex optimization procedure. This approach enables efficient sampling of the equilibrium measure via a static Monte Carlo process. By starting the standard training process with a model that already accurately represents the main modes of the data, we bypass the initial phase transitions. Our results show that this strategy successfully trains RBMs to capture the full diversity of data in datasets where previous methods fail. Furthermore, we use the training trajectories to propose a new sampling method, {\em parallel trajectory tempering}, which allows us to sample the equilibrium measure of the trained model much faster than previous optimized MCMC approaches and a better estimation of the log-likelihood. We illustrate the success of the training method on several highly structured datasets.
Related papers
- Learning with Restricted Boltzmann Machines: Asymptotics of AMP and GD in High Dimensions [31.75902683077129]
The Restricted Boltzmann Machine (RBM) is one of the simplest generative neural networks capable of learning input distributions.<n>We simplify the standard RBM training objective into a form that is equivalent to the multi-index model with non-separable regularization.<n>We show in particular that RBM reaches the optimal computational weak recovery threshold, aligning with the BBP transition.
arXiv Detail & Related papers (2025-05-23T15:51:46Z) - Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory [53.37473225728298]
The rapid evolution of deep learning and large language models has led to an exponential growth in the demand for training data.
Matching Training Trajectories (MTT) has been a prominent approach, which replicates the training trajectory of an expert network on real data with a synthetic dataset.
We introduce a novel method called Matching Convexified Trajectory (MCT), which aims to provide better guidance for the student trajectory.
arXiv Detail & Related papers (2024-06-28T11:06:46Z) - Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z) - Learning Energy-Based Prior Model with Diffusion-Amortized MCMC [89.95629196907082]
Common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progress.
We introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it.
arXiv Detail & Related papers (2023-10-05T00:23:34Z) - Stabilizing Machine Learning Prediction of Dynamics: Noise and
Noise-inspired Regularization [58.720142291102135]
Recent has shown that machine learning (ML) models can be trained to accurately forecast the dynamics of chaotic dynamical systems.
In the absence of mitigating techniques, this technique can result in artificially rapid error growth, leading to inaccurate predictions and/or climate instability.
We introduce Linearized Multi-Noise Training (LMNT), a regularization technique that deterministically approximates the effect of many small, independent noise realizations added to the model input during training.
arXiv Detail & Related papers (2022-11-09T23:40:52Z) - Self-learning locally-optimal hypertuning using maximum entropy, and
comparison of machine learning approaches for estimating fatigue life in
composite materials [0.0]
We develop an ML nearest-neighbors-alike algorithm based on the principle of maximum entropy to predict fatigue damage.
The predictions achieve a good level of accuracy, similar to other ML algorithms.
arXiv Detail & Related papers (2022-10-19T12:20:07Z) - Learning a Restricted Boltzmann Machine using biased Monte Carlo
sampling [0.6554326244334867]
We show that sampling the equilibrium distribution via Markov Chain Monte Carlo can be dramatically accelerated using biased sampling techniques.
We also show that this sampling technique can be exploited to improve the computation of the log-likelihood gradient during the training too.
arXiv Detail & Related papers (2022-06-02T21:29:01Z) - No MCMC for me: Amortized sampling for fast and stable training of
energy-based models [62.1234885852552]
Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty.
We present a simple method for training EBMs at scale using an entropy-regularized generator to amortize the MCMC sampling.
Next, we apply our estimator to the recently proposed Joint Energy Model (JEM), where we match the original performance with faster and stable training.
arXiv Detail & Related papers (2020-10-08T19:17:20Z) - Non-convex Learning via Replica Exchange Stochastic Gradient MCMC [25.47669573608621]
We propose an adaptive replica exchange SGMCMC (reSGMCMC) to automatically correct the bias and study the corresponding properties.
Empirically, we test the algorithm through extensive experiments on various setups and obtain the results.
arXiv Detail & Related papers (2020-08-12T15:02:59Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.