Fast, accurate training and sampling of Restricted Boltzmann Machines
- URL: http://arxiv.org/abs/2405.15376v1
- Date: Fri, 24 May 2024 09:23:43 GMT
- Title: Fast, accurate training and sampling of Restricted Boltzmann Machines
- Authors: Nicolas Béreux, Aurélien Decelle, Cyril Furtlehner, Lorenzo Rosset, Beatriz Seoane
- Abstract summary: We present an innovative method in which the principal directions of the dataset are integrated into a low-rank RBM.
This approach enables efficient sampling of the equilibrium measure via a static Monte Carlo process.
Our results show that this strategy successfully trains RBMs to capture the full diversity of data in datasets where previous methods fail.
- Score: 4.785158987724452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thanks to their simple architecture, Restricted Boltzmann Machines (RBMs) are powerful tools for modeling complex systems and extracting interpretable insights from data. However, training RBMs, like other energy-based models, on highly structured data poses a major challenge, as effective training relies on the mixing of the Markov chain Monte Carlo (MCMC) simulations used to estimate the gradient. This process is often hindered by multiple second-order phase transitions and the associated critical slowing down. In this paper, we present an innovative method in which the principal directions of the dataset are integrated into a low-rank RBM through a convex optimization procedure. This approach enables efficient sampling of the equilibrium measure via a static Monte Carlo process. By starting the standard training process with a model that already accurately represents the main modes of the data, we bypass the initial phase transitions. Our results show that this strategy successfully trains RBMs to capture the full diversity of data in datasets where previous methods fail. Furthermore, we use the training trajectories to propose a new sampling method, *parallel trajectory tempering*, which allows us to sample the equilibrium measure of the trained model much faster than previous optimized MCMC approaches and provides a better estimate of the log-likelihood. We illustrate the success of the training method on several highly structured datasets.
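No code accompanies the abstract; the sketch below is a rough illustration (all names and the initialization scheme are assumptions, not the authors' convex optimization procedure) of how the top principal directions of a dataset could seed a low-rank coupling matrix for a binary RBM, so that training starts from a model that already captures the main modes of the data.

```python
# Illustrative assumption only: seed a binary RBM with a low-rank coupling
# matrix built from the data's top principal directions. The paper instead
# fits the low-rank RBM with a convex optimization procedure.
import numpy as np

def lowrank_rbm_init(data, n_hidden, rank, seed=0):
    """data: (n_samples, n_visible) binary array; returns (W, v_bias, h_bias)."""
    centered = data - data.mean(axis=0)
    # Top-`rank` principal directions and singular values of the data.
    _, svals, vt = np.linalg.svd(centered, full_matrices=False)
    directions = vt[:rank]                        # (rank, n_visible)
    scales = svals[:rank] / np.sqrt(len(data))    # data-driven magnitudes
    # W = (hidden loadings) @ diag(scales) @ (principal directions):
    # rank(W) <= rank, so the energy varies only along retained directions.
    rng = np.random.default_rng(seed)
    loadings = rng.standard_normal((n_hidden, rank)) / np.sqrt(rank)
    W = loadings @ (scales[:, None] * directions)  # (n_hidden, n_visible)
    p = np.clip(data.mean(axis=0), 1e-3, 1 - 1e-3)
    v_bias = np.log(p / (1 - p))                   # match empirical marginals
    h_bias = np.zeros(n_hidden)
    return W, v_bias, h_bias
```

Because the resulting coupling matrix is low-rank, the model's energy varies only along a handful of directions, which is the kind of structure the paper exploits for static Monte Carlo sampling of the equilibrium measure. Parallel trajectory tempering, as the name suggests, then appears to use models saved along the training trajectory in the role of a temperature ladder for exchange moves.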
Related papers
- Towards Stable Machine Learning Model Retraining via Slowly Varying Sequences [6.067007470552307]
We propose a methodology for finding sequences of machine learning models that are stable across retraining iterations.
We develop a mixed-integer optimization formulation that is guaranteed to recover optimal models.
Our method shows stronger stability than greedily trained models with a small, controllable sacrifice in predictive power.
arXiv Detail & Related papers (2024-03-28T22:45:38Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
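As a point of reference for the instance-reweighting idea, one standard KL-regularized DRO surrogate (a common formulation, not necessarily the paper's exact one) up-weights hard examples via an exponential tilting of their losses:

```python
import torch

def instance_reweighted_loss(per_sample_losses, tau=1.0):
    """KL-regularized DRO surrogate: up-weight hard (high-loss) samples.

    With a KL penalty of strength tau, the worst-case weights have the
    closed form w_i proportional to exp(loss_i / tau), i.e. a softmax.
    """
    weights = torch.softmax(per_sample_losses.detach() / tau, dim=0)
    return (weights * per_sample_losses).sum()

# Usage in a training step, e.g. with
# losses = torch.nn.functional.cross_entropy(logits, labels, reduction="none")
# loss = instance_reweighted_loss(losses, tau=0.5); loss.backward()
```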
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on the fly.
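For readers new to the sequential Monte Carlo machinery that VSMC builds on, the sketch below runs a plain bootstrap particle filter on a one-dimensional linear-Gaussian model; all parameters are illustrative, and the variational proposal and online adaptation of the paper are omitted.

```python
# Minimal bootstrap particle filter on a 1-D linear-Gaussian state-space
# model: the plain SMC backbone that VSMC wraps a learned proposal around.
import numpy as np

rng = np.random.default_rng(0)
T, N = 50, 500                  # time steps, particles
a, q, r = 0.9, 0.5, 1.0         # transition coeff., process/obs. variances

# Simulate x_t = a * x_{t-1} + N(0, q),  y_t = x_t + N(0, r).
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.normal(0.0, np.sqrt(q))
y = x + rng.normal(0.0, np.sqrt(r), size=T)

particles = rng.normal(0.0, 1.0, size=N)
loglik = 0.0
for t in range(T):
    particles = a * particles + rng.normal(0.0, np.sqrt(q), size=N)  # propagate
    logw = -0.5 * ((y[t] - particles) ** 2 / r + np.log(2 * np.pi * r))
    m = logw.max()
    loglik += m + np.log(np.mean(np.exp(logw - m)))  # log-likelihood increment
    w = np.exp(logw - m)
    particles = particles[rng.choice(N, size=N, p=w / w.sum())]      # resample
print(f"particle-filter log-likelihood estimate: {loglik:.2f}")
```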
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Learning a Restricted Boltzmann Machine using biased Monte Carlo sampling [0.6554326244334867]
We show that sampling the equilibrium distribution via Markov Chain Monte Carlo can be dramatically accelerated using biased sampling techniques.
We also show that this sampling technique can be exploited to improve the computation of the log-likelihood gradient during training.
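For context, the baseline that such biased techniques accelerate is plain block Gibbs sampling of a binary RBM; a minimal sketch (shapes and names are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, v_bias, h_bias, rng):
    """One block Gibbs sweep of a binary RBM: sample h | v, then v | h."""
    p_h = sigmoid(v @ W.T + h_bias)                  # (batch, n_hidden)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W + v_bias)                    # (batch, n_visible)
    v = (rng.random(p_v.shape) < p_v).astype(float)
    return v, h

# Chains of such sweeps estimate the negative phase of the log-likelihood
# gradient; it is their slow mixing across modes that biased sampling
# techniques are designed to overcome.
```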
arXiv Detail & Related papers (2022-06-02T21:29:01Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Real-time Forecast Models for TBM Load Parameters Based on Machine Learning Methods [6.247628933072029]
In this paper, based on in-situ TBM operational data, we use machine-learning (ML) methods to build real-time forecast models for TBM load parameters.
To decrease model complexity and improve generalization, we also apply the least absolute shrinkage and selection operator (Lasso) to extract the essential features of the forecast task.
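A minimal sketch of the Lasso-based feature screening described above; the file names are placeholders standing in for the in-situ TBM data:

```python
# Lasso (least absolute shrinkage and selection operator) screening:
# cross-validated L1 regularization zeroes out inessential features.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

X = np.load("tbm_features.npy")   # placeholder: in-situ operational features
y = np.load("tbm_load.npy")       # placeholder: load parameter to forecast

X_std = StandardScaler().fit_transform(X)  # Lasso is scale-sensitive
lasso = LassoCV(cv=5).fit(X_std, y)
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-6)
print(f"kept {selected.size} of {X.shape[1]} features: {selected}")
```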
arXiv Detail & Related papers (2021-04-12T07:31:39Z)
- No MCMC for me: Amortized sampling for fast and stable training of energy-based models [62.1234885852552]
Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty.
We present a simple method for training EBMs at scale using an entropy-regularized generator to amortize the MCMC sampling.
Next, we apply our estimator to the recently proposed Joint Energy Model (JEM), where we match the original performance with faster and more stable training.
arXiv Detail & Related papers (2020-10-08T19:17:20Z)
- Non-convex Learning via Replica Exchange Stochastic Gradient MCMC [25.47669573608621]
We propose an adaptive replica exchange SGMCMC (reSGMCMC) to automatically correct the bias and study the corresponding properties.
Empirically, we validate the algorithm through extensive experiments on a variety of setups.
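The exchange move at the heart of replica-exchange methods is compact enough to sketch; below is the vanilla Metropolis swap rule between two chains at temperatures T1 < T2 (the adaptive bias correction that defines reSGMCMC is omitted, and `energy` is a stand-in for the negative log-density):

```python
import numpy as np

def maybe_swap(x1, x2, energy, T1, T2, rng):
    """Propose swapping states between chains at T1 < T2.

    Acceptance probability: min(1, exp((1/T1 - 1/T2) * (E(x1) - E(x2)))),
    so swaps that carry low-energy states to the cold chain are favored.
    """
    log_alpha = (1.0 / T1 - 1.0 / T2) * (energy(x1) - energy(x2))
    if np.log(rng.random()) < min(0.0, log_alpha):
        return x2, x1   # swap accepted: states exchange temperatures
    return x1, x2       # swap rejected
```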
arXiv Detail & Related papers (2020-08-12T15:02:59Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Using Machine Learning to Emulate Agent-Based Simulations [0.0]
We evaluate the performance of multiple machine-learning methods as statistical emulators for use in the analysis of agent-based models (ABMs).
We propose that agent-based modelling would benefit from using machine-learning methods for emulation, as this can facilitate more robust sensitivity analyses for the models.
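A toy version of the emulation workflow this summary describes, with a stand-in `run_abm` function (an assumption, replacing a costly agent-based simulation):

```python
# Fit a cheap statistical emulator of an expensive agent-based model (ABM),
# then query the surrogate densely for sensitivity analysis.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def run_abm(params):
    """Stand-in for a costly ABM run: maps parameters to a summary output."""
    a, b = params
    return np.sin(3.0 * a) + 0.5 * b + rng.normal(0.0, 0.05)

thetas = rng.uniform(0.0, 1.0, size=(200, 2))     # sampled parameter sets
outputs = np.array([run_abm(t) for t in thetas])  # 200 "expensive" runs
emulator = RandomForestRegressor(n_estimators=200).fit(thetas, outputs)

# The surrogate answers thousands of what-if queries at negligible cost,
# enabling the denser sensitivity sweeps the summary alludes to.
grid = rng.uniform(0.0, 1.0, size=(10_000, 2))
preds = emulator.predict(grid)
print(preds.mean(), preds.std())
```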
arXiv Detail & Related papers (2020-05-05T11:48:36Z)
- Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)