Mode-Assisted Unsupervised Learning of Restricted Boltzmann Machines
- URL: http://arxiv.org/abs/2001.05559v2
- Date: Sun, 19 Jan 2020 21:50:27 GMT
- Title: Mode-Assisted Unsupervised Learning of Restricted Boltzmann Machines
- Authors: Haik Manukian, Yan Ru Pei, Sean R.B. Bearden, Massimiliano Di Ventra
- Abstract summary: We show that properly combining standard gradient updates with an off-gradient direction improves their training dramatically over traditional gradient methods.
This approach, which we call mode training, promotes faster training and stability, in addition to lower converged relative entropy (KL divergence).
The mode training we suggest is quite versatile, as it can be applied in conjunction with any given gradient method, and is easily extended to more general energy-based neural network structures.
- Score: 7.960229223744695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Restricted Boltzmann machines (RBMs) are a powerful class of generative
models, but their training requires computing a gradient that, unlike
supervised backpropagation on typical loss functions, is notoriously difficult
even to approximate. Here, we show that properly combining standard gradient
updates with an off-gradient direction, constructed from samples of the RBM
ground state (mode), improves their training dramatically over traditional
gradient methods. This approach, which we call mode training, promotes faster
training and stability, in addition to lower converged relative entropy (KL
divergence). Along with the proofs of stability and convergence of this method,
we also demonstrate its efficacy on synthetic datasets where we can compute KL
divergences exactly, as well as on a larger machine learning standard, MNIST.
The mode training we suggest is quite versatile, as it can be applied in
conjunction with any given gradient method, and is easily extended to more
general energy-based neural network structures such as deep, convolutional and
unrestricted Boltzmann machines.
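The core idea above — mixing standard gradient updates with an occasional off-gradient update driven by the RBM's mode — can be illustrated with a minimal sketch. This is a toy illustration only: the tiny RBM, CD-1 gradient estimate, brute-force mode search, and the mixing probability `p_mode` are all assumptions for readability; the paper obtains the mode with an optimization solver rather than enumeration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyRBM:
    def __init__(self, n_v, n_h):
        self.W = 0.01 * rng.standard_normal((n_v, n_h))
        self.b = np.zeros(n_v)  # visible biases
        self.c = np.zeros(n_h)  # hidden biases

    def cd1_grad(self, v0):
        # One step of contrastive divergence: the standard (approximate) gradient.
        h0 = sigmoid(v0 @ self.W + self.c)
        h_samp = (h0 > rng.random(h0.shape)).astype(float)
        v1 = (sigmoid(h_samp @ self.W.T + self.b) > rng.random(v0.shape)).astype(float)
        h1 = sigmoid(v1 @ self.W + self.c)
        return v0[:, None] * h0[None, :] - v1[:, None] * h1[None, :]

    def mode(self):
        # Brute-force search for the lowest-energy joint state (toy sizes only).
        best, best_e = None, np.inf
        n_v, n_h = self.W.shape
        for i in range(2 ** n_v):
            v = np.array([(i >> k) & 1 for k in range(n_v)], float)
            for j in range(2 ** n_h):
                h = np.array([(j >> k) & 1 for k in range(n_h)], float)
                e = -(v @ self.W @ h + self.b @ v + self.c @ h)
                if e < best_e:
                    best_e, best = e, (v, h)
        return best

def mode_assisted_step(rbm, v_data, lr=0.05, p_mode=0.1):
    # With probability p_mode, replace the model term of the gradient with
    # statistics from the RBM's mode (the off-gradient direction);
    # otherwise take an ordinary CD-1 step.
    if rng.random() < p_mode:
        v_m, h_m = rbm.mode()
        h_d = sigmoid(v_data @ rbm.W + rbm.c)
        g = v_data[:, None] * h_d[None, :] - v_m[:, None] * h_m[None, :]
    else:
        g = rbm.cd1_grad(v_data)
    rbm.W += lr * g

rbm = TinyRBM(n_v=4, n_h=3)
data = np.array([1.0, 0.0, 1.0, 0.0])
for _ in range(20):
    mode_assisted_step(rbm, data)
print(rbm.W.shape)  # (4, 3)
```

Because the mode update only fires occasionally, the method can wrap any base gradient estimator, which is what makes the approach versatile.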
Related papers
- Fast, accurate training and sampling of Restricted Boltzmann Machines [4.785158987724452]
We present an innovative method in which the principal directions of the dataset are integrated into a low-rank RBM.
This approach enables efficient sampling of the equilibrium measure via a static Monte Carlo process.
Our results show that this strategy successfully trains RBMs to capture the full diversity of data in datasets where previous methods fail.
arXiv Detail & Related papers (2024-05-24T09:23:43Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z) - Monotone deep Boltzmann machines [86.50247625239406]
Deep Boltzmann machines (DBMs) are multi-layered probabilistic models governed by a pairwise energy function.
We develop a new class of restricted model, the monotone DBM, which allows for arbitrary self-connection in each layer.
We show that a particular choice of activation results in a fixed-point iteration that gives a variational mean-field solution.
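The fixed-point view mentioned here can be illustrated with the classic naive mean-field iteration for a pairwise Boltzmann machine, in which the mean activations satisfy $m = \sigma(Wm + b)$. This is a generic sketch of mean-field fixed-point iteration, not the monotone parameterization the paper develops; the damping factor is an assumption added to aid convergence.

```python
import numpy as np

def mean_field(W, b, n_iter=200, damping=0.5):
    """Naive mean-field fixed point m = sigmoid(W m + b) for a pairwise
    Boltzmann machine with symmetric couplings W and biases b."""
    m = np.full(b.shape, 0.5)  # start from the uninformative point
    for _ in range(n_iter):
        m_new = 1.0 / (1.0 + np.exp(-(W @ m + b)))
        m = damping * m + (1.0 - damping) * m_new  # damped update
    return m

W = np.array([[0.0, 0.5],
              [0.5, 0.0]])  # symmetric, zero diagonal
b = np.array([0.2, -0.1])
m = mean_field(W, b)
```

At convergence, `m` is a variational (factorized) approximation to the marginal unit probabilities; the paper's contribution is a parameterization under which such an iteration is guaranteed to have a unique fixed point.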
arXiv Detail & Related papers (2023-07-11T03:02:44Z)
- Phantom Embeddings: Using Embedding Space for Model Regularization in Deep Neural Networks [12.293294756969477]
The strength of machine learning models stems from their ability to learn complex function approximations from data.
The complex models tend to memorize the training data, which results in poor generalization performance on test data.
We present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation.
arXiv Detail & Related papers (2023-04-14T17:15:54Z)
- Compression-aware Training of Neural Networks using Frank-Wolfe [27.69586583737247]
We propose a framework that encourages convergence to well-performing solutions while inducing robustness towards filter pruning and low-rank matrix decomposition.
Our method is able to outperform existing compression-aware approaches and, in the case of low-rank matrix decomposition, it also requires significantly less computational resources than approaches based on nuclear-norm regularization.
arXiv Detail & Related papers (2022-05-24T09:29:02Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Mode-Assisted Joint Training of Deep Boltzmann Machines [10.292439652458157]
We show that the performance gains of the mode-assisted training are even more dramatic for DBMs.
DBMs jointly trained with the mode-assisted algorithm can represent the same data set with orders of magnitude fewer parameters.
arXiv Detail & Related papers (2021-02-17T04:03:30Z)
- Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
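The integration-error hypothesis has a well-known toy illustration: for the bilinear game $\min_x \max_y xy$, the simultaneous gradient flow is a pure rotation, so explicit Euler (plain simultaneous gradient steps) spirals outward while a higher-order solver such as classical Runge-Kutta stays on the limit cycle. This is a generic sketch of that phenomenon, not the authors' experimental setup:

```python
import numpy as np

# Gradient flow of the bilinear game min_x max_y x*y:
#   dx/dt = -y,  dy/dt = x   (pure rotation around the origin)
def f(z):
    x, y = z
    return np.array([-y, x])

def euler_step(z, h):
    return z + h * f(z)

def rk4_step(z, h):
    # classical 4th-order Runge-Kutta step
    k1 = f(z)
    k2 = f(z + 0.5 * h * k1)
    k3 = f(z + 0.5 * h * k2)
    k4 = f(z + h * k3)
    return z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

h, steps = 0.1, 100
z_e = np.array([1.0, 0.0])  # Euler trajectory
z_r = np.array([1.0, 0.0])  # Runge-Kutta trajectory
for _ in range(steps):
    z_e = euler_step(z_e, h)
    z_r = rk4_step(z_r, h)

# Euler drifts away from the unit circle; RK4 stays essentially on it.
print(np.linalg.norm(z_e), np.linalg.norm(z_r))
```

Each Euler step amplifies the norm by $\sqrt{1+h^2}$, which is exactly the integration error the paper attributes GAN instabilities to; the higher-order solver keeps that error negligible.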
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
- No MCMC for me: Amortized sampling for fast and stable training of energy-based models [62.1234885852552]
Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty.
We present a simple method for training EBMs at scale using an entropy-regularized generator to amortize the MCMC sampling.
Next, we apply our estimator to the recently proposed Joint Energy Model (JEM), where we match the original performance with faster and more stable training.
arXiv Detail & Related papers (2020-10-08T19:17:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.