Mode-Assisted Unsupervised Learning of Restricted Boltzmann Machines
- URL: http://arxiv.org/abs/2001.05559v2
- Date: Sun, 19 Jan 2020 21:50:27 GMT
- Title: Mode-Assisted Unsupervised Learning of Restricted Boltzmann Machines
- Authors: Haik Manukian, Yan Ru Pei, Sean R.B. Bearden, Massimiliano Di Ventra
- Abstract summary: We show that properly combining standard gradient updates with an off-gradient direction improves the training of RBMs dramatically over traditional gradient methods.
This approach, which we call mode training, promotes faster training and stability, in addition to lower converged relative entropy (KL divergence).
The mode training we suggest is quite versatile, as it can be applied in conjunction with any given gradient method, and is easily extended to more general energy-based neural network structures.
- Score: 7.960229223744695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Restricted Boltzmann machines (RBMs) are a powerful class of generative
models, but their training requires computing a gradient that, unlike
supervised backpropagation on typical loss functions, is notoriously difficult
even to approximate. Here, we show that properly combining standard gradient
updates with an off-gradient direction, constructed from samples of the RBM
ground state (mode), improves their training dramatically over traditional
gradient methods. This approach, which we call mode training, promotes faster
training and stability, in addition to lower converged relative entropy (KL
divergence). Along with the proofs of stability and convergence of this method,
we also demonstrate its efficacy on synthetic datasets where we can compute KL
divergences exactly, as well as on a larger machine learning standard, MNIST.
The mode training we suggest is quite versatile, as it can be applied in
conjunction with any given gradient method, and is easily extended to more
general energy-based neural network structures such as deep, convolutional and
unrestricted Boltzmann machines.
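The idea in the abstract can be illustrated with a minimal sketch on a toy problem small enough to compute the mode and the KL divergence exactly. The abstract does not give the mixing rule, so several pieces here are assumptions made for illustration only: the sigmoid schedule `p_max * sigmoid(alpha * step + beta)` for how often a mode step replaces a gradient step, the parameter names, the use of CD-1 as the "given gradient method", the shared learning rate, and the brute-force mode search (feasible only for a handful of visible units).

```python
import itertools
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def all_states(n):
    """All 2**n binary vectors of length n, one per row."""
    return np.array(list(itertools.product([0.0, 1.0], repeat=n)))


def joint_mode(W, a, b):
    """Exact mode (lowest-energy state) of a tiny RBM by enumerating v.

    With E(v, h) = -a.v - b.h - v.W.h, the best h for a fixed v switches on
    exactly the hidden units whose input b + v.W is positive.
    """
    V = all_states(W.shape[0])
    act = V @ W + b
    E = -(V @ a) - np.maximum(act, 0.0).sum(axis=1)
    k = int(np.argmin(E))
    return V[k], (act[k] > 0).astype(float)


def exact_kl(data, W, a, b):
    """KL(empirical data distribution || RBM marginal over v), computed exactly."""
    n = W.shape[0]
    V = all_states(n)
    log_p = V @ a + np.logaddexp(0.0, V @ W + b).sum(axis=1)
    log_p -= np.logaddexp.reduce(log_p)  # subtract log Z
    idx = (data @ (2 ** np.arange(n)[::-1])).astype(int)
    q = np.bincount(idx, minlength=2 ** n) / len(data)
    nz = q > 0
    return float(np.sum(q[nz] * (np.log(q[nz]) - log_p[nz])))


def train(data, n_hidden, steps=300, lr=0.1, p_max=0.1,
          alpha=0.05, beta=-6.0, seed=0):
    """CD-1 updates, occasionally replaced by a mode-driven update.

    The schedule (p_max, alpha, beta) is an assumed illustration, not a
    value taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    for step in range(steps):
        # Data-dependent (positive) statistics.
        ph_data = sigmoid(data @ W + b)
        pos_W = data.T @ ph_data / len(data)
        pos_a, pos_b = data.mean(axis=0), ph_data.mean(axis=0)
        if rng.random() < p_max * sigmoid(alpha * step + beta):
            # Mode step: the model term is built from the mode state.
            v, h = joint_mode(W, a, b)
            neg_W, neg_a, neg_b = np.outer(v, h), v, h
        else:
            # Gradient step: one round of Gibbs sampling (CD-1).
            h = (rng.random(ph_data.shape) < ph_data).astype(float)
            pv = sigmoid(h @ W.T + a)
            v = (rng.random(pv.shape) < pv).astype(float)
            ph = sigmoid(v @ W + b)
            neg_W = v.T @ ph / len(data)
            neg_a, neg_b = v.mean(axis=0), ph.mean(axis=0)
        W += lr * (pos_W - neg_W)
        a += lr * (pos_a - neg_a)
        b += lr * (pos_b - neg_b)
    return W, a, b


# Toy synthetic dataset: two complementary 4-bit patterns.
data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 10, dtype=float)
W, a, b = train(data, n_hidden=2)
print("exact KL after training:", exact_kl(data, W, a, b))
```

Enumerating all visible states keeps both the mode and the KL divergence exact, which mirrors the abstract's point about synthetic datasets; for anything the size of MNIST the mode would have to come from a dedicated optimizer or sampler instead.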
Related papers
- Classifier-guided Gradient Modulation for Enhanced Multimodal Learning [50.7008456698935]
Classifier-Guided Gradient Modulation (CGGM) is a novel method to balance multimodal learning with gradients.
We conduct extensive experiments on four multimodal datasets: UPMC-Food 101, CMU-MOSI, IEMOCAP and BraTS.
CGGM outperforms all the baselines and other state-of-the-art methods consistently.
arXiv Detail & Related papers (2024-11-03T02:38:43Z)
- Avoiding mode collapse in diffusion models fine-tuned with reinforcement learning [0.0]
Fine-tuning foundation models via reinforcement learning (RL) has proven promising for aligning to downstream objectives.
We exploit the hierarchical nature of diffusion models (DMs) and train them dynamically at each epoch with a tailored RL method.
We show that models trained with HRF better preserve diversity in downstream tasks, thus enhancing fine-tuning robustness without compromising mean rewards.
arXiv Detail & Related papers (2024-10-10T19:06:23Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Stability-Aware Training of Machine Learning Force Fields with Differentiable Boltzmann Estimators [11.699834591020057]
Stability-Aware Boltzmann Estimator (StABlE) Training is a multi-modal training procedure which leverages joint supervision from reference quantum-mechanical calculations and system observables.
StABlE Training can be viewed as a general semi-empirical framework applicable across MLFF architectures and systems.
arXiv Detail & Related papers (2024-02-21T18:12:07Z)
- Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z)
- Monotone deep Boltzmann machines [86.50247625239406]
Deep Boltzmann machines (DBMs) are multi-layered probabilistic models governed by a pairwise energy function.
We develop a new class of restricted model, the monotone DBM, which allows for arbitrary self-connection in each layer.
We show that a particular choice of activation results in a fixed-point iteration that gives a variational mean-field solution.
arXiv Detail & Related papers (2023-07-11T03:02:44Z)
- Phantom Embeddings: Using Embedding Space for Model Regularization in Deep Neural Networks [12.293294756969477]
The strength of machine learning models stems from their ability to learn complex function approximations from data.
Complex models tend to memorize the training data, which results in poor generalization performance on test data.
We present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation.
arXiv Detail & Related papers (2023-04-14T17:15:54Z)
- Mode-Assisted Joint Training of Deep Boltzmann Machines [10.292439652458157]
We show that the performance gains of mode-assisted training are even more dramatic for DBMs.
DBMs jointly trained with the mode-assisted algorithm can represent the same data set with orders of magnitude fewer parameters.
arXiv Detail & Related papers (2021-02-17T04:03:30Z)
- Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
- No MCMC for me: Amortized sampling for fast and stable training of energy-based models [62.1234885852552]
Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty.
We present a simple method for training EBMs at scale using an entropy-regularized generator to amortize the MCMC sampling.
We also apply our estimator to the recently proposed Joint Energy Model (JEM), where we match the original performance with faster and more stable training.
arXiv Detail & Related papers (2020-10-08T19:17:20Z)