Mode-Assisted Unsupervised Learning of Restricted Boltzmann Machines
- URL: http://arxiv.org/abs/2001.05559v2
- Date: Sun, 19 Jan 2020 21:50:27 GMT
- Title: Mode-Assisted Unsupervised Learning of Restricted Boltzmann Machines
- Authors: Haik Manukian, Yan Ru Pei, Sean R.B. Bearden, Massimiliano Di Ventra
- Abstract summary: We show that properly combining standard gradient updates with an off-gradient direction improves their training dramatically over traditional gradient methods.
This approach, which we call mode training, promotes faster training and stability, in addition to lower converged relative entropy (KL divergence).
The mode training we suggest is quite versatile, as it can be applied in conjunction with any given gradient method, and is easily extended to more general energy-based neural network structures.
- Score: 7.960229223744695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Restricted Boltzmann machines (RBMs) are a powerful class of generative
models, but their training requires computing a gradient that, unlike
supervised backpropagation on typical loss functions, is notoriously difficult
even to approximate. Here, we show that properly combining standard gradient
updates with an off-gradient direction, constructed from samples of the RBM
ground state (mode), improves their training dramatically over traditional
gradient methods. This approach, which we call mode training, promotes faster
training and stability, in addition to lower converged relative entropy (KL
divergence). Along with the proofs of stability and convergence of this method,
we also demonstrate its efficacy on synthetic datasets where we can compute KL
divergences exactly, as well as on a larger machine learning standard, MNIST.
The mode training we suggest is quite versatile, as it can be applied in
conjunction with any given gradient method, and is easily extended to more
general energy-based neural network structures such as deep, convolutional and
unrestricted Boltzmann machines.
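The core idea above — mixing standard gradient updates with an occasional off-gradient update driven by the RBM's mode — can be illustrated with a minimal sketch. This is a toy illustration only: the tiny RBM, CD-1 gradient estimate, brute-force mode search, and the mixing probability `p_mode` are all assumptions for readability; the paper obtains the mode with an optimization solver rather than enumeration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyRBM:
    def __init__(self, n_v, n_h):
        self.W = 0.01 * rng.standard_normal((n_v, n_h))
        self.b = np.zeros(n_v)  # visible biases
        self.c = np.zeros(n_h)  # hidden biases

    def cd1_grad(self, v0):
        # One step of contrastive divergence: the standard (approximate) gradient.
        h0 = sigmoid(v0 @ self.W + self.c)
        h_samp = (h0 > rng.random(h0.shape)).astype(float)
        v1 = (sigmoid(h_samp @ self.W.T + self.b) > rng.random(v0.shape)).astype(float)
        h1 = sigmoid(v1 @ self.W + self.c)
        return v0[:, None] * h0[None, :] - v1[:, None] * h1[None, :]

    def mode(self):
        # Brute-force search for the lowest-energy joint state (toy sizes only).
        best, best_e = None, np.inf
        n_v, n_h = self.W.shape
        for i in range(2 ** n_v):
            v = np.array([(i >> k) & 1 for k in range(n_v)], float)
            for j in range(2 ** n_h):
                h = np.array([(j >> k) & 1 for k in range(n_h)], float)
                e = -(v @ self.W @ h + self.b @ v + self.c @ h)
                if e < best_e:
                    best_e, best = e, (v, h)
        return best

def mode_assisted_step(rbm, v_data, lr=0.05, p_mode=0.1):
    # With probability p_mode, replace the model term of the gradient with
    # statistics from the RBM's mode (the off-gradient direction);
    # otherwise take an ordinary CD-1 step.
    if rng.random() < p_mode:
        v_m, h_m = rbm.mode()
        h_d = sigmoid(v_data @ rbm.W + rbm.c)
        g = v_data[:, None] * h_d[None, :] - v_m[:, None] * h_m[None, :]
    else:
        g = rbm.cd1_grad(v_data)
    rbm.W += lr * g

rbm = TinyRBM(n_v=4, n_h=3)
data = np.array([1.0, 0.0, 1.0, 0.0])
for _ in range(20):
    mode_assisted_step(rbm, data)
print(rbm.W.shape)  # (4, 3)
```

Because the mode update only fires occasionally, the method can wrap any base gradient estimator, which is what makes the approach versatile.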
Related papers
- Fast, accurate training and sampling of Restricted Boltzmann Machines [4.785158987724452]
We present an innovative method in which the principal directions of the dataset are integrated into a low-rank RBM.
This approach enables efficient sampling of the equilibrium measure via a static Monte Carlo process.
Our results show that this strategy successfully trains RBMs to capture the full diversity of data in datasets where previous methods fail.
arXiv Detail & Related papers (2024-05-24T09:23:43Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z) - Monotone deep Boltzmann machines [86.50247625239406]
Deep Boltzmann machines (DBMs) are multi-layered probabilistic models governed by a pairwise energy function.
We develop a new class of restricted model, the monotone DBM, which allows for arbitrary self-connection in each layer.
We show that a particular choice of activation results in a fixed-point iteration that gives a variational mean-field solution.
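The fixed-point view mentioned here can be illustrated with the classic naive mean-field iteration for a pairwise Boltzmann machine, in which the mean activations satisfy $m = \sigma(Wm + b)$. This is a generic sketch of mean-field fixed-point iteration, not the monotone parameterization the paper develops; the damping factor is an assumption added to aid convergence.

```python
import numpy as np

def mean_field(W, b, n_iter=200, damping=0.5):
    """Naive mean-field fixed point m = sigmoid(W m + b) for a pairwise
    Boltzmann machine with symmetric couplings W and biases b."""
    m = np.full(b.shape, 0.5)  # start from the uninformative point
    for _ in range(n_iter):
        m_new = 1.0 / (1.0 + np.exp(-(W @ m + b)))
        m = damping * m + (1.0 - damping) * m_new  # damped update
    return m

W = np.array([[0.0, 0.5],
              [0.5, 0.0]])  # symmetric, zero diagonal
b = np.array([0.2, -0.1])
m = mean_field(W, b)
```

At convergence, `m` is a variational (factorized) approximation to the marginal unit probabilities; the paper's contribution is a parameterization under which such an iteration is guaranteed to have a unique fixed point.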
arXiv Detail & Related papers (2023-07-11T03:02:44Z)
- Phantom Embeddings: Using Embedding Space for Model Regularization in Deep Neural Networks [12.293294756969477]
The strength of machine learning models stems from their ability to learn complex function approximations from data.
The complex models tend to memorize the training data, which results in poor generalization performance on test data.
We present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation.
arXiv Detail & Related papers (2023-04-14T17:15:54Z)
- Compression-aware Training of Neural Networks using Frank-Wolfe [27.69586583737247]
We propose a framework that encourages convergence to well-performing solutions while inducing robustness towards filter pruning and low-rank matrix decomposition.
Our method is able to outperform existing compression-aware approaches and, in the case of low-rank matrix decomposition, it also requires significantly less computational resources than approaches based on nuclear-norm regularization.
arXiv Detail & Related papers (2022-05-24T09:29:02Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Mode-Assisted Joint Training of Deep Boltzmann Machines [10.292439652458157]
We show that the performance gains of the mode-assisted training are even more dramatic for DBMs.
DBMs jointly trained with the mode-assisted algorithm can represent the same data set with orders of magnitude fewer parameters.
arXiv Detail & Related papers (2021-02-17T04:03:30Z)
- Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
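The integration-error hypothesis has a well-known toy illustration: for the bilinear game $\min_x \max_y xy$, the simultaneous gradient flow is a pure rotation, so explicit Euler (plain simultaneous gradient steps) spirals outward while a higher-order solver such as classical Runge-Kutta stays on the limit cycle. This is a generic sketch of that phenomenon, not the authors' experimental setup:

```python
import numpy as np

# Gradient flow of the bilinear game min_x max_y x*y:
#   dx/dt = -y,  dy/dt = x   (pure rotation around the origin)
def f(z):
    x, y = z
    return np.array([-y, x])

def euler_step(z, h):
    return z + h * f(z)

def rk4_step(z, h):
    # classical 4th-order Runge-Kutta step
    k1 = f(z)
    k2 = f(z + 0.5 * h * k1)
    k3 = f(z + 0.5 * h * k2)
    k4 = f(z + h * k3)
    return z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

h, steps = 0.1, 100
z_e = np.array([1.0, 0.0])  # Euler trajectory
z_r = np.array([1.0, 0.0])  # Runge-Kutta trajectory
for _ in range(steps):
    z_e = euler_step(z_e, h)
    z_r = rk4_step(z_r, h)

# Euler drifts away from the unit circle; RK4 stays essentially on it.
print(np.linalg.norm(z_e), np.linalg.norm(z_r))
```

Each Euler step amplifies the norm by $\sqrt{1+h^2}$, which is exactly the integration error the paper attributes GAN instabilities to; the higher-order solver keeps that error negligible.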
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
- No MCMC for me: Amortized sampling for fast and stable training of energy-based models [62.1234885852552]
Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty.
We present a simple method for training EBMs at scale using an entropy-regularized generator to amortize the MCMC sampling.
Next, we apply our estimator to the recently proposed Joint Energy Model (JEM), where we match the original performance with faster and more stable training.
arXiv Detail & Related papers (2020-10-08T19:17:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.