End-to-end Training of Deep Boltzmann Machines by Unbiased Contrastive
Divergence with Local Mode Initialization
- URL: http://arxiv.org/abs/2305.19684v1
- Date: Wed, 31 May 2023 09:28:02 GMT
- Title: End-to-end Training of Deep Boltzmann Machines by Unbiased Contrastive
Divergence with Local Mode Initialization
- Authors: Shohei Taniguchi, Masahiro Suzuki, Yusuke Iwasawa, Yutaka Matsuo
- Abstract summary: We address the problem of biased gradient estimation in deep Boltzmann machines (DBMs).
We propose to use a coupling based on the Metropolis-Hastings (MH) algorithm and to initialize the state around a local mode of the target distribution.
Because of the propensity of MH to reject proposals, the coupling tends to converge in only one step with high probability, leading to high efficiency.
- Score: 23.008689183810695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of biased gradient estimation in deep Boltzmann
machines (DBMs). The existing method to obtain an unbiased estimator uses a
maximal coupling based on a Gibbs sampler, but when the state is
high-dimensional, it takes a long time to converge. In this study, we propose
to use a coupling based on the Metropolis-Hastings (MH) algorithm and to initialize the
state around a local mode of the target distribution. Because of the propensity
of MH to reject proposals, the coupling tends to converge in only one step with
high probability, leading to high efficiency. We find that our method allows
DBMs to be trained in an end-to-end fashion without greedy pretraining. We also
propose some practical techniques to further improve the performance of DBMs.
We empirically demonstrate that our training algorithm enables DBMs to show
comparable generative performance to other deep generative models, achieving
an FID score of 10.33 on MNIST.
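The mechanism behind the one-step convergence can be illustrated with a toy sketch. The code below is not the authors' implementation, and every name in it is illustrative: a small fully visible binary Boltzmann machine stands in for a DBM, greedy single-flip descent serves as the local-mode initializer, and two single-site MH chains are coupled simply by sharing proposals and acceptance uniforms (a common-random-numbers simplification of the coupling in the paper). Because MH rejects most proposals near a mode, the lag-one chains typically meet after a single shared step.

```python
import numpy as np

def energy(s, W, b):
    # E(s) = -0.5 * s^T W s - b^T s for binary states s in {0, 1}^n
    return -0.5 * s @ W @ s - b @ s

def mh_flip(s, W, b, i, u):
    # One MH update: propose flipping bit i, accept with probability
    # min(1, exp(E(s) - E(s'))) using the uniform draw u.
    t = s.copy()
    t[i] = 1 - t[i]
    return t if u < np.exp(energy(s, W, b) - energy(t, W, b)) else s

def local_mode(s, W, b):
    # Greedy single-bit descent: flip bits while any flip lowers the energy.
    s = s.copy()
    improved = True
    while improved:
        improved = False
        for i in range(len(s)):
            t = s.copy()
            t[i] = 1 - t[i]
            if energy(t, W, b) < energy(s, W, b):
                s, improved = t, True
    return s

rng = np.random.default_rng(0)
n = 16
W = rng.normal(0.0, 0.5, (n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
b = rng.normal(0.0, 0.5, n)

# Lag-one coupled chains: both start at a local mode, x is advanced one
# step, then the chains share proposals and acceptance uniforms until they
# meet. Near a mode MH rejects almost everything, so they usually meet
# immediately.
y = local_mode(rng.integers(0, 2, n), W, b)
x = mh_flip(y, W, b, rng.integers(n), rng.random())

steps = 0
while not np.array_equal(x, y) and steps < 1000:
    i, u = rng.integers(n), rng.random()
    x = mh_flip(x, W, b, i, u)
    y = mh_flip(y, W, b, i, u)
    steps += 1
print("coupled chains met after", steps, "shared step(s)")
```

Once the chains meet, the shared randomness keeps them equal forever, which is what makes the telescoping bias correction of unbiased MCMC estimators a finite sum.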
Related papers
- Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood [64.95663299945171]
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming.
There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models.
We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs.
arXiv Detail & Related papers (2023-09-10T22:05:24Z)
- Monotone deep Boltzmann machines [86.50247625239406]
Deep Boltzmann machines (DBMs) are multi-layered probabilistic models governed by a pairwise energy function.
We develop a new class of restricted model, the monotone DBM, which allows for arbitrary self-connection in each layer.
We show that a particular choice of activation results in a fixed-point iteration that gives a variational mean-field solution.
arXiv Detail & Related papers (2023-07-11T03:02:44Z)
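As a hedged illustration of the fixed-point view in the monotone DBM entry above, here is a generic damped mean-field iteration for a pairwise binary model; this is not the paper's monotone parameterization or its particular activation, and all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_field_fixed_point(W, b, iters=200, damping=0.5, tol=1e-8):
    # Damped fixed-point iteration mu <- sigmoid(W mu + b); a converged mu
    # is a stationary point of the variational mean-field objective.
    # Monotone parameterizations are about guaranteeing such an iteration
    # has a unique, globally attractive fixed point.
    mu = np.full(len(b), 0.5)
    for _ in range(iters):
        mu_new = (1 - damping) * mu + damping * sigmoid(W @ mu + b)
        if np.max(np.abs(mu_new - mu)) < tol:
            return mu_new
        mu = mu_new
    return mu

rng = np.random.default_rng(1)
n = 8
W = rng.normal(0.0, 0.3, (n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
b = rng.normal(0.0, 0.3, n)
print(mean_field_fixed_point(W, b))
```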
- CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z)
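As a loose illustration of the iterative mask denoising in the CamoDiffusion entry above (a toy, not the paper's sampler or architecture): the loop starts a mask from pure noise and repeatedly blends it toward a conditional prediction derived from the image. `toy_prediction` is a hypothetical stand-in for the learned conditional denoising network.

```python
import numpy as np

def toy_prediction(image):
    # Hypothetical stand-in for the learned denoiser: a real model would
    # predict the clean mask from (noisy mask, image, timestep).
    return (image > image.mean()).astype(float)

def iterative_mask_denoise(image, steps=10, rng=None):
    # Start from pure noise and progressively reduce the noise level,
    # mimicking the coarse shape of a reverse diffusion loop (this is an
    # illustration, not a DDPM/DDIM sampler).
    rng = rng or np.random.default_rng(0)
    mask = rng.normal(size=image.shape)
    for t in range(steps, 0, -1):
        keep = t / (steps + 1)  # fraction of noise kept at this step
        mask = keep * mask + (1 - keep) * toy_prediction(image)
    return mask

image = np.random.default_rng(1).random((8, 8))
print(iterative_mask_denoise(image).round(2))
```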
- Scalable Optimal Margin Distribution Machine [50.281535710689795]
Optimal margin Distribution Machine (ODM) is a recently proposed statistical learning framework rooted in the novel margin theory.
This paper proposes a scalable ODM, which achieves a nearly tenfold speedup over the original ODM training method.
arXiv Detail & Related papers (2023-05-08T16:34:04Z)
- Non-Generative Energy Based Models [3.1447898427012473]
Energy-based models (EBMs) have become increasingly popular within computer vision.
We propose a non-generative training approach, Non-Generative EBM (NG-EBM).
We show that our NG-EBM training strategy retains many of the benefits of EBMs in calibration, out-of-distribution detection, and adversarial resistance.
arXiv Detail & Related papers (2023-04-03T18:47:37Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
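A hedged sketch of the parametric-likelihood-ratio idea in the entry above, with every specific (the softmax-normalized ratio, the linear adversary, the learning rates) invented for illustration rather than taken from the paper: the adversary reweights the batch through a parametric likelihood ratio and ascends the weighted loss, while the model descends it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(float)

w = np.zeros(d)    # model (logistic regression) parameters
phi = np.zeros(d)  # adversary parameters defining the likelihood ratio

def example_losses(w):
    # Per-example logistic losses and predicted probabilities.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)), p

for step in range(200):
    # Softmax-normalized parametric ratio: weights sum to one over the batch.
    logits = X @ phi
    r = np.exp(logits - logits.max())
    r /= r.sum()

    L, p = example_losses(w)
    # Model: descend the r-weighted logistic loss.
    w -= 0.1 * (X.T @ (r * (p - y)))
    # Adversary: ascend the same objective; for J = sum_i r_i L_i with
    # r = softmax(X phi), dJ/dphi = X^T (r * (L - J)).
    phi += 0.1 * (X.T @ (r * (L - r @ L)))

print("final weighted loss:", float(r @ example_losses(w)[0]))
```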
- Mode-Assisted Joint Training of Deep Boltzmann Machines [10.292439652458157]
We show that the performance gains of mode-assisted training are even more dramatic for DBMs.
DBMs jointly trained with the mode-assisted algorithm can represent the same data set with orders of magnitude fewer parameters.
arXiv Detail & Related papers (2021-02-17T04:03:30Z)
- No MCMC for me: Amortized sampling for fast and stable training of energy-based models [62.1234885852552]
Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty.
We present a simple method for training EBMs at scale using an entropy-regularized generator to amortize the MCMC sampling.
Next, we apply our estimator to the recently proposed Joint Energy Model (JEM), matching the original performance with faster and more stable training.
arXiv Detail & Related papers (2020-10-08T19:17:20Z)
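A hedged sketch of amortized sampling for the entry above, not the paper's estimator: a small generator is trained so its samples have low energy under a fixed toy EBM, while a nearest-neighbor distance term acts as a crude entropy surrogate that prevents collapse onto the energy minimum. The network sizes and the surrogate are illustrative assumptions.

```python
import torch

def energy(x):
    # Toy fixed EBM energy with its mode at x = 1; a real application would
    # update a learned energy network jointly with the generator.
    return ((x - 1.0) ** 2).sum(dim=1)

gen = torch.nn.Sequential(
    torch.nn.Linear(4, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2)
)
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

for step in range(500):
    z = torch.randn(128, 4)
    x = gen(z)
    # Nearest-neighbor distances as an entropy proxy: larger spacing between
    # samples ~ higher entropy, counteracting mode collapse.
    nn_dist = (torch.cdist(x, x) + 1e9 * torch.eye(len(x))).min(dim=1).values
    entropy_proxy = torch.log(nn_dist + 1e-6).mean()
    loss = energy(x).mean() - 1.0 * entropy_proxy
    opt.zero_grad()
    loss.backward()
    opt.step()

print("mean sample after training:", gen(torch.randn(1000, 4)).mean(dim=0))
```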
- Mode-Assisted Unsupervised Learning of Restricted Boltzmann Machines [7.960229223744695]
We show that properly combining standard gradient updates with an off-gradient direction improves RBM training dramatically over traditional gradient methods.
This approach, which we call mode training, promotes faster and more stable training, in addition to a lower converged relative entropy (KL divergence).
The mode training we suggest is quite versatile, as it can be applied in conjunction with any given gradient method, and is easily extended to more general energy-based neural network structures.
arXiv Detail & Related papers (2020-01-15T21:12:44Z)
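A hedged sketch of the mode-assisted idea described above; the exact update rule and the mode solver differ in the papers, and the helper names, mixing probability, and learning rate here are illustrative. Most updates are ordinary CD-1, but occasionally the negative statistics are taken at an approximate mode of the model found by greedy alternating maximization (biases omitted for brevity).

```python
import numpy as np

rng = np.random.default_rng(0)
nv, nh = 6, 4
W = rng.normal(0.0, 0.1, (nv, nh))                 # RBM weights, biases omitted
data = rng.integers(0, 2, (32, nv)).astype(float)  # toy binary "dataset"

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def approx_mode(iters=50):
    # Greedy alternating maximization of p(v, h): a cheap stand-in for the
    # dedicated mode solvers used in the mode-training papers.
    v = (rng.random(nv) < 0.5).astype(float)
    h = np.zeros(nh)
    for _ in range(iters):
        h = (sigmoid(v @ W) > 0.5).astype(float)
        v = (sigmoid(W @ h) > 0.5).astype(float)
    return v[None, :], h[None, :]

for step in range(100):
    ph0 = sigmoid(data @ W)                        # positive phase statistics
    pos = data.T @ ph0 / len(data)
    if rng.random() < 0.1:
        # Occasional "off-gradient" step: negative statistics at a mode.
        vm, hm = approx_mode()
        neg = vm.T @ hm
    else:
        # Standard CD-1 negative phase.
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        v1 = (rng.random(data.shape) < sigmoid(h0 @ W.T)).astype(float)
        neg = v1.T @ sigmoid(v1 @ W) / len(v1)
    W += 0.05 * (pos - neg)
```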
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.