Monotone deep Boltzmann machines
- URL: http://arxiv.org/abs/2307.04990v1
- Date: Tue, 11 Jul 2023 03:02:44 GMT
- Title: Monotone deep Boltzmann machines
- Authors: Zhili Feng, Ezra Winston, J. Zico Kolter
- Abstract summary: Deep Boltzmann machines (DBMs) are multi-layered probabilistic models governed by a pairwise energy function.
We develop a new class of restricted model, the monotone DBM, which allows for arbitrary self-connection in each layer.
We show that a particular choice of activation results in a fixed-point iteration that gives a variational mean-field solution.
- Score: 86.50247625239406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Boltzmann machines (DBMs), one of the first ``deep'' learning methods
ever studied, are multi-layered probabilistic models governed by a pairwise
energy function that describes the likelihood of all variables/nodes in the
network. In practice, DBMs are often constrained, i.e., via the
\emph{restricted} Boltzmann machine (RBM) architecture (which does not permit
intra-layer connections), in order to allow for more efficient inference. In
this work, we revisit the generic DBM approach, and ask the question: are there
other possible restrictions to their design that would enable efficient
(approximate) inference? In particular, we develop a new class of restricted
model, the monotone DBM, which allows for arbitrary self-connection in each
layer, but restricts the \emph{weights} in a manner that guarantees the
existence and global uniqueness of a mean-field fixed point. To do this, we
leverage tools from the recently-proposed monotone Deep Equilibrium model and
show that a particular choice of activation results in a fixed-point iteration
that gives a variational mean-field solution. While this approach is still
largely conceptual, it is the first architecture that allows for efficient
approximate inference in fully-general weight structures for DBMs. We apply
this approach to simple deep convolutional Boltzmann architectures and
demonstrate that it allows for tasks such as the joint completion and
classification of images, within a single deep probabilistic setting, while
avoiding the pitfalls of mean-field inference in traditional RBMs.
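The central computational object in the abstract is the mean-field fixed point whose existence and global uniqueness the weight restriction guarantees. The sketch below is a minimal NumPy illustration, not the paper's implementation: the weights follow the monotone Deep Equilibrium parameterization W = (1 - m)I - AᵀA + (B - Bᵀ), so that I - W is strongly monotone, while a damped sigmoid update stands in for the paper's particular proximal activation; the factor scaling, hyperparameters, and names are all illustrative choices that keep this toy iteration convergent.

```python
import numpy as np

def monotone_weight(A, B, m=0.5):
    """Monotone-DEQ-style parameterization W = (1 - m) I - A^T A + (B - B^T),
    which guarantees I - W >= m I (strong monotonicity of I - W)."""
    n = A.shape[1]
    return (1.0 - m) * np.eye(n) - A.T @ A + (B - B.T)

def mean_field_fixed_point(W, b, alpha=0.5, iters=500, tol=1e-8):
    """Damped fixed-point iteration q <- (1 - alpha) q + alpha * sigmoid(W q + b)
    for the mean-field marginals q in (0, 1)^n of a pairwise binary model."""
    q = np.full_like(b, 0.5)
    for _ in range(iters):
        q_next = (1 - alpha) * q + alpha / (1.0 + np.exp(-(W @ q + b)))
        if np.max(np.abs(q_next - q)) < tol:
            break
        q = q_next
    return q

rng = np.random.default_rng(0)
n = 16
# Small random factors keep this toy iteration well inside the convergent regime.
A = 0.1 * rng.standard_normal((n, n))
B = 0.1 * rng.standard_normal((n, n))
W = monotone_weight(A, B, m=0.5)
b = 0.5 * rng.standard_normal(n)
q = mean_field_fixed_point(W, b)   # approximate mean-field marginals
print(np.round(q, 3))
```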
Related papers
- Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
We aim to optimize downstream reward functions while preserving the naturalness of these design spaces.
Our algorithm integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future.
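As a rough illustration of the look-ahead idea above, the sketch below resamples candidate denoising steps in proportion to an exponentiated ("soft") value estimate; `denoise_step`, `value_fn`, and the hyperparameters are hypothetical placeholders, not the paper's actual interfaces or algorithm.

```python
import numpy as np

def soft_value_guided_step(x, denoise_step, value_fn, n_candidates=8, alpha=1.0, rng=None):
    """One guided denoising step: draw several candidate next states from the
    base diffusion model, then resample one with probability proportional to
    exp(value / alpha), where the value estimates future downstream reward."""
    rng = rng or np.random.default_rng()
    candidates = [denoise_step(x, rng) for _ in range(n_candidates)]
    values = np.array([value_fn(c) for c in candidates])
    weights = np.exp((values - values.max()) / alpha)   # soft and numerically stable
    weights /= weights.sum()
    return candidates[rng.choice(n_candidates, p=weights)]
```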
arXiv Detail & Related papers (2024-08-15T16:47:59Z)
- Fast Ensembling with Diffusion Schrödinger Bridge [17.334437293164566]
The Deep Ensemble (DE) approach is a straightforward technique for enhancing the performance of deep neural networks by training them from different initial points so that they converge to different local optima.
We propose a novel approach called Diffusion Bridge Network (DBN) to address the computational burden of such ensembles.
By replacing the heavy ensembles with this lightweight neural network, the DBN, we achieve inference at a reduced computational cost while maintaining accuracy and uncertainty scores on benchmark datasets such as CIFAR-10, CIFAR-100, and TinyImageNet.
arXiv Detail & Related papers (2024-04-24T11:35:02Z)
- Generalized Schrödinger Bridge Matching [54.171931505066]
The Generalized Schrödinger Bridge (GSB) problem setup is prevalent in many scientific areas both within and beyond machine learning.
We propose Generalized Schrödinger Bridge Matching (GSBM), a new matching algorithm inspired by recent advances.
We show that such a generalization can be cast as solving conditional optimal control, for which variational approximations can be used.
arXiv Detail & Related papers (2023-10-03T17:42:11Z)
- End-to-end Training of Deep Boltzmann Machines by Unbiased Contrastive Divergence with Local Mode Initialization [23.008689183810695]
We address the problem of biased gradient estimation in deep Boltzmann machines (DBMs).
We propose a coupling based on the Metropolis-Hastings (MH) algorithm and initialize the state around a local mode of the target distribution.
Because of the propensity of MH to reject proposals, the coupling tends to converge in only one step with high probability, leading to high efficiency.
arXiv Detail & Related papers (2023-05-31T09:28:02Z)
- Deep Model Reassembly [60.6531819328247]
We explore a novel knowledge-transfer task, termed Deep Model Reassembly (DeRy).
The goal of DeRy is to first dissect each model into distinctive building blocks, and then selectively reassemble the derived blocks to produce customized networks.
We demonstrate that on ImageNet, the best reassembled model achieves 78.6% top-1 accuracy without fine-tuning.
arXiv Detail & Related papers (2022-10-24T10:16:13Z)
- Deep Generalized Schrödinger Bridge [26.540105544872958]
The Mean-Field Game serves as a crucial mathematical framework for modeling the collective behavior of individual agents.
We show that the Schrödinger Bridge, as an entropy-regularized optimal transport model, can be generalized to accept mean-field structures.
Our method, named Deep Generalized Schrödinger Bridge (DeepGSB), outperforms prior methods in solving classical population navigation MFGs.
arXiv Detail & Related papers (2022-09-20T17:47:15Z)
- FiLM-Ensemble: Probabilistic Deep Learning via Feature-wise Linear Modulation [69.34011200590817]
We introduce FiLM-Ensemble, a deep, implicit ensemble method based on the concept of Feature-wise Linear Modulation.
By modulating the network activations of a single deep network with FiLM, one obtains a model ensemble with high diversity.
We show that FiLM-Ensemble outperforms other implicit ensemble methods, and it comes very close to the upper bound of an explicit ensemble of networks.
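To make the modulation mechanism above concrete, here is a toy NumPy sketch (not the authors' code): a single shared linear layer whose activations are scaled and shifted by per-member FiLM parameters, so the ensemble members differ only in a handful of modulation parameters.

```python
import numpy as np

class FiLMEnsembleLayer:
    """A shared linear layer modulated per ensemble member:
    h_k = gamma_k * (W x + b) + beta_k. Only gamma/beta differ across members."""
    def __init__(self, d_in, d_out, n_members, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # shared weights
        self.b = np.zeros(d_out)
        self.gamma = 1.0 + 0.1 * rng.standard_normal((n_members, d_out))  # per-member scale
        self.beta = 0.1 * rng.standard_normal((n_members, d_out))         # per-member shift

    def forward(self, x):
        h = self.W @ x + self.b           # one shared forward pass
        return self.gamma * h + self.beta # (n_members, d_out): an implicit ensemble

layer = FiLMEnsembleLayer(d_in=8, d_out=4, n_members=5)
outputs = layer.forward(np.ones(8))       # five diverse outputs from one backbone
print(outputs.shape)                      # (5, 4)
```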
arXiv Detail & Related papers (2022-05-31T18:33:15Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Mode-Assisted Unsupervised Learning of Restricted Boltzmann Machines [7.960229223744695]
We show that properly combining standard gradient updates with an off-gradient direction dramatically improves the training of restricted Boltzmann machines (RBMs) over traditional gradient methods.
This approach, which we call mode training, promotes faster training and stability, in addition to lower converged relative entropy (KL divergence).
The mode training we suggest is quite versatile, as it can be applied in conjunction with any given gradient method, and is easily extended to more general energy-based neural network structures.
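The summary above leaves the exact combination rule unspecified, so the snippet below is only a schematic reading, not the authors' update: with some probability the negative phase of a standard CD-style RBM update is replaced by statistics evaluated at an estimate of the model's mode. The function name, arguments, and the mixing probability `p_mode` are all hypothetical.

```python
import numpy as np

def mode_assisted_update(v_data, h_data, v_model, h_model, v_mode, h_mode,
                         p_mode=0.2, lr=1e-2, rng=None):
    """Schematic RBM weight update: usually the CD-style step
    <v h>_data - <v h>_model, but with probability p_mode the negative phase
    is replaced by the outer product at the model's (estimated) mode."""
    rng = rng or np.random.default_rng()
    positive = np.outer(v_data, h_data)
    if rng.random() < p_mode:
        negative = np.outer(v_mode, h_mode)     # off-gradient, mode-driven direction
    else:
        negative = np.outer(v_model, h_model)   # ordinary CD negative phase
    return lr * (positive - negative)
```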
arXiv Detail & Related papers (2020-01-15T21:12:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.