Learning Gaussian-Bernoulli RBMs using Difference of Convex Functions Optimization
- URL: http://arxiv.org/abs/2102.06228v1
- Date: Thu, 11 Feb 2021 19:15:54 GMT
- Title: Learning Gaussian-Bernoulli RBMs using Difference of Convex Functions Optimization
- Authors: Vidyadhar Upadhya and P S Sastry
- Abstract summary: We show that the negative log-likelihood for a GB-RBM can be expressed as a difference of convex functions.
We propose a stochastic difference of convex functions programming (S-DCP) algorithm for learning the GB-RBM.
It is seen that S-DCP is better than the CD and PCD algorithms in terms of speed of learning and the quality of the generative model learnt.
- Score: 0.9137554315375919
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Gaussian-Bernoulli restricted Boltzmann machine (GB-RBM) is a useful
generative model that captures meaningful features from the given
$n$-dimensional continuous data. The difficulties associated with learning
GB-RBM are reported extensively in earlier studies. They indicate that the
training of the GB-RBM using the current standard algorithms, namely,
contrastive divergence (CD) and persistent contrastive divergence (PCD), needs
a carefully chosen small learning rate to avoid divergence which, in turn,
results in slow learning. In this work, we alleviate such difficulties by
showing that the negative log-likelihood for a GB-RBM can be expressed as a
difference of convex functions if we keep the variance of the conditional
distribution of visible units (given hidden unit states) and the biases of the
visible units constant. Using this, we propose a stochastic {\em difference of
convex functions} (DC) programming (S-DCP) algorithm for learning the GB-RBM.
We present extensive empirical studies on several benchmark datasets to
validate the performance of this S-DCP algorithm. It is seen that S-DCP is
better than the CD and PCD algorithms in terms of speed of learning and the
quality of the generative model learnt.
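To make the stated decomposition concrete, the following is a minimal sketch using one common GB-RBM parameterization (the notation below is ours, not quoted from the paper). With the visible variances $\sigma_i^2$ and visible biases $b_i$ held fixed, the energy

$$E(v,h;\theta) \;=\; \sum_i \frac{(v_i-b_i)^2}{2\sigma_i^2} \;-\; \sum_j c_j h_j \;-\; \sum_{i,j} \frac{v_i}{\sigma_i}\, W_{ij}\, h_j$$

is affine in the remaining parameters $\theta=(W,c)$, so both terms of the negative log-likelihood are log-sum-exp (respectively log-integral-exp) of affine functions of $\theta$ and hence convex:

$$-\log p(v;\theta) \;=\; \underbrace{\log \int \sum_{h} e^{-E(v',h;\theta)}\,dv'}_{f(\theta)\ (\text{convex})} \;-\; \underbrace{\log \sum_{h} e^{-E(v,h;\theta)}}_{g(\theta)\ (\text{convex})}.$$

A DC-programming step then linearizes $g$ at the current iterate and approximately minimizes the convex surrogate $f(\theta) - \langle \nabla g(\theta^{(t)}),\, \theta \rangle$; in a stochastic variant such as S-DCP, this inner minimization would be carried out with a few stochastic gradient steps, with $\nabla f$ estimated by sampling from the model (e.g., via short Gibbs chains, as in CD/PCD).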
Related papers
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes [23.87733307119697]
We introduce Neural Operator Variational Inference (NOVI) for Deep Gaussian Processes.
NOVI uses a neural generator to obtain a sampler and minimizes the Regularized Stein Discrepancy in L2 space between the generated distribution and true posterior.
We demonstrate that the bias introduced by our method can be controlled by multiplying the divergence with a constant, which leads to robust error control and ensures the stability and precision of the algorithm.
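As background for the summary above (the paper's regularized, $L^2$-space objective is not spelled out here, so this is only the standard unregularized form), the kernelized Stein discrepancy between a sample distribution $q$ and a target posterior $p$ with score $s_p(x)=\nabla_x \log p(x)$ is

$$\mathrm{KSD}^2(q\,\|\,p) \;=\; \mathbb{E}_{x,x'\sim q}\Big[\, s_p(x)^{\top} k(x,x')\, s_p(x') \;+\; s_p(x)^{\top}\nabla_{x'} k(x,x') \;+\; \nabla_{x} k(x,x')^{\top} s_p(x') \;+\; \operatorname{tr}\big(\nabla_x \nabla_{x'} k(x,x')\big) \Big],$$

which depends on $p$ only through its score and can therefore be minimized over the parameters of a neural sampler without evaluating the intractable posterior normalizer.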
arXiv Detail & Related papers (2023-09-22T06:56:35Z)
- Dimensionality Reduction as Probabilistic Inference [10.714603218784175]
Dimensionality reduction (DR) algorithms compress high-dimensional data into a lower dimensional representation while preserving important features of the data.
We introduce the ProbDR variational framework, which interprets a wide range of classical DR algorithms as probabilistic inference algorithms in this framework.
arXiv Detail & Related papers (2023-04-15T23:48:59Z)
- Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation [59.45669299295436]
We propose a Monte Carlo PDE solver for training unsupervised neural solvers.
We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency.
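As a toy illustration of this probabilistic representation (a minimal sketch under our own assumptions, not the paper's solver): for the 1-D convection-diffusion equation $u_t + a\,u_x = \kappa\,u_{xx}$ with initial condition $u(x,0)=u_0(x)$, the Feynman-Kac formula gives $u(x,t) = \mathbb{E}\big[u_0(x - a t + \sqrt{2\kappa t}\,\xi)\big]$ with $\xi\sim\mathcal{N}(0,1)$, which an ensemble of random particles estimates directly:

```python
import numpy as np

def mc_convection_diffusion(u0, x, t, a=1.0, kappa=0.1, n_particles=100_000, seed=0):
    """Monte Carlo (Feynman-Kac) estimate of u(x, t) for u_t + a*u_x = kappa*u_xx."""
    rng = np.random.default_rng(seed)
    # Each particle is shifted by the deterministic drift -a*t and a Brownian
    # displacement with variance 2*kappa*t; averaging u0 over the particles
    # gives an unbiased estimate of the solution at (x, t).
    xi = rng.standard_normal(n_particles)
    return u0(x - a * t + np.sqrt(2.0 * kappa * t) * xi).mean()

# Example: Gaussian bump initial condition, evaluated at a single point.
print(mc_convection_diffusion(lambda y: np.exp(-y**2), x=0.5, t=0.2))
```

A neural solver in this spirit can be trained against such pointwise particle estimates rather than a mesh-based PDE residual.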
arXiv Detail & Related papers (2023-02-10T08:05:19Z)
- DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to 10%.
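The reduction mentioned above presumably rests, in some form, on the standard duality for KL-regularized worst-case reweighting: for per-device (or per-group) losses $\ell_1,\dots,\ell_K$, reference weights $p\in\Delta_K$, and regularization strength $\lambda>0$,

$$\max_{q\in\Delta_K}\ \sum_{k=1}^{K} q_k\,\ell_k \;-\; \lambda\,\mathrm{KL}(q\,\|\,p) \;=\; \lambda \log \sum_{k=1}^{K} p_k\, e^{\ell_k/\lambda},$$

so the min-max problem collapses to minimizing a smooth log-sum-exp (softmax-weighted) objective, which is amenable to decentralized stochastic gradient methods.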
arXiv Detail & Related papers (2022-08-29T18:01:42Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models.
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
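For reference (a generic sketch, not necessarily the exact variant analyzed in the paper), stochastic mirror descent with the negative-entropy mirror map over the probability simplex reduces to an exponentiated-gradient update: given a stochastic gradient estimate $g_t$ and step size $\eta$, the per-state policy update is

$$\pi_{t+1}(a) \;=\; \frac{\pi_t(a)\,\exp\!\big(-\eta\, g_t(a)\big)}{\sum_{a'} \pi_t(a')\,\exp\!\big(-\eta\, g_t(a')\big)},$$

whose multiplicative form keeps iterates on the simplex and makes mirror descent a natural fit for MDP policy optimization.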
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- Robust Generative Restricted Kernel Machines using Weighted Conjugate Feature Duality [11.68800227521015]
We introduce weighted conjugate feature duality in the framework of Restricted Kernel Machines (RKMs).
The RKM formulation allows for an easy integration of methods from classical robust statistics.
Experiments show that the weighted RKM is capable of generating clean images when contamination is present in the training data.
arXiv Detail & Related papers (2020-02-04T09:23:25Z)