Learning Gaussian-Bernoulli RBMs using Difference of Convex Functions Optimization
- URL: http://arxiv.org/abs/2102.06228v1
- Date: Thu, 11 Feb 2021 19:15:54 GMT
- Title: Learning Gaussian-Bernoulli RBMs using Difference of Convex Functions Optimization
- Authors: Vidyadhar Upadhya and P S Sastry
- Abstract summary: We show that the negative log-likelihood for a GB-RBM can be expressed as a difference of convex functions.
We propose a stochastic difference of convex functions programming (S-DCP) algorithm for learning the GB-RBM.
It is seen that S-DCP is better than the CD and PCD algorithms in terms of speed of learning and the quality of the generative model learnt.
- Score: 0.9137554315375919
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Gaussian-Bernoulli restricted Boltzmann machine (GB-RBM) is a useful
generative model that captures meaningful features from the given
$n$-dimensional continuous data. The difficulties associated with learning
GB-RBM are reported extensively in earlier studies. They indicate that the
training of the GB-RBM using the current standard algorithms, namely,
contrastive divergence (CD) and persistent contrastive divergence (PCD), needs
a carefully chosen small learning rate to avoid divergence which, in turn,
results in slow learning. In this work, we alleviate such difficulties by
showing that the negative log-likelihood for a GB-RBM can be expressed as a
difference of convex functions if we keep the variance of the conditional
distribution of visible units (given hidden unit states) and the biases of the
visible units, constant. Using this, we propose a stochastic {\em difference of
convex functions} (DC) programming (S-DCP) algorithm for learning the GB-RBM.
We present extensive empirical studies on several benchmark datasets to
validate the performance of this S-DCP algorithm. It is seen that S-DCP is
better than the CD and PCD algorithms in terms of speed of learning and the
quality of the generative model learnt.
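To make the decomposition concrete: with the visible variances $\sigma_i^2$ and visible biases $b_i$ held fixed, the GB-RBM energy is affine in the remaining parameters $\theta = (W, c)$, so the per-sample negative log-likelihood $-\log p(v) = \log Z(\theta) - \log \sum_h \exp(-E(v, h; \theta))$ is a difference of two log-sum-exp-type (hence convex) functions of $\theta$. The sketch below illustrates an S-DCP-style mini-batch update under these assumptions: the subtracted convex (data) term is linearized at the current parameters, and the resulting convex surrogate is approximately minimized by a few gradient steps whose model-expectation term is estimated with short Gibbs chains. This is a minimal illustration assuming a standard GB-RBM parameterization; the class and function names, hyperparameters, and the CD-style chain initialization are illustrative choices, not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GBRBM:
    """Gaussian-Bernoulli RBM with visible variances and visible biases held fixed.

    Assumed energy (one common parameterization, not necessarily the paper's exact one):
        E(v, h) = sum_i (v_i - b_i)^2 / (2 sigma_i^2) - c^T h - sum_{i,j} (v_i / sigma_i^2) W_ij h_j
    With sigma and b fixed, E is affine in the learned parameters (W, c), so the
    negative log-likelihood splits into a difference of convex functions.
    """

    def __init__(self, n_vis, n_hid, sigma=1.0):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.c = np.zeros(n_hid)                  # hidden biases (learned)
        self.b = np.zeros(n_vis)                  # visible biases (fixed)
        self.sigma2 = np.full(n_vis, sigma ** 2)  # visible variances (fixed)

    def p_h_given_v(self, v):
        # P(h_j = 1 | v) for a batch of visible vectors.
        return sigmoid((v / self.sigma2) @ self.W + self.c)

    def sample_v_given_h(self, h):
        # v | h is Gaussian with mean b + W h and the fixed variances.
        mean = self.b + h @ self.W.T
        return mean + np.sqrt(self.sigma2) * rng.standard_normal(mean.shape)

    def gibbs_step(self, v):
        ph = self.p_h_given_v(v)
        h = (rng.random(ph.shape) < ph).astype(float)
        return self.sample_v_given_h(h)


def sdcp_minibatch_update(rbm, v_data, n_gibbs=1, n_inner=5, lr=1e-3):
    """One S-DCP-style update on a mini-batch (illustrative sketch).

    Outer DC step: linearize the subtracted convex (data) term at the current
    parameters, i.e. freeze the positive-phase statistics computed from the data.
    Inner steps: a few gradient steps on the resulting convex surrogate, with the
    model expectation (gradient of log Z) estimated from short Gibbs chains.
    """
    # Positive phase: exact conditional expectations given the data,
    # evaluated once at the linearization point.
    ph_data = rbm.p_h_given_v(v_data)
    pos_W = (v_data / rbm.sigma2).T @ ph_data / len(v_data)
    pos_c = ph_data.mean(axis=0)

    v_model = v_data.copy()  # CD-style chain initialization (illustrative choice)
    for _ in range(n_inner):
        # Negative phase: Monte Carlo estimate of the model expectation.
        for _ in range(n_gibbs):
            v_model = rbm.gibbs_step(v_model)
        ph_model = rbm.p_h_given_v(v_model)
        neg_W = (v_model / rbm.sigma2).T @ ph_model / len(v_model)
        neg_c = ph_model.mean(axis=0)
        # Descend the surrogate: the NLL gradient is (model expectation - data expectation).
        rbm.W += lr * (pos_W - neg_W)
        rbm.c += lr * (pos_c - neg_c)


# Usage sketch on synthetic data.
if __name__ == "__main__":
    X = rng.standard_normal((256, 20))
    rbm = GBRBM(n_vis=20, n_hid=16)
    for _ in range(100):
        sdcp_minibatch_update(rbm, X)
```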
Related papers
- Differentially Private Random Block Coordinate Descent [51.62669821275571]
We propose a differentially private random coordinate descent method that selects multiple coordinates with varying probabilities in each iteration using sketch matrices.
Our algorithm generalizes both DP-CD and the classical DP-SGD (Differentially Private Stochastic Gradient Descent), while preserving the same utility guarantees.
arXiv Detail & Related papers (2024-12-22T15:06:56Z)
- Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines.
We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z)
- Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes [23.87733307119697]
We introduce Neural Operator Variational Inference (NOVI) for Deep Gaussian Processes.
NOVI uses a neural generator to obtain a sampler and minimizes the Regularized Stein Discrepancy in L2 space between the generated distribution and true posterior.
We demonstrate that the bias introduced by our method can be controlled by multiplying the divergence with a constant, which leads to robust error control and ensures the stability and precision of the algorithm.
arXiv Detail & Related papers (2023-09-22T06:56:35Z)
- Dimensionality Reduction as Probabilistic Inference [10.714603218784175]
Dimensionality reduction (DR) algorithms compress high-dimensional data into a lower dimensional representation while preserving important features of the data.
We introduce the ProbDR variational framework, which interprets a wide range of classical DR algorithms as probabilistic inference algorithms in this framework.
arXiv Detail & Related papers (2023-04-15T23:48:59Z)
- Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation [59.45669299295436]
We propose a Monte Carlo PDE solver for training unsupervised neural solvers.
We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency.
arXiv Detail & Related papers (2023-02-10T08:05:19Z)
- DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$.
arXiv Detail & Related papers (2022-08-29T18:01:42Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical dynamic-programming-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- Robust Generative Restricted Kernel Machines using Weighted Conjugate Feature Duality [11.68800227521015]
We introduce weighted conjugate feature duality in the framework of Restricted Kernel Machines (RKMs).
The RKM formulation allows for an easy integration of methods from classical robust statistics.
Experiments show that the weighted RKM is capable of generating clean images when contamination is present in the training data.
arXiv Detail & Related papers (2020-02-04T09:23:25Z)