Training Deep Energy-Based Models with f-Divergence Minimization
- URL: http://arxiv.org/abs/2003.03463v2
- Date: Tue, 21 Jul 2020 01:21:03 GMT
- Title: Training Deep Energy-Based Models with f-Divergence Minimization
- Authors: Lantao Yu, Yang Song, Jiaming Song, Stefano Ermon
- Abstract summary: Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging.
We propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence.
Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.
- Score: 113.97274898282343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep energy-based models (EBMs) are very flexible in distribution
parametrization but computationally challenging because of the intractable
partition function. They are typically trained via maximum likelihood, using
contrastive divergence to approximate the gradient of the KL divergence between
data and model distribution. While KL divergence has many desirable properties,
other f-divergences have shown advantages in training implicit density
generative models such as generative adversarial networks. In this paper, we
propose a general variational framework termed f-EBM to train EBMs using any
desired f-divergence. We introduce a corresponding optimization algorithm and
prove its local convergence property with non-linear dynamical systems theory.
Experimental results demonstrate the superiority of f-EBM over contrastive
divergence, as well as the benefits of training EBMs using f-divergences other
than KL.
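To make the two training approaches in the abstract concrete: for an EBM $q_\theta(x) = e^{-E_\theta(x)} / Z_\theta$, the maximum-likelihood (KL) gradient that contrastive divergence approximates, and the Fenchel-dual variational lower bound on a general f-divergence (the representation popularized by f-GANs, which f-EBM adapts to unnormalized models), are

$$
\nabla_\theta D_{\mathrm{KL}}(p_{\mathrm{data}} \,\|\, q_\theta) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\nabla_\theta E_\theta(x)] - \mathbb{E}_{x \sim q_\theta}[\nabla_\theta E_\theta(x)],
$$

$$
D_f(p \,\|\, q) = \mathbb{E}_{x \sim q}\!\left[f\!\left(\tfrac{p(x)}{q(x)}\right)\right] \;\ge\; \sup_{T}\; \mathbb{E}_{x \sim p}[T(x)] - \mathbb{E}_{x \sim q}[f^{*}(T(x))],
$$

where $f$ is convex with $f(1) = 0$ and $f^{*}$ is its Fenchel conjugate. Contrastive divergence estimates the second expectation in the KL gradient with samples obtained from a few MCMC steps; the exact f-EBM objective differs from the bound above and is given in the paper.

Below is a minimal sketch of one contrastive-divergence update in PyTorch, assuming a small MLP energy function and short-run Langevin sampling; the architecture, step sizes, and data here are illustrative placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Small MLP energy E_theta(x); illustrative architecture only."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.SiLU(), nn.Linear(128, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def langevin_negatives(energy, x_init, steps=20, step_size=1e-2):
    """Short-run Langevin MCMC to draw approximate samples from the model q_theta."""
    x = x_init.clone().detach().requires_grad_(True)
    for _ in range(steps):
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        x = (x - 0.5 * step_size * grad
             + step_size ** 0.5 * torch.randn_like(x)).detach().requires_grad_(True)
    return x.detach()

# One contrastive-divergence step: push energy down on data, up on model samples.
energy = EnergyNet(dim=2)
opt = torch.optim.Adam(energy.parameters(), lr=1e-4)
x_data = torch.randn(64, 2)  # placeholder batch of 2-D data
x_neg = langevin_negatives(energy, torch.randn_like(x_data))
loss = energy(x_data).mean() - energy(x_neg).mean()
opt.zero_grad(); loss.backward(); opt.step()
```

Replacing this KL-specific surrogate with a general f-divergence objective requires an additional variational function (a second network playing the role of $T$ above), which is the role of the minimax formulation that f-EBM introduces.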
Related papers
- Learning Mixtures of Experts with EM [28.48469221248906]
Mixtures of Experts (MoE) are Machine Learning models that involve partitioning the input space, with a separate "expert" model trained on each partition.
We study the efficiency of the Expectation Maximization (EM) algorithm for the training of MoE models.
arXiv Detail & Related papers (2024-11-09T03:44:09Z)
- Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space [72.52365911990935]
We introduce Bellman Diffusion, a novel DGM framework that maintains linearity in MDPs through gradient and scalar field modeling.
Our results show that Bellman Diffusion achieves accurate field estimations and is a capable image generator, converging 1.5x faster than the traditional histogram-based baseline in distributional RL tasks.
arXiv Detail & Related papers (2024-10-02T17:53:23Z)
- Variational Schrödinger Diffusion Models [14.480273869571468]
Schr"odinger bridge (SB) has emerged as the go-to method for optimizing transportation plans in diffusion models.
We leverage variational inference to linearize the forward score functions (variational scores) of SB.
We propose the variational Schrödinger diffusion model (VSDM), where the forward process is a multivariate diffusion and the variational scores are adaptively optimized for efficient transport.
arXiv Detail & Related papers (2024-05-08T04:01:40Z)
- Generalized Contrastive Divergence: Joint Training of Energy-Based Model and Diffusion Model through Inverse Reinforcement Learning [13.22531381403974]
Generalized Contrastive Divergence (GCD) is a novel objective function for training an energy-based model (EBM) and a sampler simultaneously.
We present preliminary yet promising results showing that joint training is beneficial for both the EBM and the diffusion model.
arXiv Detail & Related papers (2023-12-06T10:10:21Z)
- Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [67.9215891673174]
We propose score entropy as a novel loss that naturally extends score matching to discrete spaces.
We test our Score Entropy Discrete Diffusion models on standard language modeling tasks.
arXiv Detail & Related papers (2023-10-25T17:59:12Z)
- Score-based Generative Modeling Through Backward Stochastic Differential Equations: Inversion and Generation [6.2255027793924285]
The proposed BSDE-based diffusion model represents a novel approach to diffusion modeling, which extends the application of stochastic differential equations (SDEs) in machine learning.
We demonstrate the theoretical guarantees of the model, the benefits of using Lipschitz networks for score matching, and its potential applications in various areas such as diffusion inversion, conditional diffusion, and uncertainty quantification.
arXiv Detail & Related papers (2023-04-26T01:15:35Z)
- Flexible Amortized Variational Inference in qBOLD MRI [56.4324135502282]
Oxygen extraction fraction (OEF) and deoxygenated blood volume (DBV) are more ambiguously determined from the data.
Existing inference methods tend to yield very noisy and underestimated OEF maps, while overestimating DBV.
This work describes a novel probabilistic machine learning approach that can infer plausible distributions of OEF and DBV.
arXiv Detail & Related papers (2022-03-11T10:47:16Z)
- Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z)
- Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models.
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z)