Pseudo-Spherical Contrastive Divergence
- URL: http://arxiv.org/abs/2111.00780v1
- Date: Mon, 1 Nov 2021 09:17:15 GMT
- Title: Pseudo-Spherical Contrastive Divergence
- Authors: Lantao Yu, Jiaming Song, Yang Song, Stefano Ermon
- Abstract summary: We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models (EBMs).
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
- Score: 119.28384561517292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Energy-based models (EBMs) offer flexible distribution parametrization.
However, due to the intractable partition function, they are typically trained
via contrastive divergence for maximum likelihood estimation. In this paper, we
propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum
likelihood learning of EBMs. PS-CD is derived from the maximization of a family
of strictly proper homogeneous scoring rules, which avoids the computation of
the intractable partition function and provides a generalized family of
learning objectives that include contrastive divergence as a special case.
Moreover, PS-CD allows us to flexibly choose various learning objectives to
train EBMs without additional computational cost or variational minimax
optimization. Theoretical analysis on the proposed method and extensive
experiments on both synthetic data and commonly used image datasets demonstrate
the effectiveness and modeling flexibility of PS-CD, as well as its robustness
to data contamination, thus showing its superiority over maximum likelihood and
$f$-EBMs.
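The following minimal Python sketch illustrates two ideas from the abstract on a hypothetical toy model: a discrete EBM over K states with tabular energy E(k) = theta[k]. First, the pseudo-spherical score S(x, p) = (p(x) / ||p||_gamma)^(gamma - 1) does not change when p is rescaled, so it can be evaluated on the unnormalized density exp(-E). Second, the sketch gives a sample-based gradient for the objective (1/(gamma - 1)) log E_data[S(x, p_theta)]. This gradient was derived for this sketch using self-normalized importance weights proportional to exp(-(gamma - 1) E(x)) on both data and model samples. It is not the authors' reference implementation, and the names ps_score, exact_objective and ps_cd_gradient are hypothetical. With gamma = 1 the weights become uniform, and the estimator reduces to the usual contrastive-divergence gradient.

```python
import math
import random


def ps_score(p_tilde, x, gamma):
    # Pseudo-spherical scoring rule for a discrete distribution:
    #   S(x, p) = (p(x) / ||p||_gamma) ** (gamma - 1),
    #   ||p||_gamma = (sum_y p(y) ** gamma) ** (1 / gamma).
    # p_tilde may be unnormalized: S is invariant to rescaling p_tilde.
    norm = sum(v ** gamma for v in p_tilde) ** (1.0 / gamma)
    return (p_tilde[x] / norm) ** (gamma - 1.0)


def exact_objective(theta, data, gamma):
    # (1 / (gamma - 1)) * log E_data[S(x, p_theta)] for the toy tabular EBM,
    # whose unnormalized density is exp(-theta[k]). Requires gamma > 1.
    p_tilde = [math.exp(-t) for t in theta]
    mean_score = sum(ps_score(p_tilde, x, gamma) for x in data) / len(data)
    return math.log(mean_score) / (gamma - 1.0)


def self_normalized_weights(energies, gamma):
    # w_i proportional to exp(-(gamma - 1) * E(x_i)), normalized to sum to 1.
    # The max log-weight is subtracted for numerical stability.
    logs = [-(gamma - 1.0) * e for e in energies]
    m = max(logs)
    raw = [math.exp(v - m) for v in logs]
    total = sum(raw)
    return [r / total for r in raw]


def ps_cd_gradient(theta, data, model_samples, gamma):
    # Ascent direction for exact_objective, estimated from samples:
    #   E_model[w * grad E] - E_data[w * grad E]
    # with self-normalized weights in both expectations. For the tabular
    # energy, grad_theta E(x) is the one-hot vector e_x. With gamma = 1 this is
    # the contrastive-divergence / maximum-likelihood direction
    #   E_model[grad E] - E_data[grad E].
    grad = [0.0] * len(theta)
    w_model = self_normalized_weights([theta[x] for x in model_samples], gamma)
    for w, x in zip(w_model, model_samples):
        grad[x] += w
    w_data = self_normalized_weights([theta[x] for x in data], gamma)
    for w, x in zip(w_data, data):
        grad[x] -= w
    return grad


if __name__ == "__main__":
    theta = [0.0, 0.5, -0.3, 1.0]
    data = [0, 0, 1, 2, 2, 2, 3]
    rng = random.Random(0)
    # Exact sampling from p_theta stands in for MCMC in this toy setting.
    model_samples = rng.choices(
        range(len(theta)), weights=[math.exp(-t) for t in theta], k=50_000
    )
    print(ps_cd_gradient(theta, data, model_samples, gamma=2.0))
```

In a real EBM, the model samples would typically come from MCMC (for example Langevin dynamics), as in standard contrastive divergence, and grad E would come from automatic differentiation. The toy model is used here only because its exact objective can be enumerated, which lets the estimator be checked against finite differences.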
Related papers
- Shedding Light on Large Generative Networks: Estimating Epistemic Uncertainty in Diffusion Models [15.352556466952477]
Generative diffusion models are notable for their large parameter count (exceeding 100 million) and operation within high-dimensional image spaces.
We introduce an innovative framework, Diffusion Ensembles for Capturing Uncertainty (DECU), designed for estimating epistemic uncertainty for diffusion models.
Date: 2024-06-05T14:03:21Z
- Break The Spell Of Total Correlation In β-TCVAE [4.38301148531795]
This paper proposes a new iterative decomposition path for total correlation and uses it to explain the disentangled-representation ability of VAEs.
The new model lets the VAE flexibly adjust its parameter capacity to separate dependent and independent data features.
Date: 2022-10-17T07:16:53Z
- Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
Date: 2022-02-23T06:11:49Z
- Fast and Efficient MMD-based Fair PCA via Optimization over Stiefel Manifold [41.58534159822546]
This paper defines fair principal component analysis (PCA) as minimizing the maximum mean discrepancy (MMD) between dimensionality-reduced conditional distributions.
We provide optimality guarantees and explicitly show the theoretical effect in practical settings.
Date: 2021-09-23T08:06:02Z
- Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects [83.66276516095665]
Conditional average treatment effects (CATEs) allow us to understand the effect heterogeneity across a large population of individuals.
Typical CATE learners assume all confounding variables are measured in order for the CATE to be identifiable.
We propose an energy-based model (EBM) that learns a low-dimensional representation of the variables by employing a noise contrastive loss function.
Date: 2021-08-06T10:39:49Z
- PSD Representations for Effective Probability Models [117.35298398434628]
We show that a recently proposed class of positive semi-definite (PSD) models for non-negative functions is particularly well suited to modeling probability densities.
We characterize both approximation and generalization capabilities of PSD models, showing that they enjoy strong theoretical guarantees.
Our results open the way to applications of PSD models to density estimation, decision theory and inference.
Date: 2021-06-30T15:13:39Z
- Loss function based second-order Jensen inequality and its application to particle variational inference [112.58907653042317]
Particle variational inference (PVI) uses an ensemble of models as an empirical approximation for the posterior distribution.
PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models.
We derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models.
Date: 2021-06-09T12:13:51Z
- Posterior-Aided Regularization for Likelihood-Free Inference [23.708122045184698]
Posterior-Aided Regularization (PAR) is applicable to learning the density estimator, regardless of the model structure.
We provide a unified estimation method for PAR that estimates both the reverse KL term and the mutual information term with a single neural network.
Date: 2021-02-15T16:59:30Z
- Training Deep Energy-Based Models with f-Divergence Minimization [113.97274898282343]
Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging.
We propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence.
Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.
Date: 2020-03-06T23:11:13Z