Contrastive Divergence Learning is a Time Reversal Adversarial Game
- URL: http://arxiv.org/abs/2012.03295v3
- Date: Mon, 15 Mar 2021 20:03:43 GMT
- Title: Contrastive Divergence Learning is a Time Reversal Adversarial Game
- Authors: Omer Yair, Tomer Michaeli
- Abstract summary: Contrastive divergence (CD) learning is a classical method for fitting unnormalized statistical models to data samples.
We show that CD is an adversarial learning procedure, where a discriminator attempts to classify whether a Markov chain generated from the model has been time-reversed.
Our derivation settles well with previous observations, which have concluded that CD's update steps cannot be expressed as the gradients of any fixed objective function.
- Score: 32.46369991490501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive divergence (CD) learning is a classical method for fitting
unnormalized statistical models to data samples. Despite its widespread use,
the convergence properties of this algorithm are still not well understood. The
main source of difficulty is an unjustified approximation which has been used
to derive the gradient of the loss. In this paper, we present an alternative
derivation of CD that does not require any approximation and sheds new light on
the objective that is actually being optimized by the algorithm. Specifically,
we show that CD is an adversarial learning procedure, where a discriminator
attempts to classify whether a Markov chain generated from the model has been
time-reversed. Thus, although predating generative adversarial networks (GANs)
by more than a decade, CD is, in fact, closely related to these techniques. Our
derivation settles well with previous observations, which have concluded that
CD's update steps cannot be expressed as the gradients of any fixed objective
function. In addition, as a byproduct, our derivation reveals a simple
correction that can be used as an alternative to Metropolis-Hastings rejection,
which is required when the underlying Markov chain is inexact (e.g. when using Langevin dynamics with a large step size).
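For orientation, here is a minimal sketch of the kind of procedure the abstract describes: a CD-k update (k = 1) for a toy one-parameter energy-based model, with negative samples produced by unadjusted Langevin dynamics started at the data. The quadratic energy, all names, and all hyperparameters are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy energy-based model E(x; theta) = 0.5 * theta * x**2, i.e. p(x) ∝ exp(-E).
# For data drawn from N(0, sigma^2) the maximum-likelihood fit is theta = 1/sigma^2.
def dE_dtheta(x):
    return 0.5 * x**2

def langevin_step(x, theta, eta):
    # Unadjusted Langevin transition. With a large step size eta the chain is
    # inexact -- the regime where the paper's correction (an alternative to
    # Metropolis-Hastings rejection) is said to apply.
    return x - eta * (theta * x) + np.sqrt(2.0 * eta) * rng.standard_normal(x.shape)

data = rng.normal(0.0, 2.0, size=5000)  # true theta is 1/4
theta, lr, eta, k = 1.0, 0.05, 0.1, 1   # CD-k with k = 1

for _ in range(500):
    x_neg = data.copy()                 # CD starts the chain at the data
    for _ in range(k):
        x_neg = langevin_step(x_neg, theta, eta)
    # Log-likelihood ascent direction: -<dE/dtheta>_data + <dE/dtheta>_chain.
    theta += lr * (dE_dtheta(x_neg).mean() - dE_dtheta(data).mean())

print(f"fitted theta = {theta:.3f}  (maximum likelihood would give ~0.25)")
```

Because the chain is uncorrected, the fitted theta settles slightly above 1/4; the gap shrinks with the step size, and the correction proposed in the paper is aimed at exactly this situation.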
Related papers
- Sobolev Space Regularised Pre Density Models [51.558848491038916]
We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density.
This method is statistically consistent, and makes the inductive bias of the model clear and interpretable.
arXiv Detail & Related papers (2023-07-25T18:47:53Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Resolving the Mixing Time of the Langevin Algorithm to its Stationary Distribution for Log-Concave Sampling [34.66940399825547]
This paper characterizes the mixing time of the Langevin Algorithm to its stationary distribution.
We introduce a technique from the differential privacy literature to the sampling literature.
arXiv Detail & Related papers (2022-10-16T05:11:16Z)
- Bias and Extrapolation in Markovian Linear Stochastic Approximation with Constant Stepsizes [9.689344942945652]
We consider Linear Stochastic Approximation (LSA) with a constant stepsize and Markovian data.
We show that the bias of the limiting iterate admits an infinite series expansion with respect to the stepsize.
We show that the bias can be reduced using Richardson-Romberg extrapolation with $m \ge 2$ stepsizes, as sketched below.
arXiv Detail & Related papers (2022-10-03T14:11:03Z)
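The Richardson-Romberg idea referenced in this entry is easy to illustrate in isolation. The toy below uses m = 2 stepsizes: for any estimator whose bias is approximately affine in the stepsize, the combination 2*A(eta) - A(2*eta) cancels the first-order term. The closed-form `estimate` is a hypothetical stand-in for a constant-stepsize LSA average, not the paper's setting.

```python
import numpy as np

# Richardson-Romberg with m = 2 stepsizes: if an estimator satisfies
# A(eta) = A* + c * eta + O(eta^2), then 2*A(eta) - A(2*eta) = A* + O(eta^2),
# cancelling the first-order bias term.
def estimate(eta):
    # Toy stand-in for a stepsize-biased estimator with limit A* = 1.
    return 1.0 + 0.7 * eta + 0.3 * eta**2

eta = 0.1
plain = estimate(eta)
rr = 2.0 * estimate(eta) - estimate(2.0 * eta)
print(f"plain:        error = {abs(plain - 1.0):.4f}")  # O(eta)   ~ 0.073
print(f"extrapolated: error = {abs(rr - 1.0):.4f}")     # O(eta^2) ~ 0.006
```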
- Projected Sliced Wasserstein Autoencoder-based Hyperspectral Images Anomaly Detection [42.585075865267946]
We propose the Projected Sliced Wasserstein (PSW) autoencoder-based anomaly detection method.
In particular, the computation-friendly eigen-decomposition method is leveraged to find the principal component for slicing the high-dimensional data.
Comprehensive experiments conducted on various real-world hyperspectral anomaly detection benchmarks demonstrate the superior performance of the proposed method (the slicing step is sketched below).
arXiv Detail & Related papers (2021-12-20T09:21:02Z)
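A minimal sketch of the slicing idea this entry describes, under loose assumptions: projection directions are taken as principal components from an eigen-decomposition of the sample covariance (rather than random directions), and each slice is compared with the 1D 2-Wasserstein distance between sorted samples. Function names and toy data are illustrative; this is not the paper's autoencoder pipeline.

```python
import numpy as np

def sliced_w2(x, y, n_dirs):
    # Slice along principal components of x, found by eigen-decomposition of
    # its sample covariance (np.linalg.eigh returns ascending eigenvalues).
    _, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
    dirs = eigvecs[:, ::-1][:, :n_dirs]           # top-n_dirs components
    px, py = np.sort(x @ dirs, axis=0), np.sort(y @ dirs, axis=0)
    # 1D W2 between equal-size empirical measures = RMS of sorted differences.
    return np.sqrt(((px - py) ** 2).mean())

rng = np.random.default_rng(0)
ref = rng.normal(size=(1000, 20))                 # reference ("normal") data
same = rng.normal(size=(1000, 20))                # same distribution
scaled = 2.0 * rng.normal(size=(1000, 20))        # crude anomaly stand-in
print(sliced_w2(ref, same, 5))                    # small
print(sliced_w2(ref, scaled, 5))                  # larger: distribution shift
```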
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for bias-constrained estimation (BCE) arises in applications where multiple estimates of the same unknown are averaged for improved performance (a toy bias penalty is sketched below).
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
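To make the bias constraint concrete, here is a hedged toy: MSE training of a deliberately misspecified linear estimator, with an optional lam * (mean residual)^2 penalty as a crude stand-in for the paper's constraint (the paper works with learned deep estimators and a stricter, pointwise notion of bias). All names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# y_hat = w * x with no intercept is misspecified for y = 2x + 1 + noise, so
# plain MSE training yields a biased estimator; the soft penalty
# lam * (mean residual)^2 trades some MSE for (near-)zero average bias.
x = rng.normal(loc=1.0, size=5000)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=5000)

def fit(lam, lr=0.05, steps=3000):
    w = 0.0
    for _ in range(steps):
        r = w * x - y                    # residuals
        grad = 2.0 * (r * x).mean() + lam * 2.0 * r.mean() * x.mean()
        w -= lr * grad                   # descend MSE + lam * bias^2
    return w

for lam in (0.0, 10.0):
    w = fit(lam)
    r = w * x - y
    print(f"lam={lam:4.1f}: w={w:.3f}, bias={r.mean():+.3f}, mse={(r**2).mean():.3f}")
```

The penalized fit reduces the average bias by roughly a factor of six here, at the cost of a higher MSE, which is the trade-off motivating averaging multiple such estimates.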
- Learning from non-irreducible Markov chains [0.0]
We focus on the case when the training data set is drawn from a not necessarily irreducible Markov chain.
We first obtain a uniform convergence result for the corresponding sample error, and then establish learnability of the approximate sample error minimization algorithm.
arXiv Detail & Related papers (2021-10-08T19:00:19Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation (a minimal sketch of AIS in this spirit follows below).
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
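A compact sketch of vanilla AIS with unadjusted Langevin transitions, i.e. with the Metropolis-Hastings steps dropped as this entry suggests; every operation is then differentiable in the parameters, at the price of a small bias from the uncorrected chain. The 1D Gaussian target and all hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# AIS from p0 = N(0,1) to the unnormalized target f(x) = exp(-(x-mu)^2/(2 s^2)),
# whose true normalizer is s * sqrt(2*pi). Intermediate densities are the
# geometric bridge pi_beta ∝ p0^(1-beta) * f^beta.
mu, s = 2.0, 0.5
log_p0 = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
log_f  = lambda x: -0.5 * (x - mu)**2 / s**2          # unnormalized target

n, K, eta = 20000, 200, 0.05
betas = np.linspace(0.0, 1.0, K + 1)
x = rng.standard_normal(n)                            # exact samples from p0
logw = np.zeros(n)

for b0, b1 in zip(betas[:-1], betas[1:]):
    logw += (b1 - b0) * (log_f(x) - log_p0(x))        # AIS weight increment
    grad = (1 - b1) * (-x) + b1 * (-(x - mu) / s**2)  # grad log pi_{b1}
    x += eta * grad + np.sqrt(2 * eta) * rng.standard_normal(n)  # ULA move

Z_hat = np.exp(logw).mean()                           # marginal likelihood
print(f"AIS estimate {Z_hat:.3f} vs true {s * np.sqrt(2 * np.pi):.3f}")
```

The estimate lands close to the true normalizer, with a residual bias coming from the missing Metropolis-Hastings correction.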
- Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks [86.88061841975482]
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle.
We use this setting to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method (FGSM).
We show that the method uses fewer queries and achieves higher attack success rates than the current state of the art (a naive zeroth-order baseline is sketched below).
arXiv Detail & Related papers (2020-10-08T18:36:51Z)
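For orientation, here is the naive zeroth-order baseline this entry improves upon: estimate the sign of the loss gradient coordinate by coordinate from value-only oracle queries, then take one FGSM-style signed step. This costs two queries per coordinate, which is the budget the paper's covariance modeling is designed to beat; the linear "model" below is a hypothetical stand-in, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zeroth-order oracle: we may only query loss values, never gradients.
w = rng.normal(size=10)                               # fixed toy classifier
loss = lambda x: float(np.logaddexp(0.0, -x @ w))     # logistic loss, label +1

x = rng.normal(size=10)
h, eps = 1e-4, 0.3
g_sign = np.zeros_like(x)
for i in range(x.size):                               # two queries per dim
    e = np.zeros_like(x); e[i] = h
    g_sign[i] = np.sign(loss(x + e) - loss(x - e))    # central-difference sign

x_adv = x + eps * g_sign                              # one-step sign attack
print(f"loss before: {loss(x):.3f}, after: {loss(x_adv):.3f}")
```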