On the connection between Noise-Contrastive Estimation and Contrastive
Divergence
- URL: http://arxiv.org/abs/2402.16688v1
- Date: Mon, 26 Feb 2024 16:04:47 GMT
- Title: On the connection between Noise-Contrastive Estimation and Contrastive
Divergence
- Authors: Amanda Olmin, Jakob Lindqvist, Lennart Svensson, Fredrik Lindsten
- Abstract summary: Noise-contrastive estimation (NCE) is a popular method for estimating unnormalised probabilistic models.
We show that two NCE criteria, ranking NCE and conditional NCE, can be viewed as ML estimation methods.
- Score: 13.312007032203859
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Noise-contrastive estimation (NCE) is a popular method for estimating
unnormalised probabilistic models, such as energy-based models, which are
effective for modelling complex data distributions. Unlike classical maximum
likelihood (ML) estimation that relies on importance sampling (resulting in
ML-IS) or MCMC (resulting in contrastive divergence, CD), NCE uses a proxy
criterion to avoid the need for evaluating an often intractable normalisation
constant.
Despite apparent conceptual differences, we show that two NCE criteria,
ranking NCE (RNCE) and conditional NCE (CNCE), can be viewed as ML estimation
methods. Specifically, RNCE is equivalent to ML estimation combined with
conditional importance sampling, and both RNCE and CNCE are special cases of
CD. These findings bridge the gap between the two method classes and allow us
to apply techniques from the ML-IS and CD literature to NCE, offering several
advantageous extensions.
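The ranking-NCE criterion at the centre of this result is easy to sketch: the observed data point is "classified" against K noise samples using the log importance ratios log f(x) - log q(x), which is exactly the conditional-importance-sampling view of ML estimation. The following is a minimal illustrative sketch, not the authors' code; the toy Gaussian energy model, the names `log_f`, `log_q` and `rnce_loss`, and the grid-search fit are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_f(x, theta):
    # Unnormalised model: log f_theta(x) = -(x - theta)^2 / 2;
    # the normalisation constant is deliberately omitted.
    return -0.5 * (x - theta) ** 2

def log_q(x, sigma=2.0):
    # Noise distribution q = N(0, sigma^2), with a known log-density.
    return -0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def rnce_loss(theta, data, noise):
    # Ranking NCE: for each observation, rank the data point against
    # its K noise samples via the log importance ratios log f - log q.
    xs = np.concatenate([data[:, None], noise], axis=1)   # shape (N, K+1)
    logits = log_f(xs, theta) - log_q(xs)
    # Negative log-probability that index 0 (the data point) is ranked first.
    return np.mean(np.logaddexp.reduce(logits, axis=1) - logits[:, 0])

data = rng.normal(1.5, 1.0, size=500)         # true mean is 1.5
noise = rng.normal(0.0, 2.0, size=(500, 10))  # K = 10 noise samples each
thetas = np.linspace(-1.0, 3.0, 41)
best = thetas[np.argmin([rnce_loss(t, data, noise) for t in thetas])]
```

On this toy problem the grid minimiser should land near the true mean of 1.5, illustrating that minimising the RNCE loss behaves like (importance-sampled) maximum likelihood.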
Related papers
- Score-Based Training for Energy-Based TTS Models [1.643629306994231]
Noise contrastive estimation (NCE) is a popular method for training energy-based models (EBMs) with intractable normalisation terms. This paper proposes a new criterion that learns scores more suitable for first-order schemes.
arXiv Detail & Related papers (2025-05-19T23:12:25Z)
- Uncertainty quantification for Markov chains with application to temporal difference learning [63.49764856675643]
We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains.
We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
arXiv Detail & Related papers (2025-02-19T15:33:55Z)
- Semiparametric inference for impulse response functions using double/debiased machine learning [49.1574468325115]
We introduce a machine learning estimator for the impulse response function (IRF) in settings where a time series of interest is subjected to multiple discrete treatments.
The proposed estimator can rely on fully nonparametric relations between treatment and outcome variables, opening up the possibility to use flexible machine learning approaches to estimate IRFs.
arXiv Detail & Related papers (2024-11-15T07:42:02Z)
- A Uniform Concentration Inequality for Kernel-Based Two-Sample Statistics [4.757470449749877]
We show that these metrics can be unified under a general framework of kernel-based two-sample statistics.
This paper establishes a novel uniform concentration inequality for the aforementioned kernel-based statistics.
As illustrative applications, we demonstrate how these bounds facilitate the derivation of error bounds for procedures such as distance covariance-based dimension reduction.
arXiv Detail & Related papers (2024-05-22T22:41:56Z)
- VISA: Variational Inference with Sequential Sample-Average Approximations [7.792333134503654]
We present variational inference with sequential sample-average approximations (VISA).
VISA extends importance-weighted forward-KL variational inference by employing a sequence of sample-average approximations.
We demonstrate that VISA can achieve comparable approximation accuracy to standard importance-weighted forward-KL variational inference.
arXiv Detail & Related papers (2024-03-14T14:20:22Z)
- Spectral Ranking Inferences based on General Multiway Comparisons [7.222667862159246]
We show that a two-step spectral method can achieve the same asymptotic efficiency as the Maximum Likelihood Estimator (MLE).
It is noteworthy that this is the first time effective two-sample rank testing methods have been proposed.
arXiv Detail & Related papers (2023-08-05T16:31:32Z)
- A New Central Limit Theorem for the Augmented IPW Estimator: Variance Inflation, Cross-Fit Covariance and Beyond [0.9172870611255595]
Augmented inverse probability weighting (AIPW) with cross-fitting is a popular choice in practice.
We study this cross-fit AIPW estimator under well-specified outcome regression and propensity score models in a high-dimensional regime.
Our work utilizes a novel interplay between three distinct tools--approximate message passing theory, the theory of deterministic equivalents, and the leave-one-out approach.
arXiv Detail & Related papers (2022-05-20T14:17:53Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Automatically Differentiable Random Coefficient Logistic Demand Estimation [0.0]
We show how the random coefficient logistic demand (BLP) model can be phrased as an automatically differentiable moment function.
This allows gradient-based frequentist and quasi-Bayesian estimation using the Continuously Updating Estimator (CUE).
Preliminary findings indicate that the CUE estimated using LTE and frequentist optimization has a lower bias but higher MAE compared to the traditional 2-Stage GMM (2S-GMM) approach.
arXiv Detail & Related papers (2021-06-08T18:50:11Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
- Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond [69.83813153444115]
We consider an efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference.
Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances.
We propose localized debiased machine learning (LDML), which avoids this burdensome step.
arXiv Detail & Related papers (2019-12-30T14:42:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.