Pitfalls of Gaussians as a noise distribution in NCE
- URL: http://arxiv.org/abs/2210.00189v1
- Date: Sat, 1 Oct 2022 04:42:56 GMT
- Title: Pitfalls of Gaussians as a noise distribution in NCE
- Authors: Holden Lee, Chirag Pabbaraju, Anish Sevekari, Andrej Risteski
- Abstract summary: Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions parameterized up to a constant of proportionality.
We show that the choice of $q$ can severely impact the computational and statistical efficiency of NCE.
- Score: 22.23473249312549
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Noise Contrastive Estimation (NCE) is a popular approach for learning
probability density functions parameterized up to a constant of
proportionality. The main idea is to design a classification problem for
distinguishing training data from samples from an easy-to-sample noise
distribution $q$, in a manner that avoids having to calculate a partition
function. It is well-known that the choice of $q$ can severely impact the
computational and statistical efficiency of NCE. In practice, a common choice
for $q$ is a Gaussian which matches the mean and covariance of the data.
In this paper, we show that such a choice can result in an exponentially bad
(in the ambient dimension) conditioning of the Hessian of the loss, even for
very simple data distributions. As a consequence, both the statistical and
algorithmic complexity for such a choice of $q$ will be problematic in
practice, suggesting that more complex noise distributions are essential to the
success of NCE.
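To make the setup concrete, the following is a minimal sketch (not code from the paper) of the NCE classification objective with the commonly used noise distribution $q$: a Gaussian fit to the data's mean and covariance. The unnormalized model `log_p_tilde` and its parameters `theta` are hypothetical placeholders; `theta` is assumed to contain a learned log-partition constant, which is what lets NCE avoid computing the normalizer.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_noise_from_data(X):
    """The common choice of q critiqued in the paper: a Gaussian matching
    the empirical mean and covariance of the data X (shape n x d)."""
    return multivariate_normal(mean=X.mean(axis=0), cov=np.cov(X, rowvar=False))

def nce_loss(log_p_tilde, theta, X, q, rng=None):
    """NCE logistic loss with one noise sample per data point.

    log_p_tilde(x, theta) is a hypothetical unnormalized log-density; since
    theta includes a learned log-partition parameter, no normalizing
    constant is ever computed."""
    Y = q.rvs(size=len(X), random_state=rng)      # noise samples from q
    s_data = log_p_tilde(X, theta) - q.logpdf(X)  # model-vs-noise log-odds
    s_noise = log_p_tilde(Y, theta) - q.logpdf(Y)
    # -E_data[log sigma(s)] - E_noise[log(1 - sigma(s))], computed stably
    return np.mean(np.logaddexp(0.0, -s_data)) + np.mean(np.logaddexp(0.0, s_noise))
```

In this notation, the paper's result says that for the Gaussian `q` above, the Hessian of this loss in `theta` can become exponentially ill-conditioned in the dimension even for simple data distributions, which hurts both gradient-based optimization and sample efficiency.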
Related papers
- Some Constructions of Private, Efficient, and Optimal $K$-Norm and Elliptic Gaussian Noise [54.34628844260993]
Differentially private computation often begins with a bound on some $d$-dimensional statistic's sensitivity.
For pure differential privacy, the $K$-norm mechanism can improve on this approach using a norm tailored to the statistic's sensitivity space.
This paper solves both problems for the simple statistics of sum, count, and vote.
arXiv Detail & Related papers (2023-09-27T17:09:36Z)
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
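For contrast with the NCE loss above, here is a hedged sketch of what the negative log-likelihood of an unnormalized model looks like, and why it is compositional: the log-partition term is a log of an expectation. The importance-sampling estimator below is a generic, high-variance stand-in, not the paper's algorithm; all names are placeholders.

```python
import numpy as np

def unnormalized_nll(log_p_tilde, theta, X, q, n_samples, rng=None):
    """NLL of an unnormalized model: -E_data[log p_tilde] + log Z(theta).

    log Z(theta) = log E_q[p_tilde(y) / q(y)] is a log of an expectation,
    which is the compositional structure; here it is estimated by plain
    importance sampling against a tractable reference distribution q."""
    Y = q.rvs(size=n_samples, random_state=rng)
    log_w = log_p_tilde(Y, theta) - q.logpdf(Y)   # log importance weights
    log_Z_hat = np.logaddexp.reduce(log_w) - np.log(n_samples)  # log-mean-exp
    return -np.mean(log_p_tilde(X, theta)) + log_Z_hat
```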
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation, which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- Subspace Recovery from Heterogeneous Data with Non-isotropic Noise [43.44371292901258]
We study a basic formulation of this problem: principal component analysis (PCA).
Our goal is to recover the linear subspace shared by the means $\mu_i$ using the data points from all users.
We design an efficiently-computable estimator under non-spherical and user-dependent noise.
arXiv Detail & Related papers (2022-10-24T18:00:08Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the optimal noise distribution should equal the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's, and even comes from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation [50.85788484752612]
Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models.
It has been empirically observed that the choice of the noise distribution is crucial for NCE's performance.
In this work, we formally pinpoint reasons for NCE's poor performance when an inappropriate noise distribution is used.
arXiv Detail & Related papers (2021-10-21T16:57:45Z)
- Denoising Score Matching with Random Fourier Features [11.60130641443281]
We derive an analytical expression for the Denoising Score Matching loss using the Kernel Exponential Family as the model distribution.
The obtained expression explicitly depends on the noise variance, so the validation loss can be straightforwardly used to tune the noise level.
arXiv Detail & Related papers (2021-01-13T18:02:39Z)
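As a rough illustration only (the paper derives a closed-form expression for kernel exponential family models, which this generic sketch does not reproduce), denoising score matching regresses a model score `score_fn` (a hypothetical placeholder) onto the score of a Gaussian corruption kernel. Because the loss depends explicitly on `sigma`, one can sweep `sigma` and keep the value that minimizes a held-out validation loss, as the summary above suggests.

```python
import numpy as np

def dsm_loss(score_fn, X, sigma, rng):
    """Generic denoising score matching loss with Gaussian corruption.

    The regression target is the score of the corruption kernel:
    grad log N(x_noisy | x, sigma^2 I) = (x - x_noisy) / sigma**2."""
    eps = rng.standard_normal(X.shape)         # rng: np.random.Generator
    X_noisy = X + sigma * eps
    target = (X - X_noisy) / sigma**2          # equals -eps / sigma
    diff = score_fn(X_noisy) - target
    return 0.5 * np.mean(np.sum(diff**2, axis=1))
```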
- Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)
- Multi-class Gaussian Process Classification with Noisy Inputs [2.362412515574206]
In some situations, the amount of noise can be known beforehand.
We have evaluated the proposed methods by carrying out several experiments, involving synthetic and real data.
arXiv Detail & Related papers (2020-01-28T18:55:13Z)