Optimizing the Noise in Self-Supervised Learning: from Importance
Sampling to Noise-Contrastive Estimation
- URL: http://arxiv.org/abs/2301.09696v1
- Date: Mon, 23 Jan 2023 19:57:58 GMT
- Title: Optimizing the Noise in Self-Supervised Learning: from Importance
Sampling to Noise-Contrastive Estimation
- Authors: Omar Chehab and Alexandre Gramfort and Aapo Hyvarinen
- Abstract summary: It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
- Score: 80.07065346699005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning is an increasingly popular approach to unsupervised
learning, achieving state-of-the-art results. A prevalent approach consists in
contrasting data points and noise points within a classification task: this
requires a good noise distribution which is notoriously hard to specify. While
a comprehensive theory is missing, it is widely assumed that the optimal noise
distribution should in practice be made equal to the data distribution, as in
Generative Adversarial Networks (GANs). We here empirically and theoretically
challenge this assumption. We turn to Noise-Contrastive Estimation (NCE) which
grounds this self-supervised task as an estimation problem of an energy-based
model of the data. This ties the optimality of the noise distribution to the
sample efficiency of the estimator, which is rigorously defined as its
asymptotic variance, or mean-squared error. In the special case where only the
normalization constant is unknown, we show that NCE recovers a family of
Importance Sampling estimators for which the optimal noise is indeed equal to
the data distribution. However, in the general case where the energy is also
unknown, we prove that the optimal noise density is the data density multiplied
by a correction term based on the Fisher score. In particular, the optimal
noise distribution is different from the data distribution, and is even from a
different family. Nevertheless, we soberly conclude that the optimal noise may
be hard to sample from, and the gain in efficiency can be modest compared to
choosing the noise distribution equal to the data's.
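To make the abstract's two regimes concrete, here is a minimal NumPy sketch. It is illustrative only, not code from the paper: the names (unnormalized_data_density, is_estimate_Z, nce_loss) and the 1-D Gaussian data model are assumptions made for the example. Part one estimates the normalization constant Z by importance sampling, the special case where only Z is unknown; part two evaluates the standard NCE logistic loss, whose classifier posterior is $P(\mathrm{data} \mid x) = \phi_\theta(x) / (\phi_\theta(x) + \nu q(x))$ with $\nu$ the noise-to-data sample ratio.

import numpy as np

rng = np.random.default_rng(0)

def unnormalized_data_density(x, mu=0.0, sigma=1.0):
    # phi(x) = exp(-E(x)): a Gaussian with its normalizer dropped
    # (true Z = sigma * sqrt(2*pi)).
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Special case (only Z unknown): importance sampling with noise q,
# Z_hat = (1/n) * sum_i phi(x_i) / q(x_i), where x_i ~ q.
def is_estimate_Z(noise_mu, noise_sigma, n=10_000):
    x = rng.normal(noise_mu, noise_sigma, size=n)
    return np.mean(unnormalized_data_density(x) / gaussian_pdf(x, noise_mu, noise_sigma))

print(is_estimate_Z(0.0, 1.0))  # q = data distribution: constant weights, zero variance
print(is_estimate_Z(0.0, 3.0))  # mismatched q: unbiased but noisy

# General case (energy also unknown): NCE logistic loss
# -E_data[log sigmoid(h)] - nu * E_noise[log sigmoid(-h)],
# with h(x) = log phi_theta(x) - log(nu * q(x)).
def nce_loss(log_phi_theta, log_q, x_data, x_noise):
    nu = len(x_noise) / len(x_data)
    h_data = log_phi_theta(x_data) - np.log(nu) - log_q(x_data)
    h_noise = log_phi_theta(x_noise) - np.log(nu) - log_q(x_noise)
    return (np.mean(np.logaddexp(0.0, -h_data))          # -log sigmoid(h) on data
            + nu * np.mean(np.logaddexp(0.0, h_noise)))  # -log sigmoid(-h) on noise

# Tiny usage: data ~ N(0,1), noise ~ N(0,2), model energy E(x) = x^2 / 2.
x_data = rng.normal(0.0, 1.0, size=1000)
x_noise = rng.normal(0.0, 2.0, size=2000)
log_phi = lambda x: -0.5 * x ** 2
log_q = lambda x: np.log(gaussian_pdf(x, 0.0, 2.0))
print(nce_loss(log_phi, log_q, x_data, x_noise))

The first print returns exactly $\sqrt{2\pi} \approx 2.5066$ on every run, while the second fluctuates around it: with only Z unknown, the data distribution is the zero-variance, hence optimal, noise. The paper's point is that this optimality breaks in the general NCE case, where the energy is estimated as well.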
Related papers
- Robust Estimation of Causal Heteroscedastic Noise Models [7.568978862189266]
Student's $t$-distribution is known for its robustness in accounting for sampling variability with smaller sample sizes and extreme values without significantly altering the overall distribution shape.
Our empirical evaluations demonstrate that our estimators are more robust and achieve better overall performance across synthetic and real benchmarks.
arXiv Detail & Related papers (2023-12-15T02:26:35Z)
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
- Pitfalls of Gaussians as a noise distribution in NCE [22.23473249312549]
Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions parameterized up to a constant of proportionality.
We show that the choice of the noise distribution $q$ can severely impact the computational and statistical efficiency of NCE.
arXiv Detail & Related papers (2022-10-01T04:42:56Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from this assumption can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even belongs to a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation [50.85788484752612]
Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models.
It has been empirically observed that the choice of the noise distribution is crucial for NCE's performance.
In this work, we formally pinpoint reasons for NCE's poor performance when an inappropriate noise distribution is used.
arXiv Detail & Related papers (2021-10-21T16:57:45Z)
- PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Driven Adaptive Prior [103.00403682863427]
We propose PriorGrad to improve the efficiency of conditional diffusion models.
We show that PriorGrad achieves faster convergence, leading to data and parameter efficiency and improved quality.
arXiv Detail & Related papers (2021-06-11T14:04:03Z)
- Adaptive Multi-View ICA: Estimation of noise levels for optimal inference [65.94843987207445]
Adaptive Multi-View ICA (AVICA) is a noisy ICA model where each view is a linear mixture of shared independent sources with additive noise on the sources.
On synthetic data, AVICA yields better source estimates than other group ICA methods thanks to its explicit MMSE estimator.
On real magnetoencephalography (MEG) data, we provide evidence that the decomposition is less sensitive to sampling noise and that the noise variance estimates are biologically plausible.
arXiv Detail & Related papers (2021-02-22T13:10:12Z)
- Denoising Score Matching with Random Fourier Features [11.60130641443281]
We derive an analytical expression for Denoising Score Matching using the Kernel Exponential Family as the model distribution.
The obtained expression explicitly depends on the noise variance, so the validation loss can be straightforwardly used to tune the noise level.
arXiv Detail & Related papers (2021-01-13T18:02:39Z)