The Optimal Noise in Noise-Contrastive Learning Is Not What You Think
- URL: http://arxiv.org/abs/2203.01110v1
- Date: Wed, 2 Mar 2022 13:59:20 GMT
- Title: The Optimal Noise in Noise-Contrastive Learning Is Not What You Think
- Authors: Omar Chehab, Alexandre Gramfort, Aapo Hyvarinen
- Abstract summary: We show that deviating from this assumption (that the optimal noise should match the data, in distribution and proportion) can actually lead to better statistical estimators.
In particular, the optimal noise distribution differs from the data's and can even belong to a different family.
- Score: 80.07065346699005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning a parametric model of a data distribution is a well-known
statistical problem that has seen renewed interest as it is brought to scale in
deep learning. Framing the problem as a self-supervised task, where data
samples are discriminated from noise samples, is at the core of
state-of-the-art methods, beginning with Noise-Contrastive Estimation (NCE).
Yet, such contrastive learning requires a good noise distribution, which is
hard to specify; domain-specific heuristics are therefore widely used. While a
comprehensive theory is missing, it is widely assumed that the optimal noise
should in practice be made equal to the data, both in distribution and
proportion. This setting underlies Generative Adversarial Networks (GANs) in
particular. Here, we empirically and theoretically challenge this assumption on
the optimal noise. We show that deviating from this assumption can actually
lead to better statistical estimators, in terms of asymptotic variance. In
particular, the optimal noise distribution is different from the data's and
even from a different family.
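As a concrete picture of the self-supervised framing described in the abstract, below is a minimal NumPy sketch of NCE (an illustration of the general method, not the paper's code): an unnormalised 1-D Gaussian model, whose log-normaliser c is learned as an extra parameter, is fitted by logistic discrimination of data samples against samples from a user-chosen Gaussian noise distribution q. The data distribution, the noise mean and scale, and the plain gradient-descent loop are all illustrative assumptions.

```python
# Minimal NCE sketch (illustration of the general method, not the paper's code).
# Model: an unnormalised 1-D Gaussian, log p(x) = -(x - mu)^2 / 2 + c, where the
# log-normaliser c is learned alongside mu by discriminating data from noise.
import numpy as np

rng = np.random.default_rng(0)
x_data = rng.normal(loc=2.0, scale=1.0, size=5_000)       # "data" samples (illustrative)
noise_mu, noise_sigma = 0.0, 3.0                           # our choice of noise q (a free design choice)
x_noise = rng.normal(noise_mu, noise_sigma, size=5_000)    # as many noise samples as data (proportion 1:1)


def log_q(x):
    # Log-density of the chosen Gaussian noise distribution q.
    return -0.5 * ((x - noise_mu) / noise_sigma) ** 2 - np.log(noise_sigma * np.sqrt(2 * np.pi))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


mu, c = 0.0, 0.0                                           # parameters: model mean and log-normaliser
lr = 0.05
for _ in range(2_000):
    # The NCE classifier's logit is log p_model(x) - log q(x); train it with logistic loss.
    g_data = (-0.5 * (x_data - mu) ** 2 + c) - log_q(x_data)
    g_noise = (-0.5 * (x_noise - mu) ** 2 + c) - log_q(x_noise)
    # Full-batch gradients of the logistic loss with respect to (mu, c).
    w_data, w_noise = 1.0 - sigmoid(g_data), sigmoid(g_noise)
    grad_mu = -np.mean(w_data * (x_data - mu)) + np.mean(w_noise * (x_noise - mu))
    grad_c = -np.mean(w_data) + np.mean(w_noise)
    mu -= lr * grad_mu
    c -= lr * grad_c

print(f"estimated mean {mu:.3f} (true 2.0), "
      f"log-normaliser {c:.3f} (true {-0.5 * np.log(2 * np.pi):.3f})")
```

The mean and scale of q above are exactly the kind of design choices the paper studies: under the usual NCE conditions they do not change what the estimator converges to, but they do change its asymptotic variance.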
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Robust Estimation of Causal Heteroscedastic Noise Models [7.568978862189266]
Student's $t$-distribution is known for its robustness in accounting for sampling variability with smaller sample sizes and extreme values without significantly altering the overall distribution shape.
Our empirical evaluations demonstrate that our estimators are more robust and achieve better overall performance across synthetic and real benchmarks.
arXiv Detail & Related papers (2023-12-15T02:26:35Z)
- Understanding Noise-Augmented Training for Randomized Smoothing [14.061680807550722]
Randomized smoothing is a technique for providing provable robustness guarantees against adversarial attacks.
We show that, without making stronger distributional assumptions, no benefit can be expected from predictors trained with noise-augmentation.
Our analysis has direct implications to the practical deployment of randomized smoothing.
arXiv Detail & Related papers (2023-05-08T14:46:34Z)
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation, which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- Pitfalls of Gaussians as a noise distribution in NCE [22.23473249312549]
Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions parameterized up to a constant of proportionality.
We show that the choice of $q$ can severely impact the computational and statistical efficiency of NCE.
arXiv Detail & Related papers (2022-10-01T04:42:56Z)
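To make the point of the entry above concrete, here is a toy numerical sketch (my own illustration under simple Gaussian assumptions, not the paper's analysis). With balanced classes, the optimal NCE classifier has posterior D(x) = p_d(x) / (p_d(x) + q(x)); when a Gaussian q is badly scaled, most noise samples fall far from the data, D saturates, and the logistic loss at the optimum collapses toward zero, i.e. the discrimination task becomes too easy to carry much statistical signal.

```python
# Toy illustration (my own, under simple Gaussian assumptions; not the paper's analysis).
# With balanced classes, the optimal NCE classifier is D(x) = p_d(x) / (p_d(x) + q(x)).
# A badly scaled Gaussian q makes D saturate, and the logistic loss at the optimum
# collapses toward zero: an "easy" discrimination task carries little signal.
import numpy as np

rng = np.random.default_rng(0)


def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


x_data = rng.normal(0.0, 1.0, size=200_000)                  # data: standard normal (illustrative)
for q_sigma in (1.0, 10.0):                                  # well-matched vs. badly scaled noise
    x_noise = rng.normal(0.0, q_sigma, size=200_000)
    d_data = gauss_pdf(x_data, 0, 1) / (gauss_pdf(x_data, 0, 1) + gauss_pdf(x_data, 0, q_sigma))
    d_noise = gauss_pdf(x_noise, 0, 1) / (gauss_pdf(x_noise, 0, 1) + gauss_pdf(x_noise, 0, q_sigma))
    loss = -np.mean(np.log(d_data)) - np.mean(np.log(1.0 - d_noise))
    print(f"q = N(0, {q_sigma}^2): optimal-classifier loss = {loss:.3f} "
          f"(log 4 = {np.log(4.0):.3f} when q equals the data distribution)")
```

With q equal to the data distribution, the optimal loss is log 4 per data-noise pair; the badly scaled q drives it far below that.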
- Identifying Hard Noise in Long-Tailed Sample Distribution [76.16113794808001]
We introduce Noisy Long-Tailed Classification (NLT).
Most de-noising methods fail to identify the hard noises.
We design an iterative noisy learning framework called Hard-to-Easy (H2E).
arXiv Detail & Related papers (2022-07-27T09:03:03Z)
- Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation [50.85788484752612]
Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models.
It has been empirically observed that the choice of the noise distribution is crucial for NCE's performance.
In this work, we formally pinpoint reasons for NCE's poor performance when an inappropriate noise distribution is used.
arXiv Detail & Related papers (2021-10-21T16:57:45Z)
- Adaptive Multi-View ICA: Estimation of noise levels for optimal inference [65.94843987207445]
Adaptive Multi-View ICA (AVICA) is a noisy ICA model where each view is a linear mixture of shared independent sources with additive noise on the sources.
On synthetic data, AVICA yields better source estimates than other group ICA methods thanks to its explicit MMSE estimator.
On real magnetoencephalography (MEG) data, we provide evidence that the decomposition is less sensitive to sampling noise and that the noise variance estimates are biologically plausible.
arXiv Detail & Related papers (2021-02-22T13:10:12Z)
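For a concrete picture of the model referred to in the entry above, here is a toy simulation of the generative process as described in that summary (shapes, source distribution, and noise scales are illustrative assumptions, not taken from the paper): each view observes a linear mixture of shared independent sources with view-specific additive noise applied to the sources.

```python
# Toy simulation of the generative model as described in the AVICA summary
# (shapes, source distribution, and noise scales are illustrative assumptions):
# each view v observes x_v = A_v (s + n_v), a linear mixture of shared independent
# sources s with view-specific additive noise n_v on the sources.
import numpy as np

rng = np.random.default_rng(0)
n_views, n_sources, n_samples = 3, 4, 1000

s = rng.laplace(size=(n_sources, n_samples))              # shared independent (non-Gaussian) sources
noise_std = rng.uniform(0.1, 0.5, size=n_views)           # per-view noise level (the quantity AVICA estimates)
views = []
for v in range(n_views):
    A_v = rng.normal(size=(n_sources, n_sources))         # view-specific mixing matrix
    n_v = noise_std[v] * rng.normal(size=(n_sources, n_samples))
    views.append(A_v @ (s + n_v))                         # observed data for view v

print([x.shape for x in views])                           # three views of shape (4, 1000)
```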