Heavy-tailed denoising score matching
- URL: http://arxiv.org/abs/2112.09788v1
- Date: Fri, 17 Dec 2021 22:04:55 GMT
- Title: Heavy-tailed denoising score matching
- Authors: Jacob Deasy, Nikola Simidjievski, Pietro Liò
- Abstract summary: We develop an iterative noise scaling algorithm to consistently initialise the multiple levels of noise in Langevin dynamics.
On the practical side, our use of heavy-tailed DSM leads to improved score estimation, controllable sampling convergence, and more balanced unconditional generative performance for imbalanced datasets.
- Score: 5.371337604556311
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Score-based model research in the last few years has produced
state-of-the-art generative models by employing Gaussian denoising score
matching (DSM). However, the Gaussian noise assumption has several
high-dimensional limitations, motivating a more concrete route toward even
higher-dimensional PDF estimation in the future. We outline this limitation
before extending the theory to a broader family of noising distributions --
namely, the generalised normal distribution. To theoretically ground this, we
relax a key assumption in (denoising) score matching theory, demonstrating
that distributions which are differentiable almost everywhere permit the same
objective simplification as Gaussians. For noise vector length distributions,
we demonstrate favourable concentration of measure in the high-dimensional
spaces prevalent in deep learning. In the process, we uncover a skewed noise
vector length distribution and develop an iterative noise scaling algorithm
to consistently initialise the multiple levels of noise in annealed Langevin
dynamics. On the practical side, our use of heavy-tailed DSM leads to
improved score estimation, controllable sampling convergence, and more
balanced unconditional generative performance for imbalanced datasets.
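To make the noising kernel concrete, here is a minimal sketch (not the authors' code) of denoising score matching with generalised-normal noise: a kernel q(x_tilde | x) proportional to exp(-(|x_tilde - x| / alpha)^beta) recovers the Gaussian case at beta = 2 and gives heavier tails for beta < 2, and its score is defined almost everywhere, which is all the relaxed assumption above requires. The function names, toy data, and the alpha/beta values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gennorm  # generalised normal (exponential power) distribution

def perturb(x, alpha, beta, rng):
    """Corrupt clean data x with generalised-normal noise of scale alpha and shape beta."""
    noise = gennorm.rvs(beta, scale=alpha, size=x.shape, random_state=rng)
    return x + noise

def kernel_score(x_tilde, x, alpha, beta, eps=1e-12):
    """Score of the noising kernel, grad_{x_tilde} log q(x_tilde | x).

    For q proportional to exp(-(|x_tilde - x| / alpha)**beta) this is
    -beta * sign(d) * |d|**(beta - 1) / alpha**beta with d = x_tilde - x,
    which is defined almost everywhere (only d = 0 is problematic for beta <= 1).
    """
    d = x_tilde - x
    return -beta * np.sign(d) * (np.abs(d) + eps) ** (beta - 1.0) / alpha ** beta

def dsm_loss(score_fn, x, alpha, beta, rng):
    """Denoising score matching: regress the model score onto the kernel score."""
    x_tilde = perturb(x, alpha, beta, rng)
    target = kernel_score(x_tilde, x, alpha, beta)
    return np.mean((score_fn(x_tilde) - target) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal((128, 32))                 # toy "data" batch
print(dsm_loss(lambda z: -z, x, alpha=0.5, beta=1.5, rng=rng))
```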
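The concentration-of-measure claim about noise vector lengths is easy to probe numerically. The quick check below (illustrative only, not from the paper) draws generalised-normal noise vectors of increasing dimension and reports how tightly ||z|| concentrates around its mean; the dimensions and shape values are arbitrary.

```python
import numpy as np
from scipy.stats import gennorm

def norm_stats(d, beta, alpha=1.0, n=500, seed=0):
    """Mean and standard deviation of ||z|| over n generalised-normal noise vectors in R^d."""
    z = gennorm.rvs(beta, scale=alpha, size=(n, d),
                    random_state=np.random.default_rng(seed))
    r = np.linalg.norm(z, axis=1)
    return r.mean(), r.std()

for beta in (1.0, 1.5, 2.0):            # beta = 2 is the Gaussian baseline
    for d in (10, 1_000, 10_000):
        m, s = norm_stats(d, beta)
        print(f"beta={beta:.1f}  d={d:>6}  mean ||z|| = {m:8.2f}  std/mean = {s / m:.4f}")
```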
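For context on where the multiple noise levels enter, the following is a hedged sketch of standard annealed Langevin dynamics over a geometric ladder of noise scales; the paper's iterative noise scaling algorithm concerns choosing these levels consistently and is not reproduced here. The schedule, step size, and toy score function are placeholders.

```python
import numpy as np

def annealed_langevin(score_fn, x0, sigmas, step=2e-5, n_steps=100, seed=0):
    """Run Langevin dynamics at each noise level in turn, largest sigma first."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for sigma in sigmas:
        eps = step * (sigma / sigmas[-1]) ** 2   # per-level step size
        for _ in range(n_steps):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * eps * score_fn(x, sigma) + np.sqrt(eps) * z
    return x

# Geometric schedule from sigma_max down to sigma_min (values are placeholders).
sigmas = np.geomspace(1.0, 0.01, num=10)
x0 = np.random.default_rng(1).uniform(-1, 1, size=(16, 32))
samples = annealed_langevin(lambda x, s: -x / s ** 2, x0, sigmas)  # toy Gaussian score
print(samples.mean(), samples.std())
```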
Related papers
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling [8.271859911016719]
We develop tools for robust inference under high-dimensional noise.
We show that our approach is robust to variability in technical noise levels across cell types.
arXiv Detail & Related papers (2022-09-16T15:39:11Z)
- Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the assumption that the noise distribution should match the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
- Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z)
- Deep Speaker Vector Normalization with Maximum Gaussianality Training [13.310988353839237]
A key problem with deep speaker embedding is that the resulting deep speaker vectors tend to be irregularly distributed.
In previous research, we proposed a deep normalization approach based on a new discriminative normalization flow (DNF) model.
Despite this remarkable success, we empirically found that the latent codes produced by the DNF model are generally neither homogeneous nor Gaussian.
We propose a new Maximum Gaussianality (MG) training approach that directly maximizes the Gaussianality of the latent codes.
arXiv Detail & Related papers (2020-10-30T09:42:06Z)
- Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in gradient descent provides a crucial implicit regularization effect for training over-parameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
arXiv Detail & Related papers (2020-06-15T18:31:02Z)
- Generative Modeling with Denoising Auto-Encoders and Langevin Sampling [88.83704353627554]
We show that both DAE and DSM provide estimates of the score of the smoothed population density.
We then apply our results to the homotopy method of arXiv:1907.05600 and provide theoretical justification for its empirical success.
arXiv Detail & Related papers (2020-01-31T23:50:03Z)