Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling
- URL: http://arxiv.org/abs/2209.08004v2
- Date: Tue, 11 Jul 2023 02:46:14 GMT
- Title: Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling
- Authors: Boris Landa and Xiuyuan Cheng
- Abstract summary: We develop tools for robust inference under high-dimensional noise.
We show that our approach is robust to variability in technical noise levels across cell types.
- Score: 8.271859911016719
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Gaussian kernel and its traditional normalizations (e.g., row-stochastic)
are popular approaches for assessing similarities between data points. Yet,
they can be inaccurate under high-dimensional noise, especially if the noise
magnitude varies considerably across the data, e.g., under heteroskedasticity
or outliers. In this work, we investigate a more robust alternative -- the
doubly stochastic normalization of the Gaussian kernel. We consider a setting
where points are sampled from an unknown density on a low-dimensional manifold
embedded in high-dimensional space and corrupted by possibly strong,
non-identically distributed, sub-Gaussian noise. We establish that the doubly
stochastic affinity matrix and its scaling factors concentrate around certain
population forms, and provide corresponding finite-sample probabilistic error
bounds. We then utilize these results to develop several tools for robust
inference under general high-dimensional noise. First, we derive a robust
density estimator that reliably infers the underlying sampling density and can
substantially outperform the standard kernel density estimator under
heteroskedasticity and outliers. Second, we obtain estimators for the pointwise
noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean
distances between clean data points. Lastly, we derive robust graph Laplacian
normalizations that accurately approximate various manifold Laplacians,
including the Laplace-Beltrami operator, improving over traditional
normalizations in noisy settings. We exemplify our results in simulations and
on real single-cell RNA-sequencing data. For the latter, we show that in
contrast to traditional methods, our approach is robust to variability in
technical noise levels across cell types.
Related papers
- A quasi-Bayesian sequential approach to deconvolution density estimation [7.10052009802944]
Density deconvolution addresses the estimation of the unknown density function $f$ of a random signal from data.
We consider the problem of density deconvolution in a streaming or online setting where noisy data arrive progressively.
By relying on a quasi-Bayesian sequential approach, we obtain estimates of $f$ that are easy to evaluate.
arXiv Detail & Related papers (2024-08-26T16:40:04Z)
- A Bayesian Approach Toward Robust Multidimensional Ellipsoid-Specific Fitting [0.0]
This work presents a novel and effective method for fitting multidimensional ellipsoids to scattered data contaminated by noise and outliers.
We incorporate a uniform prior distribution to constrain the search for primitive parameters within an ellipsoidal domain.
We apply it to a wide range of practical applications such as microscopy cell counting, 3D reconstruction, geometric shape approximation, and magnetometer calibration tasks.
arXiv Detail & Related papers (2024-07-27T14:31:51Z)
- Implicit Manifold Gaussian Process Regression [49.0787777751317]
Gaussian process regression is widely used to provide well-calibrated uncertainty estimates.
However, it struggles with high-dimensional data, even though such data often lie on an implicit low-dimensional manifold.
In this paper we propose a technique capable of inferring implicit structure directly from data (labeled and unlabeled) in a fully differentiable way.
arXiv Detail & Related papers (2023-10-30T09:52:48Z)
- Sobolev Space Regularised Pre Density Models [51.558848491038916]
We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density.
This method is statistically consistent and makes the model's inductive bias clear and interpretable.
arXiv Detail & Related papers (2023-07-25T18:47:53Z)
- Anomaly Detection with Variance Stabilized Density Estimation [49.46356430493534]
We present a variance-stabilized density estimation framework that maximizes the likelihood of the observed samples.
To obtain a reliable anomaly detector, we introduce a spectral ensemble of autoregressive models for learning the variance-stabilized distribution.
We have conducted an extensive benchmark with 52 datasets, demonstrating that our method leads to state-of-the-art results.
arXiv Detail & Related papers (2023-06-01T11:52:58Z)
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation, which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the noise distribution should equal the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Heavy-tailed denoising score matching [5.371337604556311]
We develop an iterative noise scaling algorithm to consistently initialise the multiple levels of noise in Langevin dynamics.
On the practical side, our use of heavy-tailed DSM leads to improved score estimation, controllable sampling convergence, and more balanced unconditional generative performance for imbalanced datasets.
arXiv Detail & Related papers (2021-12-17T22:04:55Z)
- Consistency Regularization for Certified Robustness of Smoothed Classifiers [89.72878906950208]
The recent technique of randomized smoothing has shown that the worst-case $\ell_2$-robustness can be transformed into the average-case robustness.
We found that the trade-off between accuracy and certified robustness of smoothed classifiers can be greatly controlled by simply regularizing the prediction consistency over noise.
arXiv Detail & Related papers (2020-06-07T06:57:43Z)
- Doubly-Stochastic Normalization of the Gaussian Kernel is Robust to Heteroskedastic Noise [3.5429774642987915]
We show that the doubly-stochastic normalization of the Gaussian kernel with zero main diagonal is robust to heteroskedastic noise.
We provide examples of simulated and experimental single-cell RNA sequence data with intrinsic heteroskedasticity.
arXiv Detail & Related papers (2020-05-31T01:31:10Z)
- Nearest Neighbor Dirichlet Mixtures [3.3194866396158]
We propose a class of nearest neighbor-Dirichlet mixtures to maintain most of the strengths of Bayesian approaches without the computational disadvantages.
A simple and embarrassingly parallel Monte Carlo algorithm is proposed to sample from the resulting pseudo-posterior for the unknown density.
arXiv Detail & Related papers (2020-03-17T21:39:11Z)