Wasserstein Projection Pursuit of Non-Gaussian Signals
- URL: http://arxiv.org/abs/2302.12693v1
- Date: Fri, 24 Feb 2023 15:36:51 GMT
- Title: Wasserstein Projection Pursuit of Non-Gaussian Signals
- Authors: Satyaki Mukherjee, Soumendu Sundar Mukherjee, Debarghya Ghoshdastidar
- Abstract summary: We consider the problem of locating, in a high-dimensional data cloud, a $k$-dimensional non-Gaussian subspace of interesting features.
Under a generative model, we prove rigorous statistical guarantees on the accuracy of approximating this unknown subspace.
Our results operate in the regime where the data dimensionality is comparable to the sample size.
- Score: 8.789656856095947
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We consider the general dimensionality reduction problem of locating, in a
high-dimensional data cloud, a $k$-dimensional non-Gaussian subspace of
interesting features. We use a projection pursuit approach -- we search for
mutually orthogonal unit directions which maximise the 2-Wasserstein distance
of the empirical distribution of data-projections along these directions from a
standard Gaussian. Under a generative model, where there is an underlying
(unknown) low-dimensional non-Gaussian subspace, we prove rigorous statistical
guarantees on the accuracy of approximating this unknown subspace by the
directions found by our projection pursuit approach. Our results operate in the
regime where the data dimensionality is comparable to the sample size, and thus
supplement the recent literature on the non-feasibility of locating interesting
directions via projection pursuit in the complementary regime where the data
dimensionality is much larger than the sample size.
Related papers
- Relative Wasserstein Angle and the Problem of the $W_2$-Nearest Gaussian Distribution [4.042425236692822]
We study the problem of quantifying how far an empirical distribution deviates from Gaussianity under the framework of optimal transport. By exploiting the cone geometry of the relative translation-invariant quadratic Wasserstein space, we introduce two novel geometric quantities. We prove that the filling cone generated by any two rays in this space is flat, ensuring angles, projections, and inner products are rigorously well-defined.
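A concrete one-dimensional instance of the "nearest Gaussian" question: under the quantile representation of $W_2$, the best-fitting $N(m, s)$ for an empirical sample has a closed form. A small sketch in NumPy/SciPy — the 1-D reduction is illustrative only, not the paper's multivariate cone construction:

```python
import numpy as np
from scipy.stats import norm

def w2_nearest_gaussian_1d(x):
    # Minimize \int_0^1 (F^{-1}(u) - m - s*Phi^{-1}(u))^2 du over (m, s):
    # m is the sample mean; s is the L2 inner product of the empirical
    # quantile function with Phi^{-1} (normalized, since the integral of
    # Phi^{-1}(u)^2 over [0, 1] equals 1).
    n = len(x)
    q = norm.ppf((np.arange(1, n + 1) - 0.5) / n)
    m = x.mean()
    s = np.mean(np.sort(x) * q) / np.mean(q ** 2)
    return m, s

rng = np.random.default_rng(0)
x = 2.0 + 3.0 * rng.standard_normal(5000)      # data that is itself Gaussian
m, s = w2_nearest_gaussian_1d(x)
print(m, s)                                    # close to (2, 3) up to sampling error
```

Between two 1-D Gaussians the distance itself is explicit, $W_2 = \sqrt{(m_1 - m_2)^2 + (s_1 - s_2)^2}$, which makes such fits easy to compare.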
arXiv Detail & Related papers (2026-01-29T22:03:10Z) - VAE with Hyperspherical Coordinates: Improving Anomaly Detection from Hypervolume-Compressed Latent Space [56.362776482614976]
Variational autoencoders (VAEs) encode data into lower-dimensional latent vectors before decoding those vectors back to data. We propose to formulate the latent variables of a VAE using hyperspherical coordinates, which allows compressing the latent vectors towards a given direction on the hypersphere. We show that this improves both the fully unsupervised and OOD anomaly detection ability of the VAE, achieving the best performance on the datasets we considered.
arXiv Detail & Related papers (2026-01-25T03:10:24Z) - Generative Learning of Densities on Manifolds [3.081704060720176]
A generative modeling framework is proposed that combines diffusion models and manifold learning.
The approach utilizes Diffusion Maps to uncover possible low-dimensional underlying (latent) spaces in the high-dimensional data (ambient) space.
arXiv Detail & Related papers (2025-03-05T23:29:06Z) - Expected Information Gain Estimation via Density Approximations: Sample Allocation and Dimension Reduction [0.40964539027092906]
We formulate flexible transport-based schemes for EIG estimation in general nonlinear/non-Gaussian settings.
We show that with this optimal sample allocation, the MSE of the resulting EIG estimator converges more quickly than that of a standard nested Monte Carlo scheme.
We then address the estimation of EIG in high dimensions, by deriving gradient-based upper bounds on the mutual information lost by projecting the parameters and/or observations to lower-dimensional subspaces.
arXiv Detail & Related papers (2024-11-13T07:22:50Z) - On Probabilistic Pullback Metrics on Latent Hyperbolic Manifolds [5.724027955589408]
This paper focuses on the hyperbolic manifold, a particularly suitable choice for modeling hierarchical relationships.
We propose augmenting the hyperbolic metric with a pullback metric to account for distortions introduced by the GPLVM's nonlinear mapping.
Through various experiments, we demonstrate that geodesics on the pullback metric not only respect the geometry of the hyperbolic latent space but also align with the underlying data distribution.
arXiv Detail & Related papers (2024-10-28T09:13:00Z) - Uncertainty Visualization via Low-Dimensional Posterior Projections [23.371244861123827]
We introduce a new approach for estimating and visualizing posteriors by employing energy-based models (EBMs) over low-dimensional subspaces.
We demonstrate the effectiveness of our method across a diverse range of datasets and image restoration problems.
arXiv Detail & Related papers (2023-12-12T23:51:07Z) - Implicit Manifold Gaussian Process Regression [49.0787777751317]
Gaussian process regression is widely used to provide well-calibrated uncertainty estimates.
It struggles with high-dimensional data because of the implicit low-dimensional manifold upon which the data actually lies.
In this paper we propose a technique capable of inferring implicit structure directly from data (labeled and unlabeled) in a fully differentiable way.
arXiv Detail & Related papers (2023-10-30T09:52:48Z) - VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z) - Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization [20.43471678277403]
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with Laplacian-composited objective.
arXiv Detail & Related papers (2022-07-25T14:10:24Z) - Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutive pressure acts on a low-dimensional manifold despite the high-dimensionality of sequences' space.
arXiv Detail & Related papers (2022-07-20T06:38:36Z) - Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections [19.987683989865708]
The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications.
We propose a new perspective to approximate SW by making use of the concentration of measure phenomenon.
Our method does not require sampling a number of random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation.
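For context, the usual Monte Carlo estimator that such concentration-based approximations aim to bypass can be sketched as follows — a NumPy illustration assuming equal sample sizes, so that each 1-D squared $W_2$ reduces to mean squared differences of sorted projections:

```python
import numpy as np

def sliced_wasserstein_mc(X, Y, n_projections=200, seed=0):
    # Standard Monte Carlo estimate of the squared sliced-Wasserstein-2
    # distance: average the 1-D squared W2 of the two samples' projections
    # over random unit directions. Requires X and Y to have equal sizes.
    rng = np.random.default_rng(seed)
    thetas = rng.standard_normal((n_projections, X.shape[1]))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    total = 0.0
    for theta in thetas:
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)       # 1-D W2^2 via sorted samples
    return total / n_projections

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5))
Y = rng.standard_normal((1000, 5)) + 2.0       # shift by 2 in every coordinate
print(sliced_wasserstein_mc(X, Y))             # about |shift|^2 / d = 4 in expectation
```

The Monte Carlo error of this baseline decays only as the inverse square root of the number of projections, which is the cost that a concentration-of-measure approximation avoids.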
arXiv Detail & Related papers (2021-06-29T13:56:19Z) - Instance-Optimal Compressed Sensing via Posterior Sampling [101.43899352984774]
We show, for Gaussian measurements and any prior distribution on the signal, that the posterior sampling estimator achieves near-optimal recovery guarantees.
We implement the posterior sampling estimator for deep generative priors using Langevin dynamics, and empirically find that it produces accurate estimates with more diversity than MAP.
arXiv Detail & Related papers (2021-06-21T22:51:56Z) - Augmented Sliced Wasserstein Distances [55.028065567756066]
We propose a new family of distance metrics, called augmented sliced Wasserstein distances (ASWDs).
ASWDs are constructed by first mapping samples to higher-dimensional hypersurfaces parameterized by neural networks.
Numerical results demonstrate that the ASWD significantly outperforms other Wasserstein variants for both synthetic and real-world problems.
arXiv Detail & Related papers (2020-06-15T23:00:08Z) - Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.