Improving Metric Dimensionality Reduction with Distributed Topology
- URL: http://arxiv.org/abs/2106.07613v1
- Date: Mon, 14 Jun 2021 17:19:44 GMT
- Title: Improving Metric Dimensionality Reduction with Distributed Topology
- Authors: Alexander Wagner, Elchanan Solomon, Paul Bendich
- Abstract summary: DIPOLE is a dimensionality-reduction post-processing step that corrects an initial embedding by minimizing a loss functional with both a local, metric term and a global, topological term.
We observe that DIPOLE outperforms popular methods like UMAP, t-SNE, and Isomap on a number of popular datasets.
- Score: 68.8204255655161
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose a novel approach to dimensionality reduction combining techniques
of metric geometry and distributed persistent homology, in the form of a
gradient-descent based method called DIPOLE. DIPOLE is a
dimensionality-reduction post-processing step that corrects an initial
embedding by minimizing a loss functional with both a local, metric term and a
global, topological term. By fixing an initial embedding method (we use
Isomap), DIPOLE can also be viewed as a full dimensionality-reduction pipeline.
This framework is based on the strong theoretical and computational properties
of distributed persistent homology and comes with the guarantee of almost sure
convergence. We observe that DIPOLE outperforms popular methods like UMAP,
t-SNE, and Isomap on a number of popular datasets, both visually and in terms
of precise quantitative metrics.
Related papers
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the proximal point method (SPPM)
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - A Heat Diffusion Perspective on Geodesic Preserving Dimensionality
Reduction [66.21060114843202]
We propose a more general heat kernel based manifold embedding method that we call heat geodesic embeddings.
Results show that our method outperforms existing state of the art in preserving ground truth manifold distances.
We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure.
arXiv Detail & Related papers (2023-05-30T13:58:50Z) - Learning Topology-Preserving Data Representations [9.710409273484464]
We propose a method for learning topology-preserving data representations (dimensionality reduction)
The core of the method is the minimization of the Representation Topology Divergence (RTD) between original high-dimensional data and low-dimensional representation in latent space.
The proposed method better preserves the global structure and topology of the data manifold than state-of-the-art competitors as measured by linear correlation, triplet distance ranking accuracy, and Wasserstein distance between persistence barcodes.
arXiv Detail & Related papers (2023-01-31T22:55:04Z) - Linear Convergence of Natural Policy Gradient Methods with Log-Linear
Policies [115.86431674214282]
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
We show that both methods attain linear convergence rates and $mathcalO (1/epsilon2)$ sample complexities using a simple, non-adaptive geometrically increasing step size.
arXiv Detail & Related papers (2022-10-04T06:17:52Z) - Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data
Visualization [20.43471678277403]
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with Laplacian-composited objective.
arXiv Detail & Related papers (2022-07-25T14:10:24Z) - Manifold learning-based polynomial chaos expansions for high-dimensional
surrogate models [0.0]
We introduce a manifold learning-based method for uncertainty quantification (UQ) in describing systems.
The proposed method is able to achieve highly accurate approximations which ultimately lead to the significant acceleration of UQ tasks.
arXiv Detail & Related papers (2021-07-21T00:24:15Z) - Geometric variational inference [0.0]
Variational Inference (VI) or Markov-Chain Monte-Carlo (MCMC) techniques are used to go beyond point estimates.
This work proposes geometric Variational Inference (geoVI), a method based on Riemannian geometry and the Fisher information metric.
The distribution, expressed in the coordinate system induced by the transformation, takes a particularly simple form that allows for an accurate variational approximation.
arXiv Detail & Related papers (2021-05-21T17:18:50Z) - GELATO: Geometrically Enriched Latent Model for Offline Reinforcement
Learning [54.291331971813364]
offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out of distribution samples as well as the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z) - Manifold Learning via Manifold Deflation [105.7418091051558]
dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional Manifolds.
This paper presents an embedding method for a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z) - Non-Convex Exact Community Recovery in Stochastic Block Model [31.221745716673546]
Community detection in graphs that are generated according to symmetric block models (SBMs) has received much attention lately.
We show that in the logarithmic sparsity regime of the problem, with high probability the proposed two-stage method can exactly recover the two communities down to the information-theoretic limit in $mathcalO(nlog2n/loglog n)$ time.
We also conduct numerical experiments on both synthetic and real data sets to demonstrate the efficacy of our proposed method and complement our theoretical development.
arXiv Detail & Related papers (2020-06-29T07:03:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.