Preserving local densities in low-dimensional embeddings
- URL: http://arxiv.org/abs/2301.13732v1
- Date: Tue, 31 Jan 2023 16:11:54 GMT
- Title: Preserving local densities in low-dimensional embeddings
- Authors: Jonas Fischer, Rebekka Burkholz, Jilles Vreeken
- Abstract summary: State-of-the-art methods, such as tSNE and UMAP, excel in unveiling local structures hidden in high-dimensional data.
We show, however, that these methods fail to reconstruct local properties, such as relative differences in densities.
We suggest dtSNE, which approximately conserves local densities.
- Score: 37.278617643507815
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Low-dimensional embeddings and visualizations are an indispensable tool for
analysis of high-dimensional data. State-of-the-art methods, such as tSNE and
UMAP, excel in unveiling local structures hidden in high-dimensional data and
are therefore routinely applied in standard analysis pipelines in biology. We
show, however, that these methods fail to reconstruct local properties, such as
relative differences in densities (Fig. 1) and that apparent differences in
cluster size can arise from computational artifact caused by differing sample
sizes (Fig. 2). Providing a theoretical analysis of this issue, we then suggest
dtSNE, which approximately conserves local densities. In an extensive study on
synthetic benchmark and real world data comparing against five state-of-the-art
methods, we empirically show that dtSNE provides similar global reconstruction,
but yields much more accurate depictions of local distances and relative
densities.
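The density distortion described in the abstract can be illustrated with a small synthetic experiment. The following is a minimal sketch using scikit-learn's standard t-SNE (not the paper's dtSNE implementation); the data, parameters, and spread-ratio diagnostic are illustrative assumptions, not the authors' experimental setup:

```python
# Sketch: probing whether a 2-D t-SNE embedding preserves relative
# cluster densities (illustrative only; not the paper's dtSNE code).
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two Gaussian clusters with very different spreads (densities).
tight = rng.normal(loc=0.0, scale=0.1, size=(200, 10))
wide = rng.normal(loc=5.0, scale=2.0, size=(200, 10))
X = np.vstack([tight, wide])

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# Compare spread ratios before and after embedding: the original data
# have a roughly 20x difference in scale between the two clusters.
orig_ratio = wide.std() / tight.std()
emb_ratio = emb[200:].std(axis=0).mean() / emb[:200].std(axis=0).mean()
print(f"original spread ratio: {orig_ratio:.1f}, embedded: {emb_ratio:.1f}")
```

Under standard t-SNE the embedded spread ratio tends to land far closer to 1 than the original ratio, which is exactly the kind of relative-density information the paper argues dtSNE should conserve.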
Related papers
- IsUMap: Manifold Learning and Data Visualization leveraging Vietoris-Rips filtrations [0.08796261172196743]
We present a systematic and detailed construction of a metric representation for locally distorted metric spaces.
Our approach addresses limitations in existing methods by accommodating non-uniform data distributions and intricate local geometries.
arXiv Detail & Related papers (2024-07-25T07:46:30Z)
- Sailing in high-dimensional spaces: Low-dimensional embeddings through angle preservation [12.775562063735006]
Low-dimensional embeddings (LDEs) of high-dimensional data are ubiquitous in science and engineering.
We suggest a new perspective on LDE learning, reconstructing angles between data points.
arXiv Detail & Related papers (2024-06-14T09:44:06Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Dimensionality-Aware Outlier Detection: Theoretical and Experimental Analysis [9.962838991341874]
We present a nonparametric method for outlier detection that takes full account of local variations in dimensionality within the dataset.
We show that it significantly outperforms three popular and important benchmark outlier detection methods.
arXiv Detail & Related papers (2024-01-10T01:07:35Z)
- A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction [66.21060114843202]
We propose a more general heat kernel based manifold embedding method that we call heat geodesic embeddings.
Results show that our method outperforms existing state of the art in preserving ground truth manifold distances.
We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure.
arXiv Detail & Related papers (2023-05-30T13:58:50Z)
- Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization [20.43471678277403]
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with Laplacian-composited objective.
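As a rough sketch of the ingredient LaptSNE reportedly leverages, the following computes the eigenvalues of a kNN-graph Laplacian; the graph construction and parameters are illustrative assumptions, not the authors' implementation:

```python
# Sketch: eigenvalues of a kNN-graph Laplacian, the quantity LaptSNE
# reportedly leverages (illustrative; not the authors' code).
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Symmetrized kNN adjacency, then its unnormalized graph Laplacian.
A = kneighbors_graph(X, n_neighbors=10, mode="connectivity")
A = ((A + A.T) > 0).astype(float)
L = laplacian(A, normed=False)

# Eigenvalues in ascending order; the number of near-zero eigenvalues
# equals the number of connected components, hinting at cluster structure.
vals = np.linalg.eigvalsh(L.toarray())
print(vals[:5])
```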
arXiv Detail & Related papers (2022-07-25T14:10:24Z)
- Incorporating Texture Information into Dimensionality Reduction for High-Dimensional Images [65.74185962364211]
We present a method for incorporating neighborhood information into distance-based dimensionality reduction methods.
Based on a classification of different methods for comparing image patches, we explore a number of different approaches.
arXiv Detail & Related papers (2022-02-18T13:17:43Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments of data classification and clustering on eight real datasets show that Vec2vec is better than several classical dimensionality reduction methods in the statistical hypothesis test.
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- Joint Characterization of Multiscale Information in High Dimensional Data [0.0]
We propose a multiscale joint characterization approach designed to exploit synergies between global and local approaches to dimensionality reduction.
We show that joint characterization is capable of detecting and isolating signals which are not evident from either PCA or t-SNE alone.
arXiv Detail & Related papers (2021-02-18T23:33:00Z) - LDLE: Low Distortion Local Eigenmaps [77.02534963571597]
We present Low Distortion Local Eigenmaps (LDLE), a manifold learning technique which constructs a set of low distortion local views of a dataset in lower dimension and registers them to obtain a global embedding.
The local views are constructed using the global eigenvectors of the graph Laplacian and are registered using Procrustes analysis.
arXiv Detail & Related papers (2021-01-26T19:55:05Z)
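The Procrustes registration step mentioned for LDLE can be sketched as follows; the synthetic "views" and SciPy's `procrustes` routine stand in for the paper's actual local views and pipeline:

```python
# Sketch: registering two overlapping local views with Procrustes
# analysis, the alignment step LDLE reportedly uses (illustrative).
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
view_a = rng.normal(size=(50, 2))

# A second "view" of the same points: rotated, scaled, and translated.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
view_b = 1.5 * view_a @ R.T + np.array([3.0, -2.0])

# Procrustes removes translation, scale, and rotation; the residual
# disparity is ~0 when the views differ only by such a transform.
_, _, disparity = procrustes(view_a, view_b)
print(f"disparity after alignment: {disparity:.2e}")
```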
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.