Understanding and Improving UMAP with Geometric and Topological Priors: The JORC-UMAP Algorithm
- URL: http://arxiv.org/abs/2601.16552v1
- Date: Fri, 23 Jan 2026 08:42:56 GMT
- Title: Understanding and Improving UMAP with Geometric and Topological Priors: The JORC-UMAP Algorithm
- Authors: Xiaobin Li, Run Zhang,
- Abstract summary: dimensionality reduction techniques, particularly UMAP, are widely used for visualizing high-dimensional data.<n>We introduce Ollivier-Ricci curvature as a geometric prior, reinforcing edges at geometric bottlenecks and reducing redundant links.<n>Experiments on synthetic and real-world datasets show that JORC-UMAP reduces tearing and collapse more effectively than standard UMAP and other DR methods.
- Score: 1.7484982792736636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nonlinear dimensionality reduction techniques, particularly UMAP, are widely used for visualizing high-dimensional data. However, UMAP's local Euclidean distance assumption often fails to capture intrinsic manifold geometry, leading to topological tearing and structural collapse. We identify UMAP's sensitivity to the k-nearest neighbor graph as a key cause. To address this, we introduce Ollivier-Ricci curvature as a geometric prior, reinforcing edges at geometric bottlenecks and reducing redundant links. Since curvature estimation is noise-sensitive, we also incorporate a topological prior using Jaccard similarity to ensure neighborhood consistency. The resulting method, JORC-UMAP, better distinguishes true manifold structure from spurious connections. Experiments on synthetic and real-world datasets show that JORC-UMAP reduces tearing and collapse more effectively than standard UMAP and other DR methods, as measured by SVM accuracy and triplet preservation scores, while maintaining computational efficiency. This work offers a geometry-aware enhancement to UMAP for more faithful data visualization.
Related papers
- Bayesian Matrix Completion Under Geometric Constraints [3.5522446024799064]
The completion of a Euclidean distance matrix from sparse and noisy observations is a fundamental challenge in signal processing.<n>Traditional approaches, such as rank-constrained optimization and semidefinite programming, enforce geometric constraints but often struggle under sparse or noisy conditions.<n>This paper introduces a hierarchical Bayesian framework that places structured priors directly on the latent point set generating the EDM, naturally embedding geometric constraints.
arXiv Detail & Related papers (2026-01-30T09:45:34Z) - Variational Geometric Information Bottleneck: Learning the Shape of Understanding [0.0]
Variational Geometric Information Bottleneck (V-GIB) is a variational estimator that unifies mutual-information compression and curvature regularization.<n>V-GIB provides a principled and measurable route to representations that are geometrically coherent, data-efficient, and aligned with human-understandable structure.
arXiv Detail & Related papers (2025-11-04T11:33:54Z) - EmbedOR: Provable Cluster-Preserving Visualizations with Curvature-Based Stochastic Neighbor Embeddings [18.64124104660797]
Neighbor Embedding (SNE) algorithms like UMAP and tSNE often produce visualizations that do not preserve the geometry of noisy and high dimensional data.<n>We propose EmbedOR, a SNE algorithm that incorporates discrete graph curvature.<n>Our algorithmally embeds the data using a curvature-enhanced distance metric that emphasizes underlying cluster structure.
arXiv Detail & Related papers (2025-09-03T20:38:39Z) - Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models [63.331590876872944]
We propose a method for deriving Riemannian metrics directly from pretrained Energy-Based Models.<n>These metrics define spatially varying distances, enabling the computation of geodesics.<n>We show that EBM-derived metrics consistently outperform established baselines.
arXiv Detail & Related papers (2025-05-23T12:18:08Z) - Disentangled Representation Learning with the Gromov-Monge Gap [65.73194652234848]
Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning.<n>We introduce a novel approach to disentangled representation learning based on quadratic optimal transport.<n>We demonstrate the effectiveness of our approach for quantifying disentanglement across four standard benchmarks.
arXiv Detail & Related papers (2024-07-10T16:51:32Z) - Last-Iterate Convergence of Adaptive Riemannian Gradient Descent for Equilibrium Computation [52.73824786627612]
This paper establishes new convergence results for textitgeodesic strongly monotone games.<n>Our key result shows that RGD attains last-iterate linear convergence in a textitgeometry-agnostic fashion.<n>Overall, this paper presents the first geometry-agnostic last-iterate convergence analysis for games beyond the Euclidean settings.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Exploring Data Geometry for Continual Learning [64.4358878435983]
We study continual learning from a novel perspective by exploring data geometry for the non-stationary stream of data.
Our method dynamically expands the geometry of the underlying space to match growing geometric structures induced by new data.
Experiments show that our method achieves better performance than baseline methods designed in Euclidean space.
arXiv Detail & Related papers (2023-04-08T06:35:25Z) - Learning Topology-Preserving Data Representations [9.710409273484464]
We propose a method for learning topology-preserving data representations (dimensionality reduction)
The core of the method is the minimization of the Representation Topology Divergence (RTD) between original high-dimensional data and low-dimensional representation in latent space.
The proposed method better preserves the global structure and topology of the data manifold than state-of-the-art competitors as measured by linear correlation, triplet distance ranking accuracy, and Wasserstein distance between persistence barcodes.
arXiv Detail & Related papers (2023-01-31T22:55:04Z) - Clustering with UMAP: Why and How Connectivity Matters [3.04585143845864]
Topology based dimensionality reduction methods such as t-SNE and UMAP have seen increasing success and popularity in high-dimensional data.
We study the effects of node connectivity (k-Nearest Neighbors vs textitmutual k-Nearest Neighbors) and relative neighborhood (Adjacent via Path Neighbors) on dimensionality reduction.
arXiv Detail & Related papers (2021-08-12T04:25:03Z) - Self-supervised Geometric Perception [96.89966337518854]
Self-supervised geometric perception is a framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels.
We show that SGP achieves state-of-the-art performance that is on-par or superior to the supervised oracles trained using ground-truth labels.
arXiv Detail & Related papers (2021-03-04T15:34:43Z) - Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z) - Learning Flat Latent Manifolds with VAEs [16.725880610265378]
We propose an extension to the framework of variational auto-encoders, where the Euclidean metric is a proxy for the similarity between data points.
We replace the compact prior typically used in variational auto-encoders with a recently presented, more expressive hierarchical one.
We evaluate our method on a range of data-sets, including a video-tracking benchmark.
arXiv Detail & Related papers (2020-02-12T09:54:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.