Topolow: Force-Directed Euclidean Embedding of Dissimilarity Data with Robustness Against Non-Metricity and Sparsity
- URL: http://arxiv.org/abs/2508.01733v1
- Date: Sun, 03 Aug 2025 12:19:17 GMT
- Title: Topolow: Force-Directed Euclidean Embedding of Dissimilarity Data with Robustness Against Non-Metricity and Sparsity
- Authors: Omid Arhami, Pejman Rohani,
- Abstract summary: Topolow is a physics-inspired, gradient-free optimization framework for such embedding problems.<n>Topolow does not require the input dissimilarities to be metric, making it a robust solution for embedding non-metric measurements into a valid Euclidean space.<n>This paper formalizes the algorithm, first introduced as Topolow in the context of antigenic mapping in (Arhami and Rohani, 2025)
- Score: 0.8287206589886881
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The problem of embedding a set of objects into a low-dimensional Euclidean space based on a matrix of pairwise dissimilarities is fundamental in data analysis, machine learning, and statistics. However, the assumptions of many standard analytical methods are violated when the input dissimilarities fail to satisfy metric or Euclidean axioms. We present the mathematical and statistical foundations of Topolow, a physics-inspired, gradient-free optimization framework for such embedding problems. Topolow is conceptually related to force-directed graph drawing algorithms but is fundamentally distinguished by its goal of quantitative metric reconstruction. It models objects as particles in a physical system, and its novel optimization scheme proceeds through sequential, stochastic pairwise interactions, which circumvents the need to compute a global gradient and provides robustness against convergence to local optima, especially for sparse data. Topolow maximizes the likelihood under a Laplace error model, robust to outliers and heterogeneous errors, and properly handles censored data. Crucially, Topolow does not require the input dissimilarities to be metric, making it a robust solution for embedding non-metric measurements into a valid Euclidean space, thereby enabling the use of standard analytical tools. We demonstrate the superior performance of Topolow compared to standard Multidimensional Scaling (MDS) methods in reconstructing the geometry of sparse and non-Euclidean data. This paper formalizes the algorithm, first introduced as Topolow in the context of antigenic mapping in (Arhami and Rohani, 2025) (open access), with emphasis on its metric embedding and mathematical properties for a broader audience. The general-purpose function Euclidify is available in the R package topolow.
Related papers
- Finsler Multi-Dimensional Scaling: Manifold Learning for Asymmetric Dimensionality Reduction and Embedding [41.601022263772535]
Dimensionality reduction aims to simplify complex data by reducing its feature dimensionality while preserving essential patterns, with core applications in data analysis and visualisation.<n>To preserve the underlying data structure, multi-dimensional scaling (MDS) methods focus on preserving pairwise dissimilarities, such as distances.
arXiv Detail & Related papers (2025-03-23T10:03:22Z) - Dimension reduction via score ratio matching [0.9012198585960441]
We propose a framework, derived from score-matching, to extend gradient-based dimension reduction to problems where gradients are unavailable.<n>We show that our approach outperforms standard score-matching for problems with low-dimensional structure.
arXiv Detail & Related papers (2024-10-25T22:21:03Z) - Sample-Efficient Geometry Reconstruction from Euclidean Distances using Non-Convex Optimization [7.114174944371803]
The problem of finding suitable point embedding Euclidean distance information point pairs arises both as a core task and as a sub-machine learning learning problem.
In this paper, we aim to solve this problem given a minimal number of samples.
arXiv Detail & Related papers (2024-10-22T13:02:12Z) - Probabilistic Iterative Hard Thresholding for Sparse Learning [2.5782973781085383]
"l0 norm" counts the number of non-zero components in a vector.<n>In big data settings wherein noisy estimates of the gradient must be evaluated out of computational necessity, the literature is scant on methods that reliably converge.<n>We prove convergence of the underlying process, and demonstrate the performance on Machine Learning problems.
arXiv Detail & Related papers (2024-09-02T18:14:45Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - An evaluation framework for dimensionality reduction through sectional
curvature [59.40521061783166]
In this work, we aim to introduce the first highly non-supervised dimensionality reduction performance metric.
To test its feasibility, this metric has been used to evaluate the performance of the most commonly used dimension reduction algorithms.
A new parameterized problem instance generator has been constructed in the form of a function generator.
arXiv Detail & Related papers (2023-03-17T11:59:33Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - Improving Metric Dimensionality Reduction with Distributed Topology [68.8204255655161]
DIPOLE is a dimensionality-reduction post-processing step that corrects an initial embedding by minimizing a loss functional with both a local, metric term and a global, topological term.
We observe that DIPOLE outperforms popular methods like UMAP, t-SNE, and Isomap on a number of popular datasets.
arXiv Detail & Related papers (2021-06-14T17:19:44Z) - Jointly Modeling and Clustering Tensors in High Dimensions [6.072664839782975]
We consider the problem of jointly benchmarking and clustering of tensors.
We propose an efficient high-maximization algorithm that converges geometrically to a neighborhood that is within statistical precision.
arXiv Detail & Related papers (2021-04-15T21:06:16Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Semiparametric Nonlinear Bipartite Graph Representation Learning with
Provable Guarantees [106.91654068632882]
We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.
We show that the proposed objective is strongly convex in a neighborhood around the ground truth, so that a gradient descent-based method achieves linear convergence rate.
Our estimator is robust to any model misspecification within the exponential family, which is validated in extensive experiments.
arXiv Detail & Related papers (2020-03-02T16:40:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.