Information-Theoretic Quality Metric of Low-Dimensional Embeddings
- URL: http://arxiv.org/abs/2512.23981v2
- Date: Thu, 01 Jan 2026 02:04:56 GMT
- Title: Information-Theoretic Quality Metric of Low-Dimensional Embeddings
- Authors: Sebastián Gutiérrez-Bernal, Hector Medel Cobaxin, Abiel Galindo González,
- Abstract summary: We study the quality of low-dimensional embeddings from an explicitly information-theoretic perspective. We introduce the Entropy Rank Preservation Measure (ERPM), a local metric based on the Shannon entropy of the singular-value spectrum of neighborhood matrices.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we study the quality of low-dimensional embeddings from an explicitly information-theoretic perspective. We begin by noting that classical evaluation metrics such as stress, rank-based neighborhood criteria, or Local Procrustes quantify distortions in distances or in local geometries, but do not directly assess how much information is preserved when projecting high-dimensional data onto a lower-dimensional space. To address this limitation, we introduce the Entropy Rank Preservation Measure (ERPM), a local metric based on the Shannon entropy of the singular-value spectrum of neighborhood matrices and on the stable rank, which quantifies changes in uncertainty between the original representation and its reduced projection, providing neighborhood-level indicators and a global summary statistic. To validate the results of the metric, we compare its outcomes with the Mean Relative Rank Error (MRRE), which is distance-based, and with Local Procrustes, which is based on geometric properties, using a financial time series and a manifold commonly studied in the literature. We observe that distance-based criteria exhibit very low correlation with geometric and spectral measures, while ERPM and Local Procrustes show strong average correlation but display significant discrepancies in local regimes, leading to the conclusion that ERPM complements existing metrics by identifying neighborhoods with severe information loss, thereby enabling a more comprehensive assessment of embeddings, particularly in information-sensitive applications such as the construction of early-warning indicators.
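The abstract names two spectral ingredients of ERPM: the Shannon entropy of a neighborhood matrix's singular-value spectrum and the stable rank. The sketch below illustrates only those two quantities and a naive per-neighborhood entropy difference. It is not the paper's ERPM formula, which the abstract does not state. The function names, the normalization of singular values by their sum, the mean-centering of each neighborhood, and the brute-force k-nearest-neighbor search are all assumptions made for illustration.

```python
import numpy as np


def spectral_entropy(A):
    """Shannon entropy (in nats) of the singular values of A, normalized to sum to 1.

    Assumption: normalizing by the plain sum (not the squared sum) of singular values.
    """
    s = np.linalg.svd(A, compute_uv=False)
    total = s.sum()
    if total == 0:
        return 0.0
    p = s / total
    p = p[p > 0]  # drop exact zeros so log is defined
    return float(-(p * np.log(p)).sum())


def stable_rank(A):
    """Stable rank: squared Frobenius norm divided by squared spectral norm."""
    s = np.linalg.svd(A, compute_uv=False)  # sorted in descending order
    if s[0] == 0:
        return 0.0
    return float((s ** 2).sum() / s[0] ** 2)


def knn_indices(X, k):
    """Brute-force k nearest neighbors (Euclidean), excluding the point itself."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def local_entropy_change(X_high, X_low, k=10):
    """Hypothetical per-point indicator: entropy of the high-dimensional
    neighborhood matrix minus entropy of the same neighborhood in the embedding.
    Neighborhoods are chosen in the original space and mean-centered (assumption).
    """
    nbrs = knn_indices(X_high, k)
    out = np.empty(len(X_high))
    for i, idx in enumerate(nbrs):
        H = X_high[idx] - X_high[idx].mean(axis=0)
        L = X_low[idx] - X_low[idx].mean(axis=0)
        out[i] = spectral_entropy(H) - spectral_entropy(L)
    return out
```

Under these assumptions, a matrix with a flat spectrum (such as an identity matrix) attains the maximum entropy and a stable rank equal to its rank, while a rank-one matrix has entropy near zero and stable rank near one. Averaging the per-point values would give one possible global summary, again as an illustration rather than the authors' definition.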
Related papers
- GICDM: Mitigating Hubness for Reliable Distance-Based Generative Model Evaluation [30.08046476442414]
Generative model evaluation commonly relies on high-dimensional embedding spaces to compute distances between samples. We show that dataset representations in these spaces are affected by the hubness phenomenon. We introduce Generative ICDM, a method to correct neighborhood estimation for both real and generated data.
arXiv Detail & Related papers (2026-02-18T13:33:54Z)
- MS-ISSM: Objective Quality Assessment of Point Clouds Using Multi-scale Implicit Structural Similarity [65.85858856481131]
The unstructured and irregular nature of point clouds poses a significant challenge for objective quality assessment (PCQA). We propose the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM).
arXiv Detail & Related papers (2026-01-03T14:58:52Z)
- TOPO-Bench: An Open-Source Topological Mapping Evaluation Framework with Quantifiable Perceptual Aliasing [10.736029638634504]
We formalize topological consistency as the fundamental property of topological maps and show that localization accuracy provides an efficient surrogate metric. We propose the first quantitative measure of dataset ambiguity to enable fair comparisons across environments. All datasets, baselines, and evaluation tools are fully open-sourced to foster consistent and reproducible research in topological mapping.
arXiv Detail & Related papers (2025-10-05T08:58:08Z)
- A Novel Distance-Based Metric for Quality Assessment in Image Segmentation [0.7673339435080445]
We introduce the Surface Consistency Coefficient (SCC), a novel distance-based quality metric. SCC quantifies the spatial distribution of errors based on their proximity to the surface of the structure. We demonstrate the robustness and effectiveness of SCC in distinguishing errors near the surface from those further away.
arXiv Detail & Related papers (2025-03-28T12:02:09Z)
- Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data; a minimal number of available labeled data points are then assigned to key best matching units (BMU).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- Adversarial Estimation of Topological Dimension with Harmonic Score Maps [7.34158170612151]
We show that it is possible to retrieve the topological dimension of the manifold learned by the score map.
We then introduce a novel method to measure the learned manifold's topological dimension using adversarial attacks.
arXiv Detail & Related papers (2023-12-11T22:29:54Z)
- SALC: Skeleton-Assisted Learning-Based Clustering for Time-Varying Indoor Localization [3.9373541926236757]
We propose a skeleton-assisted learning-based clustering localization system, including RSS-oriented map-assisted clustering (ROMAC) and cluster-scaled location estimation (CsLE).
Both simulation and experimental results demonstrate that the proposed SALC system can effectively reconstruct the fingerprint database with an enhanced location estimation accuracy.
arXiv Detail & Related papers (2023-07-14T22:55:52Z)
- Joint Metrics Matter: A Better Standard for Trajectory Forecasting [67.1375677218281]
Multi-modal trajectory forecasting methods are commonly evaluated using single-agent metrics (marginal metrics).
Only focusing on marginal metrics can lead to unnatural predictions, such as colliding trajectories or diverging trajectories for people who are clearly walking together as a group.
We present the first comprehensive evaluation of state-of-the-art trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate.
arXiv Detail & Related papers (2023-05-10T16:27:55Z)
- Preserving local densities in low-dimensional embeddings [37.278617643507815]
State-of-the-art methods, such as tSNE and UMAP, excel in unveiling local structures hidden in high-dimensional data.
We show, however, that these methods fail to reconstruct local properties, such as relative differences in densities.
We suggest dtSNE, which approximately conserves local densities.
arXiv Detail & Related papers (2023-01-31T16:11:54Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Improving Metric Dimensionality Reduction with Distributed Topology [68.8204255655161]
DIPOLE is a dimensionality-reduction post-processing step that corrects an initial embedding by minimizing a loss functional with both a local, metric term and a global, topological term.
We observe that DIPOLE outperforms popular methods like UMAP, t-SNE, and Isomap on a number of popular datasets.
arXiv Detail & Related papers (2021-06-14T17:19:44Z)
- GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.