Magnitude Distance: A Geometric Measure of Dataset Similarity
- URL: http://arxiv.org/abs/2602.08859v1
- Date: Mon, 09 Feb 2026 16:23:43 GMT
- Title: Magnitude Distance: A Geometric Measure of Dataset Similarity
- Authors: Sahel Torkamani, Henry Gouk, Rik Sarkar
- Abstract summary: We propose magnitude distance, a novel distance metric on finite datasets. We prove several theoretical properties of magnitude distance, including its limiting behavior across scales. We show that magnitude distance remains discriminative in high-dimensional settings when the scale is appropriately tuned.
- Score: 9.19444526847653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantifying the distance between datasets is a fundamental question in mathematics and machine learning. We propose \textit{magnitude distance}, a novel distance metric defined on finite datasets using the notion of the \emph{magnitude} of a metric space. The proposed distance incorporates a tunable scaling parameter, $t$, that controls the sensitivity to global structure (small $t$) and finer details (large $t$). We prove several theoretical properties of magnitude distance, including its limiting behavior across scales and conditions under which it satisfies key metric properties. In contrast to classical distances, we show that magnitude distance remains discriminative in high-dimensional settings when the scale is appropriately tuned. We further demonstrate how magnitude distance can be used as a training objective for push-forward generative models. Our experimental results support our theoretical analysis and demonstrate that magnitude distance provides meaningful signals, comparable to established distance-based generative approaches.
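The magnitude underlying this distance is a standard invariant of a finite metric space: at scale $t$, form the similarity matrix $Z_{ij} = e^{-t\,d(x_i, x_j)}$, solve $Zw = \mathbf{1}$ for the weighting vector $w$, and sum its entries. The abstract does not specify how the paper turns per-dataset magnitudes into a distance, so the sketch below computes only the magnitude itself and, as a purely hypothetical illustration, compares two datasets by the absolute difference of their magnitudes; the function names and that comparison are assumptions, not the paper's construction.

```python
import numpy as np

def magnitude(points, t):
    """Magnitude of a finite subset of Euclidean space at scale t."""
    # Pairwise Euclidean distance matrix d(x_i, x_j).
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Similarity matrix Z_ij = exp(-t * d(x_i, x_j)).
    Z = np.exp(-t * D)
    # Weighting vector w with Z w = 1; magnitude is the sum of weights.
    w = np.linalg.solve(Z, np.ones(len(points)))
    return w.sum()

def magnitude_gap(A, B, t):
    # Hypothetical comparison only: NOT the paper's magnitude distance,
    # just the difference of the two datasets' magnitudes at scale t.
    return abs(magnitude(A, t) - magnitude(B, t))
```

As $t \to \infty$ the points become effectively isolated and the magnitude of an $n$-point set approaches $n$; as $t \to 0$ it approaches 1, matching the abstract's description of small $t$ capturing global structure and large $t$ capturing finer detail.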
Related papers
- Comparing Labeled Markov Chains: A Cantor-Kantorovich Approach [53.66196601631798]
We study the recently introduced Cantor-Kantorovich (or CK) distance. In particular, we show that the latter can be framed as a discounted sum of finite-horizon Total Variation distances. More precisely, we show that the exact computation of the CK distance is #P-hard.
arXiv Detail & Related papers (2025-11-22T16:02:56Z) - Geometry-aware Distance Measure for Diverse Hierarchical Structures in Hyperbolic Spaces [48.948334221681684]
We propose a geometry-aware distance measure in hyperbolic spaces, which dynamically adapts to varying hierarchical structures. Our approach consistently outperforms learning methods that use fixed distance measures. Visualization shows clearer class boundaries and improved prototype separation in hyperbolic spaces.
arXiv Detail & Related papers (2025-06-23T11:43:39Z) - A Set-to-Set Distance Measure in Hyperbolic Space [50.134086375286074]
We propose a hyperbolic set-to-set distance (HS2SD) for computing dissimilarity between sets in hyperbolic space. By considering topological differences, HS2SD provides a more nuanced understanding of the relationships between two hyperbolic sets.
arXiv Detail & Related papers (2025-06-23T11:31:40Z) - Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models [63.331590876872944]
We propose a method for deriving Riemannian metrics directly from pretrained Energy-Based Models. These metrics define spatially varying distances, enabling the computation of geodesics. We show that EBM-derived metrics consistently outperform established baselines.
arXiv Detail & Related papers (2025-05-23T12:18:08Z) - GeoMM: On Geodesic Perspective for Multi-modal Learning [55.41612200877861]
This paper introduces geodesic distance as a novel distance metric in multi-modal learning for the first time. Our approach incorporates a comprehensive series of strategies to adapt geodesic distance to current multi-modal learning.
arXiv Detail & Related papers (2025-05-16T13:12:41Z) - Feature Subset Weighting for Distance-based Supervised Learning through Choquet Integration [2.1943338072179444]
This paper introduces feature subset weighting using monotone measures for distance-based supervised learning. The Choquet integral is used to define a distance metric that incorporates these weights. We show how this approach ensures that the distances remain unaffected by the addition of duplicate and strongly correlated features.
arXiv Detail & Related papers (2025-04-01T10:23:01Z) - Manifold learning in metric spaces [2.005299372367689]
Laplacian-based methods are popular for the dimensionality reduction of data lying in $\mathbb{R}^N$. We provide a framework that generalizes the problem of manifold learning to metric spaces and study when a metric satisfies sufficient conditions for the pointwise convergence of the graph Laplacian.
arXiv Detail & Related papers (2025-03-20T14:37:40Z) - Learning Distances from Data with Normalizing Flows and Score Matching [9.605001452209867]
Density-based distances (DBDs) provide a principled approach to metric learning by defining distances in terms of the underlying data distribution. We introduce a dimension-adapted Fermat distance that scales intuitively to high dimensions and improves numerical stability.
arXiv Detail & Related papers (2024-07-12T14:30:41Z) - A general framework for implementing distances for categorical variables [0.0]
We introduce a general framework that allows for an efficient and transparent implementation of distances between observations on categorical variables.
Our framework quite naturally leads to the introduction of new distance formulations and allows for the implementation of flexible, case and data specific distance definitions.
In a supervised classification setting, the framework can be used to construct distances that incorporate the association between the response and predictor variables.
arXiv Detail & Related papers (2023-01-04T13:50:08Z) - Identifying latent distances with Finslerian geometry [6.0188611984807245]
In generative models, the data space and its geodesics are at best impractical, and at worst impossible, to manipulate.
In this work, we propose another metric whose geodesics explicitly minimise the expected length of the pullback metric.
In high dimensions, we prove that both metrics converge to each other at a rate of $O\left(\frac{1}{D}\right)$.
arXiv Detail & Related papers (2022-12-20T05:57:27Z) - Distance Metric Learning through Minimization of the Free Energy [0.825845106786193]
We present a simple approach based on concepts from statistical physics to learn an optimal distance metric for a given problem.
As with many problems in physics, we propose an approach based on Metropolis Monte Carlo to find the best distance metric.
Our proposed method can handle a wide variety of constraints including those with spurious local minima.
arXiv Detail & Related papers (2021-06-10T04:54:25Z) - Geometry of Similarity Comparisons [51.552779977889045]
We show that the ordinal capacity of a space form is related to its dimension and the sign of its curvature.
More importantly, we show that the statistical behavior of the ordinal spread random variables defined on a similarity graph can be used to identify its underlying space form.
arXiv Detail & Related papers (2020-06-17T13:37:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.