How high is `high'? Rethinking the roles of dimensionality in topological data analysis and manifold learning
- URL: http://arxiv.org/abs/2505.16879v1
- Date: Thu, 22 May 2025 16:34:15 GMT
- Title: How high is `high'? Rethinking the roles of dimensionality in topological data analysis and manifold learning
- Authors: Hannah Sansford, Nick Whiteley, Patrick Rubin-Delanchy
- Abstract summary: We present a generalised Hanson-Wright inequality and use it to establish new statistical insights into the geometry of data point-clouds. We revisit the ground-breaking neuroscience discovery of toroidal structure in grid-cell activity made by Gardner et al. Our findings reveal, for the first time, evidence that this structure is in fact isometric to physical space, meaning that grid cell activity conveys a geometrically faithful representation of the real world.
- Score: 8.397730500554047
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a generalised Hanson-Wright inequality and use it to establish new statistical insights into the geometry of data point-clouds. In the setting of a general random function model of data, we clarify the roles played by three notions of dimensionality: ambient intrinsic dimension $p_{\mathrm{int}}$, which measures total variability across orthogonal feature directions; correlation rank, which measures functional complexity across samples; and latent intrinsic dimension, which is the dimension of manifold structure hidden in data. Our analysis shows that in order for persistence diagrams to reveal latent homology and for manifold structure to emerge it is sufficient that $p_{\mathrm{int}}\gg \log n$, where $n$ is the sample size. Informed by these theoretical perspectives, we revisit the ground-breaking neuroscience discovery of toroidal structure in grid-cell activity made by Gardner et al. (Nature, 2022): our findings reveal, for the first time, evidence that this structure is in fact isometric to physical space, meaning that grid cell activity conveys a geometrically faithful representation of the real world.
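The condition $p_{\mathrm{int}}\gg \log n$ in the abstract can be given some intuition with a toy simulation. The sketch below is not the paper's model or method: it assumes $n$ points whose $p$ coordinates are independent standard Gaussians, so that every coordinate contributes equally and $p$ stands in for $p_{\mathrm{int}}$ (an assumption of this illustration, not the paper's definition). The function name `max_relative_distance_error` is hypothetical. It reports how far the normalised squared distances $\|x_i - x_j\|^2/(2p)$ stray from their expected value 1, uniformly over all pairs.

```python
import itertools
import math
import random


def max_relative_distance_error(n, p, seed=0):
    """Largest deviation of ||x_i - x_j||^2 / (2p) from 1 over all pairs.

    Toy assumption: each of the n points has p independent N(0, 1)
    coordinates, so E||x_i - x_j||^2 = 2p for every pair i != j.
    """
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    worst = 0.0
    for x, y in itertools.combinations(points, 2):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
        worst = max(worst, abs(sq_dist / (2 * p) - 1.0))
    return worst


if __name__ == "__main__":
    n = 30
    for p in (20, 200, 2000):
        err = max_relative_distance_error(n, p)
        ratio = p / math.log(n)
        print(f"p={p:5d}  p/log(n)={ratio:8.1f}  max relative error={err:.3f}")
```

In this toy setting the worst-case error over all pairs shrinks as $p$ grows relative to $\log n$, which is the kind of uniform concentration of distances that a Hanson-Wright-type bound combined with a union bound over pairs would suggest.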
Related papers
- (Deep) Generative Geodesics [57.635187092922976]
We introduce a new Riemannian metric to assess the similarity between any two data points.
Our metric leads to the conceptual definition of generative distances and generative geodesics.
Their approximations are proven to converge to their true values under mild conditions.
arXiv Detail & Related papers (2024-07-15T21:14:02Z)
- Information-Theoretic Thresholds for Planted Dense Cycles [52.076657911275525]
We study a random graph model for small-world networks which are ubiquitous in social and biological sciences.
For both detection and recovery of the planted dense cycle, we characterize the information-theoretic thresholds in terms of $n$, $\tau$, and an edge-wise signal-to-noise ratio $\lambda$.
arXiv Detail & Related papers (2024-02-01T03:39:01Z)
- A Hitchhiker's Guide to Geometric GNNs for 3D Atomic Systems [87.30652640973317]
Recent advances in computational modelling of atomic systems represent them as geometric graphs with atoms embedded as nodes in 3D Euclidean space.
Geometric Graph Neural Networks have emerged as the preferred machine learning architecture powering applications ranging from protein structure prediction to molecular simulations and material generation.
This paper provides a comprehensive and self-contained overview of the field of Geometric GNNs for 3D atomic systems.
arXiv Detail & Related papers (2023-12-12T18:44:19Z)
- Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
We introduce a new class of manifold, named soft manifold, that can solve this situation.
Using soft manifold for graph embedding, we can provide continuous spaces to pursue any task in data analysis over complex datasets.
arXiv Detail & Related papers (2023-11-29T12:48:33Z)
- Study of Manifold Geometry using Multiscale Non-Negative Kernel Graphs [32.40622753355266]
We propose a framework to study the geometric structure of the data.
We make use of our recently introduced non-negative kernel (NNK) regression graphs to estimate the point density, intrinsic dimension, and linearity (curvature) of the data manifold.
arXiv Detail & Related papers (2022-10-31T17:01:17Z)
- Shape And Structure Preserving Differential Privacy [70.08490462870144]
We show how the gradient of the squared distance function offers better control over sensitivity than the Laplace mechanism.
arXiv Detail & Related papers (2022-09-21T18:14:38Z)
- Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of sequence space.
arXiv Detail & Related papers (2022-07-20T06:38:36Z)
- Statistical Mechanics of Neural Processing of Object Manifolds [3.4809730725241605]
This thesis lays the groundwork for a computational theory of neuronal processing of objects.
We identify that the capacity of a manifold is determined by its effective radius, $R_M$, and effective dimension, $D_M$.
arXiv Detail & Related papers (2021-06-01T20:49:14Z)
- Identifying the latent space geometry of network models through analysis of curvature [7.644165047073435]
We present a method to consistently estimate the manifold type, dimension, and curvature from an empirically relevant class of latent spaces.
Our core insight comes by representing the graph as a noisy distance matrix based on the ties between cliques.
arXiv Detail & Related papers (2020-12-19T00:35:29Z)
- Visualizing the Finer Cluster Structure of Large-Scale and High-Dimensional Data [7.400745342582259]
We propose using a generalized sigmoid function to model the distance similarity in both high- and low-dimensional spaces.
Using both simulated and real-world data sets, we show that our proposed method can generate visualization results comparable to those of uniform manifold approximation and projection.
arXiv Detail & Related papers (2020-07-17T01:36:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.