Intrinsic Dimension for Large-Scale Geometric Learning
- URL: http://arxiv.org/abs/2210.05301v2
- Date: Mon, 17 Apr 2023 11:08:46 GMT
- Title: Intrinsic Dimension for Large-Scale Geometric Learning
- Authors: Maximilian Stubbemann, Tom Hanika, Friedrich Martin Schneider
- Abstract summary: A naive approach to determine the dimension of a dataset is based on the number of attributes.
More sophisticated methods derive a notion of intrinsic dimension (ID) that employs more complex feature functions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The concept of dimension is essential to grasp the complexity of data. A
naive approach to determine the dimension of a dataset is based on the number
of attributes. More sophisticated methods derive a notion of intrinsic
dimension (ID) that employs more complex feature functions, e.g., distances
between data points. Yet, many of these approaches are based on empirical
observations, cannot cope with the geometric character of contemporary
datasets, and lack an axiomatic foundation. A different approach was
proposed by V. Pestov, who links the intrinsic dimension axiomatically to the
mathematical concentration of measure phenomenon. The first methods for
computing this and related notions of ID were computationally intractable for large-scale
real-world datasets. In the present work, we derive a computationally feasible
method for determining said axiomatic ID functions. Moreover, we demonstrate
how the geometric properties of complex data are accounted for in our modeling.
In particular, we propose a principled way to incorporate neighborhood
information, as in graph data, into the ID. This allows for new insights into
common graph learning procedures, which we illustrate by experiments on the
Open Graph Benchmark.
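The paper's axiomatic, concentration-based ID is not reproduced here; as a minimal illustration of distance-based ID estimation in the same spirit, the sketch below uses the well-known Two-NN estimator (Facco et al.), with a comment indicating how graph shortest-path distances could be substituted to inject neighborhood information, as the abstract suggests. This is a generic sketch, not the authors' method.

```python
import numpy as np
from scipy.spatial.distance import cdist

def two_nn_id(dist):
    """Two-NN intrinsic dimension estimate from a full distance matrix."""
    n = dist.shape[0]
    sorted_d = np.sort(dist, axis=1)
    r1, r2 = sorted_d[:, 1], sorted_d[:, 2]  # column 0 is the point itself
    mu = r2 / r1                             # ratio of 2nd to 1st NN distance
    return n / np.log(mu).sum()              # maximum-likelihood estimate

# Euclidean case: a 2-D Gaussian cloud linearly embedded in R^10.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 10))
print(two_nn_id(cdist(X, X)))  # close to the intrinsic dimension 2

# Graph case (hypothetical): pass shortest-path distances instead, e.g.
# scipy.sparse.csgraph.shortest_path(adjacency), so that neighborhood
# structure enters the ID, in the spirit of the abstract.
```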
Related papers
- Score-based pullback Riemannian geometry [10.649159213723106]
We propose a framework for data-driven Riemannian geometry that is scalable in both geometry and learning.
Our framework produces high-quality geodesics through the data support and reliably estimates the intrinsic dimension of the data manifold.
Our framework can naturally be used with anisotropic normalizing flows by adopting isometry regularization during training.
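The summary leaves the construction implicit; as a generic illustration of the pullback idea (standard differential geometry, not the paper's specific score-based variant), a smooth map phi pulls the Euclidean metric of its codomain back to G(x) = J(x)^T J(x) on the domain. The map phi below is a toy assumption.

```python
import numpy as np

def jacobian(phi, x, eps=1e-6):
    """Central finite-difference Jacobian of phi: R^d -> R^m at x."""
    return np.stack([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps)
                     for e in np.eye(x.size)], axis=1)

def pullback_metric(phi, x):
    """Pull the Euclidean metric on the codomain back through phi:
    G(x) = J(x)^T J(x) defines a Riemannian metric on the domain."""
    J = jacobian(phi, x)
    return J.T @ J

# Toy 'flow' from R^2 into R^3 (hypothetical stand-in for a learned map).
phi = lambda x: np.array([x[0], x[1], x[0] ** 2 + np.sin(x[1])])
print(pullback_metric(phi, np.array([0.5, -1.0])))
```

When phi is a diffeomorphism, as for a normalizing flow, geodesics under the pullback metric are simply preimages of straight lines in the codomain, which is what makes such geometries cheap to work with.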
arXiv Detail & Related papers (2024-10-02T18:52:12Z)
- (Deep) Generative Geodesics [57.635187092922976]
We introduce a new Riemannian metric to assess the similarity between any two data points.
Our metric leads to the conceptual definition of generative distances and generative geodesics.
Their approximations are proven to converge to their true values under mild conditions.
arXiv Detail & Related papers (2024-07-15T21:14:02Z)
- Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
We introduce a new class of manifolds, called soft manifolds, that can address this issue.
Using soft manifolds for graph embedding, we can provide continuous spaces for downstream data-analysis tasks on complex datasets.
arXiv Detail & Related papers (2023-11-29T12:48:33Z)
- Towards a mathematical understanding of learning from few examples with nonlinear feature maps [68.8204255655161]
We consider the problem of data classification where the training set consists of just a few data points.
We reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities.
arXiv Detail & Related papers (2022-11-07T14:52:58Z)
- A geometric framework for outlier detection in high-dimensional data [0.0]
Outlier or anomaly detection is an important task in data analysis.
We provide a framework that exploits the metric structure of a data set.
We show that exploiting this structure significantly improves the detection of outlying observations in high-dimensional data.
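The framework's details are beyond this summary; as a hedged stand-in for a detector that exploits the metric structure of the data, here is the standard k-NN distance score, which flags points far from their nearest neighbors. It is a generic baseline, not the paper's construction.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_outlier_scores(X, k=10):
    """Score each point by its mean distance to its k nearest neighbours;
    large scores indicate geometric outliers."""
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)          # ignore self-distances
    knn = np.sort(D, axis=1)[:, :k]      # k smallest distances per point
    return knn.mean(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))           # high-dimensional inliers
X[:5] += 10.0                            # plant a few outliers
scores = knn_outlier_scores(X)
print(np.argsort(scores)[-5:])           # indices of the top-scoring points
```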
arXiv Detail & Related papers (2022-07-01T12:07:51Z)
- Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information-theoretic framework for learning-motivated methods aimed at odometry estimation.
The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z)
- Optimal radial basis for density-based atomic representations [58.720142291102135]
We discuss how to build an adaptive, optimal numerical basis that is chosen to represent most efficiently the structural diversity of the dataset at hand.
For each training dataset, this optimal basis is unique, and can be computed at no additional cost with respect to the primitive basis.
We demonstrate that this construction yields representations that are accurate and computationally efficient.
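As a rough sketch of how a data-adapted basis can be obtained at essentially no extra cost from a primitive expansion, the PCA-style contraction below keeps the leading principal directions of the coefficient covariance; whether this matches the paper's exact construction is an assumption.

```python
import numpy as np

def optimal_basis(coeffs, n_keep):
    """Given coefficients of each sample in a primitive basis
    (shape: samples x primitive_size), return the n_keep leading
    principal directions as a data-adapted contracted basis."""
    C = coeffs - coeffs.mean(axis=0)
    cov = C.T @ C / len(C)
    eigval, eigvec = np.linalg.eigh(cov)     # ascending eigenvalues
    order = np.argsort(eigval)[::-1][:n_keep]
    return eigvec[:, order]                  # primitive_size x n_keep

# Hypothetical example: 200 'structures', 32 primitive radial functions.
rng = np.random.default_rng(0)
coeffs = rng.normal(size=(200, 32)) @ np.diag(np.linspace(2, 0.01, 32))
U = optimal_basis(coeffs, n_keep=8)
contracted = coeffs @ U   # project onto the adapted basis (cheap)
print(contracted.shape)   # (200, 8)
```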
arXiv Detail & Related papers (2021-05-18T17:57:08Z)
- Hermitian Symmetric Spaces for Graph Embeddings [0.0]
We learn continuous representations of graphs in spaces of symmetric matrices over C.
These spaces offer a rich geometry that simultaneously admits hyperbolic and Euclidean subspaces.
The proposed models are able to automatically adapt to very dissimilar arrangements without any a priori estimates of graph features.
arXiv Detail & Related papers (2021-05-11T18:14:52Z)
- Bayesian Quadrature on Riemannian Data Manifolds [79.71142807798284]
Riemannian manifolds provide a principled way to model the nonlinear geometric structure inherent in data.
However, operations on these manifolds are typically computationally demanding.
In particular, we focus on Bayesian quadrature (BQ) to numerically compute integrals over normal laws.
We show that by leveraging both prior knowledge and an active exploration scheme, BQ significantly reduces the number of required evaluations.
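The paper's manifold machinery is not reproduced here; as a minimal Euclidean illustration of BQ against a normal law, the RBF kernel admits closed-form kernel means under a Gaussian measure, giving the quadrature weights directly. The posterior variance returned alongside the estimate is what an active exploration scheme would use to pick the next evaluation point.

```python
import numpy as np

def bq_gaussian(f, xs, mu=0.0, s2=1.0, l=0.5, jitter=1e-10):
    """Bayesian quadrature estimate of E[f(X)], X ~ N(mu, s2), with a
    GP prior on f using the RBF kernel k(x,y) = exp(-(x-y)^2 / (2 l^2))."""
    xs = np.asarray(xs, float)
    K = np.exp(-(xs[:, None] - xs[None, :]) ** 2 / (2 * l ** 2))
    # Closed-form kernel means: z_i = int k(x, x_i) N(x; mu, s2) dx.
    z = np.sqrt(l ** 2 / (l ** 2 + s2)) * np.exp(
        -(xs - mu) ** 2 / (2 * (l ** 2 + s2)))
    w = np.linalg.solve(K + jitter * np.eye(len(xs)), z)
    mean = w @ f(xs)
    # Posterior variance of the integral (epistemic uncertainty).
    var = np.sqrt(l ** 2 / (l ** 2 + 2 * s2)) - z @ w
    return mean, var

est, var = bq_gaussian(np.cos, np.linspace(-3, 3, 15))
print(est, np.exp(-0.5))  # E[cos X] = exp(-1/2) for X ~ N(0, 1)
```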
arXiv Detail & Related papers (2021-02-12T17:38:04Z)