ABID: Angle Based Intrinsic Dimensionality
- URL: http://arxiv.org/abs/2006.12880v1
- Date: Tue, 23 Jun 2020 10:19:34 GMT
- Title: ABID: Angle Based Intrinsic Dimensionality
- Authors: Erik Thordsen and Erich Schubert
- Abstract summary: The intrinsic dimensionality refers to the ``true'' dimensionality of the data, as opposed to the dimensionality of the data representation.
Most popular methods for estimating the local intrinsic dimensionality are based on distances.
We derive the theoretical distribution of angles and use this to construct an estimator for intrinsic dimensionality.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The intrinsic dimensionality refers to the ``true'' dimensionality of the
data, as opposed to the dimensionality of the data representation. For example,
when attributes are highly correlated, the intrinsic dimensionality can be much
lower than the number of variables. Local intrinsic dimensionality refers to
the observation that this property can vary for different parts of the data
set; and intrinsic dimensionality can serve as a proxy for the local difficulty
of the data set.
Most popular methods for estimating the local intrinsic dimensionality are
based on distances, and the rate at which the distances to the nearest
neighbors increase, a concept known as ``expansion dimension''. In this paper
we introduce an orthogonal concept, which does not use any distances: we use
the distribution of angles between neighbor points. We derive the theoretical
distribution of angles and use this to construct an estimator for intrinsic
dimensionality.
Experimentally, we verify that this measure behaves similarly, but
complementarily, to existing measures of intrinsic dimensionality. By
introducing a new idea of intrinsic dimensionality to the research community,
we hope to contribute to a better understanding of intrinsic dimensionality and
to spur new research in this direction.
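The core idea admits a compact illustration. If the directions from a point to its neighbors are uniformly distributed within a $d$-dimensional subspace, then for two such unit directions $u, v$ one has $E[(u \cdot v)^2] = E[\cos^2\theta] = 1/d$, so inverting the empirical mean squared cosine over neighbor pairs yields a local ID estimate. The sketch below implements this simple moment estimator in plain NumPy; it is a minimal illustration of the angle-based idea under that assumption, not necessarily the paper's exact ABID formula, and the function name and defaults are invented here.

```python
# Minimal angle-based local ID sketch, assuming the moment estimator
# d ~= 1 / E[cos^2(theta)] over neighbor directions. Illustrative only;
# not necessarily the exact ABID formula from the paper.
import numpy as np

def angle_based_id(data, x, k=20):
    """Estimate local intrinsic dimensionality at x from angles between
    the directions to its k nearest neighbors."""
    diffs = data - x
    dists = np.linalg.norm(diffs, axis=1)
    order = np.argsort(dists)
    order = order[dists[order] > 0][:k]              # skip x itself / duplicates
    v = diffs[order]
    v /= np.linalg.norm(v, axis=1, keepdims=True)    # unit directions
    cos = v @ v.T                                    # pairwise cosines
    iu = np.triu_indices(len(v), 1)                  # distinct pairs only
    return 1.0 / np.mean(cos[iu] ** 2)

# Sanity check: a 3-dimensional ball embedded in R^10 should give an ID near 3.
rng = np.random.default_rng(0)
d, n = 3, 2000
pts = rng.standard_normal((n, d))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
pts *= rng.random((n, 1)) ** (1.0 / d)               # uniform in the unit d-ball
pts = np.hstack([pts, np.zeros((n, 10 - d))])        # embed in 10-D ambient space
print(angle_based_id(pts, pts[0], k=50))             # expect a value near 3
```

Note that distances enter only through the choice of the neighborhood; the estimate itself is computed purely from angles, which is what makes it complementary to distance-based expansion-dimension estimators.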
Related papers
- OPDR: Order-Preserving Dimension Reduction for Semantic Embedding of Multimodal Scientific Data [0.888375168590583]
One of the most common operations in multimodal scientific data management is searching for the $k$ most similar items.
The dimension of the resulting embedding vectors is usually on the order of hundreds or a thousand, which is impractically high for time-sensitive scientific applications.
This work proposes to reduce the dimensionality of the output embedding vectors such that the set of top-$k$ nearest neighbors does not change in the lower-dimensional space (a toy check of this property appears below).
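The order-preserving criterion is directly checkable: after reduction, each query's top-$k$ neighbor set should be unchanged. The toy sketch below, which is not the OPDR method itself (PCA serves as a hypothetical stand-in reducer, and all names are invented for illustration), measures the mean top-$k$ overlap before and after reduction.

```python
# Toy check of top-k neighbor preservation under dimension reduction.
# PCA is only a stand-in reducer here, not the OPDR method from the paper.
import numpy as np

def topk_sets(X, k):
    """Each row's k nearest neighbors (squared Euclidean distance, excluding self)."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    return [set(row) for row in np.argsort(d2, axis=1)[:, :k]]

rng = np.random.default_rng(0)
# Embeddings with low intrinsic dimension: 16 latent factors lifted to 128-D.
Z = rng.standard_normal((500, 16))
X = Z @ rng.standard_normal((16, 128))

# Stand-in reducer: PCA via SVD down to 16 dimensions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Y = Xc @ Vt[:16].T

k = 10
before, after = topk_sets(X, k), topk_sets(Y, k)
overlap = np.mean([len(b & a) / k for b, a in zip(before, after)])
# Since the data is exactly rank 16, PCA to 16-D preserves distances and the
# overlap should be ~1.0; lossy reducers trade this property off.
print(f"mean top-{k} neighbor overlap after reduction: {overlap:.2f}")
```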
arXiv Detail & Related papers (2024-08-15T22:30:44Z)
- What is the $\textit{intrinsic}$ dimension of your binary data? -- and how to compute it quickly [0.0]
In their 2006 ICDM paper, Tatti et al. answered this question for an (interpretable) dimension of binary data tables by introducing a normalized correlation dimension.
In the present work we revisit their results and contrast them with a concept based notion of intrinsic dimension (ID) recently introduced for geometric data sets.
We present a novel approximation for this ID that is based on computing concepts only up to a certain support value.
arXiv Detail & Related papers (2024-04-09T14:04:26Z)
- Relative intrinsic dimensionality is intrinsic to learning [49.5738281105287]
We introduce a new notion of the intrinsic dimension of a data distribution, which precisely captures the separability properties of the data.
For this intrinsic dimension, the rule of thumb above becomes a law: high intrinsic dimension guarantees highly separable data.
We show that this relative intrinsic dimension provides both upper and lower bounds on the probability of successfully learning and generalising in a binary classification problem.
arXiv Detail & Related papers (2023-10-10T10:41:45Z)
- Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of sequence space.
arXiv Detail & Related papers (2022-07-20T06:38:36Z)
- Intrinsic Dimension Estimation [92.87600241234344]
We introduce a new estimator of the intrinsic dimension and provide finite sample, non-asymptotic guarantees.
We then apply our techniques to get new sample complexity bounds for Generative Adversarial Networks (GANs) depending on the intrinsic dimension of the data.
arXiv Detail & Related papers (2021-06-08T00:05:39Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general-purpose dimensionality reduction.
To train the neural network, we build a neighborhood similarity graph over the data matrix and use it to define the context of each data point; a minimal sketch of such a graph follows below.
Experiments on data classification and clustering over eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis testing.
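For intuition, a neighborhood similarity graph of this kind takes only a few lines to build; the sketch below (cosine similarity, hypothetical function name) is illustrative only and not Vec2vec's actual construction.

```python
# Minimal k-NN similarity graph sketch; each node's neighbor list can then
# serve as its training "context". Not Vec2vec's actual procedure.
import numpy as np

def knn_similarity_graph(X, k=5):
    """Adjacency list mapping each row of X to its k most cosine-similar rows."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)                 # no self-edges
    nbrs = np.argsort(-sim, axis=1)[:, :k]
    return {i: nbrs[i].tolist() for i in range(len(X))}

graph = knn_similarity_graph(np.random.default_rng(0).standard_normal((100, 32)))
```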
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- Manifold Learning via Manifold Deflation [105.7418091051558]
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method based on a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z)
- A Topological Approach to Inferring the Intrinsic Dimension of Convex Sensing Data [0.0]
We consider a common measurement paradigm, where an unknown subset of an affine space is measured by unknown quasi-convex functions.
In this paper, we develop a method for inferring the dimension of the data under natural assumptions.
arXiv Detail & Related papers (2020-07-07T05:35:23Z)
- Geometry of Similarity Comparisons [51.552779977889045]
We show that the ordinal capacity of a space form is related to its dimension and the sign of its curvature.
More importantly, we show that the statistical behavior of the ordinal spread random variables defined on a similarity graph can be used to identify its underlying space form.
arXiv Detail & Related papers (2020-06-17T13:37:42Z)