Relative intrinsic dimensionality is intrinsic to learning
- URL: http://arxiv.org/abs/2311.07579v1
- Date: Tue, 10 Oct 2023 10:41:45 GMT
- Title: Relative intrinsic dimensionality is intrinsic to learning
- Authors: Oliver J. Sutton, Qinghua Zhou, Alexander N. Gorban and Ivan Y. Tyukin
- Abstract summary: We introduce a new notion of the intrinsic dimension of a data distribution, which precisely captures the separability properties of the data.
For this intrinsic dimension, the rule of thumb above becomes a law: high intrinsic dimension guarantees highly separable data.
We show that this relative intrinsic dimension provides both upper and lower bounds on the probability of successfully learning and generalising in a binary classification problem.
- Score: 49.5738281105287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High dimensional data can have a surprising property: pairs of data points
may be easily separated from each other, or even from arbitrary subsets, with
high probability using just simple linear classifiers. However, this is more of
a rule of thumb than a reliable property as high dimensionality alone is
neither necessary nor sufficient for successful learning. Here, we introduce a
new notion of the intrinsic dimension of a data distribution, which precisely
captures the separability properties of the data. For this intrinsic dimension,
the rule of thumb above becomes a law: high intrinsic dimension guarantees
highly separable data. We extend this notion to that of the relative intrinsic
dimension of two data distributions, which we show provides both upper and
lower bounds on the probability of successfully learning and generalising in a
binary classification problem.
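The separability phenomenon the abstract starts from is easy to observe numerically. The sketch below is an illustration only, not the paper's construction: the function name fraction_separable and the margin parameter a are our own choices. It samples standard Gaussian points and reports how often a single point can be cut off from all the others by one hyperplane.
```python
# Illustrative sketch only (not the construction from the paper): Fisher-type
# linear separability of random Gaussian points. A point x is separated from y
# by the hyperplane {z : <z, x> = a<x, x>} whenever <y, x> < a<x, x>; in high
# dimension this tends to hold simultaneously for almost every point against all others.
import numpy as np

def fraction_separable(n_points: int, dim: int, a: float = 0.8, seed: int = 0) -> float:
    """Fraction of sample points separable from all other points by one hyperplane."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_points, dim))
    gram = x @ x.T                      # pairwise inner products <x_i, x_j>
    thresholds = a * np.diag(gram)      # a * <x_i, x_i> for each point i
    g = gram.copy()
    np.fill_diagonal(g, -np.inf)        # ignore the trivial i == j comparison
    # point i is separable iff <x_j, x_i> < a * <x_i, x_i> for every j != i
    return float((g < thresholds[None, :]).all(axis=0).mean())

if __name__ == "__main__":
    for d in (2, 10, 100, 1000):
        print(d, fraction_separable(n_points=1000, dim=d))
```
As the dimension grows the reported fraction climbs towards one, which is the rule of thumb that the paper's intrinsic dimension turns into a law.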
Related papers
- Canonical normalizing flows for manifold learning [14.377143992248222]
We propose a canonical manifold learning flow method, where a novel objective constrains the transformation matrix to have few prominent and non-degenerate basis functions.
Canonical manifold flow yields a more efficient use of the latent space, automatically generating fewer prominent and distinct dimensions to represent data.
arXiv Detail & Related papers (2023-10-19T13:48:05Z)
- Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of sequence space.
arXiv Detail & Related papers (2022-07-20T06:38:36Z)
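As background on intrinsic-dimension (ID) estimation in general, here is a minimal sketch of the classical two-nearest-neighbour (TwoNN) estimator for continuous data. It is not the discrete-metric algorithm introduced in the paper above; the function name twonn_id and the toy data are our own illustrative choices.
```python
# Hedged sketch: the classical TwoNN intrinsic-dimension estimator for continuous
# data. NOT the discrete-metric algorithm from the paper above; shown only to
# illustrate how an ID can be read off local nearest-neighbour statistics.
import numpy as np
from scipy.spatial import cKDTree

def twonn_id(points: np.ndarray) -> float:
    """Maximum-likelihood TwoNN estimate of the intrinsic dimension."""
    dists, _ = cKDTree(points).query(points, k=3)    # columns: self, 1st and 2nd neighbour
    mu = dists[:, 2] / dists[:, 1]                   # ratio of 2nd to 1st neighbour distance
    return len(points) / float(np.sum(np.log(mu)))   # MLE under the Pareto model for mu

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.standard_normal((2000, 3))           # data lives on a 3-dimensional subspace
    embedded = latent @ rng.standard_normal((3, 20))  # ... embedded in 20 ambient dimensions
    print(twonn_id(embedded))                         # close to 3, far below the ambient 20
```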
- A Short Note on the Relationship of Information Gain and Eluder Dimension [86.86653394312134]
We show that eluder dimension and information gain are equivalent, in a precise sense, for reproducing kernel Hilbert spaces.
arXiv Detail & Related papers (2021-07-06T04:01:22Z)
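For readers unfamiliar with the quantities being related, the maximal information gain in question is usually defined (in the kernel / Gaussian-process bandit literature; the notation below is ours, not necessarily the note's) as:
```latex
% Maximal information gain after T noisy observations (noise variance \sigma^2)
% of a function in an RKHS with kernel k; K_A is the kernel matrix on the set A.
\gamma_T \;=\; \max_{A \subseteq \mathcal{X},\; |A| = T}
\; \frac{1}{2} \log \det\!\left( I_T + \sigma^{-2} K_A \right)
```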
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments of data classification and clustering on eight real datasets show that Vec2vec is better than several classical dimensionality reduction methods in the statistical hypothesis test.
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
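As background, here is a minimal sketch of the kind of k-nearest-neighbour similarity graph such a method builds before training the embedding network. The function name knn_similarity_graph and the Gaussian-kernel bandwidth choice are illustrative assumptions, not taken from the Vec2vec paper.
```python
# Hedged sketch: build a k-nearest-neighbour similarity graph over the rows of a
# data matrix; contexts of a data point could then be sampled from it to train an
# embedding network. Not the Vec2vec implementation itself.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_similarity_graph(x: np.ndarray, k: int = 10) -> np.ndarray:
    """Dense (n, n) matrix of Gaussian similarities to each point's k nearest neighbours."""
    n = len(x)
    dists, idx = NearestNeighbors(n_neighbors=k + 1).fit(x).kneighbors(x)
    dists, idx = dists[:, 1:], idx[:, 1:]            # drop each point's self-neighbour
    sigma = np.median(dists)                         # simple global bandwidth (our assumption)
    graph = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    graph[rows, idx.ravel()] = np.exp(-dists.ravel() ** 2 / (2 * sigma ** 2))
    return graph
```
Contexts for a data point could then be drawn in proportion to these similarities, analogous to word contexts in word2vec-style training.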
- Learning a Deep Part-based Representation by Preserving Data Distribution [21.13421736154956]
Unsupervised dimensionality reduction is one of the techniques commonly used for high-dimensional data recognition problems.
In this paper, a deep part-based representation is learned by preserving the data distribution; the resulting algorithm is called Distribution Preserving Network Embedding.
The experimental results on the real-world data sets show that the proposed algorithm has good performance in terms of cluster accuracy and AMI.
arXiv Detail & Related papers (2020-09-17T12:49:36Z)
- Manifold Learning via Manifold Deflation [105.7418091051558]
Nonlinear dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method based on a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z)
- ABID: Angle Based Intrinsic Dimensionality [0.0]
The intrinsic dimensionality refers to the "true" dimensionality of the data, as opposed to the dimensionality of the data representation.
Most popular methods for estimating the local intrinsic dimensionality are based on distances.
We derive the theoretical distribution of angles and use this to construct an estimator for intrinsic dimensionality.
arXiv Detail & Related papers (2020-06-23T10:19:34Z)
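In a similar spirit, here is a minimal sketch of an angle-based local ID estimate: for isotropically distributed directions in d dimensions the expected squared cosine between two directions is 1/d, so the inverse of the mean squared cosine among neighbour directions estimates d. This illustrates the idea only and is not the exact ABID estimator; the function name angle_based_id is our own.
```python
# Hedged sketch of an angle-based local ID estimate, in the spirit of ABID but not
# its exact estimator: uses E[cos^2(theta)] = 1/d for isotropic directions in d dims.
import numpy as np
from scipy.spatial import cKDTree

def angle_based_id(points: np.ndarray, k: int = 20) -> float:
    _, idx = cKDTree(points).query(points, k=k + 1)   # idx[:, 0] is the point itself
    estimates = []
    for i, neigh in enumerate(idx):
        v = points[neigh[1:]] - points[i]             # directions towards the k neighbours
        v /= np.linalg.norm(v, axis=1, keepdims=True)
        cos = v @ v.T                                 # pairwise cosines between directions
        off_diag = cos[~np.eye(k, dtype=bool)]        # drop the trivial self-cosines
        estimates.append(1.0 / np.mean(off_diag ** 2))
    return float(np.mean(estimates))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    plane = np.c_[rng.standard_normal((3000, 2)), np.zeros((3000, 8))]  # 2D plane in R^10
    print(angle_based_id(plane))   # roughly 2
```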
- Robust Large-Margin Learning in Hyperbolic Space [64.42251583239347]
We present the first theoretical guarantees for learning a classifier in hyperbolic rather than Euclidean space.
We provide an algorithm to efficiently learn a large-margin hyperplane, relying on the careful injection of adversarial examples.
We prove that for hierarchical data that embeds well into hyperbolic space, the low embedding dimension ensures superior guarantees.
arXiv Detail & Related papers (2020-04-11T19:11:30Z)
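For orientation, the basic primitive a hyperbolic large-margin classifier works with is the hyperbolic distance. Below is a minimal sketch of the standard Poincaré-ball distance formula; it is not code from the paper, and poincare_distance is an illustrative name.
```python
# Hedged sketch: geodesic distance in the Poincare ball model of hyperbolic space,
# the geometry in which the guarantees above are stated. Standard textbook formula,
# not the paper's algorithm.
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Distance between two points strictly inside the open unit ball."""
    uu, vv = np.dot(u, u), np.dot(v, v)
    assert uu < 1.0 and vv < 1.0, "points must lie inside the open unit ball"
    diff = np.dot(u - v, u - v)
    return float(np.arccosh(1.0 + 2.0 * diff / ((1.0 - uu) * (1.0 - vv))))

if __name__ == "__main__":
    u = np.array([0.1, 0.0])
    v = np.array([0.9, 0.0])
    # distances blow up near the boundary, which is what lets hierarchical data
    # embed with low distortion in few dimensions
    print(poincare_distance(u, v))
```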
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.