What is the $\textit{intrinsic}$ dimension of your binary data? -- and how to compute it quickly
- URL: http://arxiv.org/abs/2404.06326v1
- Date: Tue, 9 Apr 2024 14:04:26 GMT
- Title: What is the $\textit{intrinsic}$ dimension of your binary data? -- and how to compute it quickly
- Authors: Tom Hanika, Tobias Hille
- Abstract summary: In their 2006 ICDM paper, Tatti et al. answered the question of an (interpretable) dimension of binary data tables by introducing a normalized correlation dimension.
In the present work we revisit their results and contrast them with a concept-based notion of intrinsic dimension (ID) recently introduced for geometric data sets.
We present a novel approximation for this ID that is based on computing concepts only up to a certain support value.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dimensionality is an important aspect of analyzing and understanding (high-dimensional) data. In their 2006 ICDM paper, Tatti et al. answered the question of an (interpretable) dimension of binary data tables by introducing a normalized correlation dimension. In the present work we revisit their results and contrast them with a concept-based notion of intrinsic dimension (ID) recently introduced for geometric data sets. To do this, we present a novel approximation for this ID that is based on computing concepts only up to a certain support value. We demonstrate and evaluate our approximation using all available datasets from Tatti et al., which have between 469 and 41271 extrinsic dimensions.
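The abstract does not spell out the approximation, but its core ingredient, enumerating formal concepts (closed attribute sets) of a binary table only while their support stays above a threshold, admits a short sketch. The cut-off is sound because support is antitone along closure expansion: every closed set above the threshold is reachable by repeatedly adding one attribute to a smaller closed set above it and closing. The following minimal, naive illustration is written under these assumptions; the function names and the breadth-first closure strategy are illustrative, not the authors' implementation.

```python
def closure(rows, attrs, B):
    """Close the attribute set B: the extent is every row containing all of
    B; the intent is the set of attributes shared by the whole extent."""
    extent = [r for r in rows if B <= r]
    if not extent:
        return frozenset(attrs), 0   # empty extent closes to all attributes
    return frozenset.intersection(*extent), len(extent)

def concepts_up_to_support(data, min_support):
    """Enumerate the intents of all formal concepts whose extent covers at
    least min_support rows; data is an iterable of attribute-index sets."""
    rows = [frozenset(r) for r in data]
    attrs = frozenset().union(*rows)
    bottom, supp = closure(rows, attrs, frozenset())  # smallest intent
    found = {bottom: supp}
    frontier = [bottom]
    while frontier:                       # breadth-first closure expansion
        nxt = []
        for intent in frontier:
            for a in attrs - intent:
                c, s = closure(rows, attrs, intent | {a})
                if s >= min_support and c not in found:
                    found[c] = s
                    nxt.append(c)
        frontier = nxt
    return found                          # maps intent -> support

# toy binary table, rows given as sets of attribute indices
table = [{0, 1, 2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]
for intent, supp in sorted(concepts_up_to_support(table, 2).items(),
                           key=lambda kv: -kv[1]):
    print(sorted(intent), supp)
```

For real tables one would swap the quadratic closure computation for a frequent-closed-itemset miner, but the support pruning works the same way.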
Related papers
- Relative intrinsic dimensionality is intrinsic to learning [49.5738281105287]
We introduce a new notion of the intrinsic dimension of a data distribution, which precisely captures the separability properties of the data.
For this intrinsic dimension, the rule of thumb above becomes a law: high intrinsic dimension guarantees highly separable data.
We show that this relative intrinsic dimension provides both upper and lower bounds on the probability of successfully learning and generalising in a binary classification problem.
arXiv Detail & Related papers (2023-10-10T10:41:45Z) - Interpretable Linear Dimensionality Reduction based on Bias-Variance
Analysis [45.3190496371625]
We propose a principled dimensionality reduction approach that maintains the interpretability of the resulting features.
In this way, all features are considered, the dimensionality is reduced and the interpretability is preserved.
arXiv Detail & Related papers (2023-03-26T14:30:38Z) - DimenFix: A novel meta-dimensionality reduction method for feature
preservation [64.0476282000118]
We propose a novel meta-method, DimenFix, which can operate on top of any base dimensionality reduction method that involves a gradient-descent-like process.
By allowing users to define the importance of different features, which is then taken into account during dimensionality reduction, DimenFix creates new possibilities to visualize and understand a given dataset.
arXiv Detail & Related papers (2022-11-30T05:35:22Z) - Intrinsic Dimension for Large-Scale Geometric Learning [0.0]
A naive approach to determine the dimension of a dataset is based on the number of attributes.
More sophisticated methods derive a notion of intrinsic dimension (ID) that employs more complex feature functions.
arXiv Detail & Related papers (2022-10-11T09:50:50Z) - FONDUE: an algorithm to find the optimal dimensionality of the latent
representations of variational autoencoders [2.969705152497174]
In this paper, we explore the intrinsic dimension estimation (IDE) of the data and latent representations learned by VAEs.
We show that the discrepancies between the IDE of the mean and sampled representations of a VAE after only a few steps of training reveal the presence of passive variables in the latent space.
We propose FONDUE: an algorithm which quickly finds the number of latent dimensions after which the mean and sampled representations start to diverge.
arXiv Detail & Related papers (2022-09-26T15:59:54Z) - The Mean Dimension of Neural Networks -- What causes the interaction
effects? [0.9208007322096533]
Owen and Hoyt recently showed that the effective dimension offers key structural information about the input-output mapping underlying an artificial neural network.
This work proposes an estimation procedure that allows the calculation of the mean dimension from a given dataset.
arXiv Detail & Related papers (2022-07-11T14:00:06Z) - Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic
Approach to Manifold Dimension Estimation [92.81218653234669]
We present a new approach to manifold hypothesis checking and underlying manifold dimension estimation.
Our geometrical method is a modification, for sparse data, of the well-known box-counting algorithm for Minkowski dimension calculation (a minimal sketch of plain box counting appears after this list).
Experiments on real datasets show that the suggested approach, based on a combination of the two methods, is powerful and effective.
arXiv Detail & Related papers (2021-07-08T15:35:54Z) - Intrinsic Dimension Estimation [92.87600241234344]
We introduce a new estimator of the intrinsic dimension and provide finite sample, non-asymptotic guarantees.
We then apply our techniques to get new sample complexity bounds for Generative Adversarial Networks (GANs) depending on the intrinsic dimension of the data.
arXiv Detail & Related papers (2021-06-08T00:05:39Z) - The Intrinsic Dimension of Images and Its Impact on Learning [60.811039723427676]
It is widely believed that natural image data exhibits low-dimensional structure despite the high dimensionality of conventional pixel representations.
In this work, we apply dimension estimation tools to popular datasets and investigate the role of low-dimensional structure in deep learning.
arXiv Detail & Related papers (2021-04-18T16:29:23Z) - A Local Similarity-Preserving Framework for Nonlinear Dimensionality
Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments on data classification and clustering across eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis testing.
arXiv Detail & Related papers (2021-03-10T23:10:47Z) - ABID: Angle Based Intrinsic Dimensionality [0.0]
The intrinsic dimensionality refers to the "true" dimensionality of the data, as opposed to the dimensionality of the data representation.
Most popular methods for estimating the local intrinsic dimensionality are based on distances.
We derive the theoretical distribution of angles and use it to construct an estimator for intrinsic dimensionality (see the angle-based sketch after this list).
arXiv Detail & Related papers (2020-06-23T10:19:34Z)
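For the box-counting entry above, here is a minimal sketch of the plain (dense-data) algorithm: count the occupied grid cells $N(\varepsilon)$ at a few scales $\varepsilon$ and read the Minkowski dimension off the slope of $\log N(\varepsilon)$ versus $\log(1/\varepsilon)$. The scale ladder and the least-squares fit are illustrative choices, and this is the textbook method rather than the paper's sparse-data modification.

```python
import numpy as np

def box_counting_dimension(points, scales=(1.0, 0.5, 0.25, 0.125, 0.0625)):
    """Minkowski dimension estimate: the slope of log N(eps) against
    log(1/eps), where N(eps) counts occupied grid cells of side eps."""
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.min(axis=0)                        # shift into positive orthant
    counts = []
    for eps in scales:
        cells = np.floor(pts / eps).astype(int)        # grid cell of each point
        counts.append(len({tuple(c) for c in cells}))  # occupied cells
    x = np.log(1.0 / np.asarray(scales))
    slope, _ = np.polyfit(x, np.log(counts), 1)        # least-squares slope
    return slope

# sanity check: points on a circle embedded in R^3 should give roughly 1
t = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
print(box_counting_dimension(circle))
```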
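For the angle-based entry, the underlying geometric fact is that two independent uniform directions in $\mathbb{R}^d$ satisfy $\mathbb{E}[\cos^2\theta] = 1/d$, so inverting the mean squared cosine among the difference vectors from a point to its neighbours yields a local dimension estimate. The sketch below follows that idea only; the neighbourhood size and the plain inverse-mean estimator are assumptions, not necessarily the exact ABID estimator.

```python
import numpy as np

def angle_based_id(points, k=20):
    """Local ID estimate per point: the inverse of the mean squared cosine
    between the difference vectors to its k nearest neighbours (for
    isotropic directions in R^d that mean is about 1/d)."""
    X = np.asarray(points, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    iu = np.triu_indices(k, 1)                           # distinct pairs only
    estimates = []
    for i in range(len(X)):
        nbrs = np.argsort(d2[i])[1:k + 1]                # skip the point itself
        V = X[nbrs] - X[i]
        V /= np.linalg.norm(V, axis=1, keepdims=True)    # unit directions
        C = (V @ V.T)[iu]                                # pairwise cosines
        estimates.append(1.0 / np.mean(C ** 2))
    return np.array(estimates)

# sanity check: a 3-dimensional Gaussian blob embedded isometrically in R^10
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(10, 3)))   # orthonormal 3-frame in R^10
Z = rng.normal(size=(500, 3)) @ Q.T
print(np.median(angle_based_id(Z)))             # should sit near 3
```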