Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic
Approach to Manifold Dimension Estimation
- URL: http://arxiv.org/abs/2107.03903v1
- Date: Thu, 8 Jul 2021 15:35:54 GMT
- Title: Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic
Approach to Manifold Dimension Estimation
- Authors: Alexander Ivanov, Gleb Nosovskiy, Alexey Chekunov, Denis Fedoseev,
Vladislav Kibkalo, Mikhail Nikulin, Fedor Popelenskiy, Stepan Komkov, Ivan
Mazurenko, Aleksandr Petiushko
- Abstract summary: We present a new approach to checking the manifold hypothesis and estimating the dimension of the underlying manifold.
Our geometric method is a modification, for sparse data, of the well-known box-counting algorithm for computing the Minkowski dimension.
Experiments on real datasets show that the suggested approach, based on a combination of the two methods, is powerful and effective.
- Score: 92.81218653234669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The manifold hypothesis states that data points in a high-dimensional
space actually lie in close vicinity of a manifold of much lower dimension. In
many cases this hypothesis has been empirically verified and used to enhance
unsupervised and semi-supervised learning. Here we present a new approach to
checking the manifold hypothesis and estimating the dimension of the underlying
manifold. To do so, we use two very different methods simultaneously, one
geometric and one probabilistic, and check whether they give the same result.
Our geometric method is a modification, for sparse data, of the well-known
box-counting algorithm for computing the Minkowski dimension. The probabilistic
method is new. Although it exploits standard nearest-neighbor distances, it
differs from methods previously used in such situations. This method is robust
and fast, and it includes a special preliminary data transformation.
Experiments on real datasets show that the suggested approach, based on a
combination of the two methods, is powerful and effective.
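The idea of running two unrelated dimension estimators and checking that they agree can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the geometric part is the plain box-counting estimate (without the paper's modification for sparse data), and the probabilistic part swaps in the standard nearest-neighbor maximum-likelihood estimator of Levina and Bickel in place of the paper's new method. The function names, the scales, the value of k, the agreement tolerance, and the synthetic test data are all hypothetical choices.

```python
import numpy as np


def box_counting_dimension(points, scales):
    """Plain box-counting estimate: slope of log N(eps) against log(1/eps).

    Assumption: this is the textbook algorithm, not the sparse-data
    modification described in the abstract.
    """
    points = np.asarray(points, dtype=float)
    mins = points.min(axis=0)
    span = (points.max(axis=0) - mins).max()
    # Rescale into the unit cube; clip so the maximum stays inside the last box.
    unit = np.clip((points - mins) / span, 0.0, 1.0 - 1e-12)
    counts = []
    for eps in scales:
        cells = np.floor(unit / eps).astype(np.int64)
        counts.append(len(np.unique(cells, axis=0)))  # occupied boxes
    log_inv_eps = np.log(1.0 / np.asarray(scales, dtype=float))
    slope, _ = np.polyfit(log_inv_eps, np.log(counts), 1)
    return slope


def knn_mle_dimension(points, k=10):
    """Nearest-neighbor maximum-likelihood estimate (Levina-Bickel style).

    Assumption: a stand-in for the paper's probabilistic method, which the
    abstract does not specify. Requires distinct points (no duplicates).
    """
    points = np.asarray(points, dtype=float)
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    dist = np.sqrt(np.maximum(d2, 0.0))
    np.fill_diagonal(dist, 0.0)
    # After sorting, column 0 is the point itself; keep neighbors 1..k.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    # Per-point mean of log(T_k / T_j), j = 1..k-1, estimates 1/dimension.
    inv_dim = np.log(knn[:, -1:] / knn[:, :-1]).mean(axis=1)
    return 1.0 / inv_dim.mean()


# Synthetic data: a flat 2-dimensional sheet embedded in 5-dimensional space.
rng = np.random.default_rng(0)
u, v = rng.random(2000), rng.random(2000)
data = np.column_stack([u, v, (u + v) / 2, (u - v) / 2, np.zeros(2000)])

d_geometric = box_counting_dimension(data, scales=[1 / 2, 1 / 4, 1 / 8, 1 / 16])
d_probabilistic = knn_mle_dimension(data, k=10)
# Hypothetical agreement rule: accept when the two estimates are close.
estimates_agree = abs(d_geometric - d_probabilistic) < 0.5
print(d_geometric, d_probabilistic, estimates_agree)
```

On this synthetic sheet both estimates should land near 2, well below the ambient dimension of 5. On sparse real data the plain box count tends to underestimate at fine scales because many boxes stay empty, which is the situation the paper's modification is meant to address.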
Related papers
- Estimation of multiple mean vectors in high dimension [4.2466572124753]
We endeavour to estimate numerous multi-dimensional means of various probability distributions on a common space based on independent samples.
Our approach involves forming estimators through convex combinations of empirical means derived from these samples.
arXiv Detail & Related papers (2024-03-22T08:42:41Z)
- Implicit Manifold Gaussian Process Regression [49.0787777751317]
Gaussian process regression is widely used to provide well-calibrated uncertainty estimates.
It struggles with high-dimensional data because of the implicit low-dimensional manifold upon which the data actually lies.
In this paper we propose a technique capable of inferring implicit structure directly from data (labeled and unlabeled) in a fully differentiable way.
arXiv Detail & Related papers (2023-10-30T09:52:48Z)
- Bayesian Hyperbolic Multidimensional Scaling [2.5944208050492183]
We propose a Bayesian approach to multidimensional scaling when the low-dimensional manifold is hyperbolic.
A case-control likelihood approximation allows for efficient sampling from the posterior distribution in larger data settings.
We evaluate the proposed method against state-of-the-art alternatives using simulations, canonical reference datasets, Indian village network data, and human gene expression data.
arXiv Detail & Related papers (2022-10-26T23:34:30Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality
Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general-purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments on data classification and clustering with eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under a statistical hypothesis test.
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- New Methods for Detecting Concentric Objects With High Accuracy [0.0]
Fitting geometric objects to digitized data is an important problem in many areas such as iris detection, autonomous navigation, and industrial robotics operations.
There are two common approaches to fitting geometric shapes to data: the geometric (iterative) approach and algebraic (non-iterative) approach.
We develop new estimators, which can be used as reliable initial guesses for other iterative methods.
arXiv Detail & Related papers (2021-02-16T08:19:18Z)
- Normal-bundle Bootstrap [2.741266294612776]
We present a method that generates new data which preserve the geometric structure of a given data set.
Inspired by algorithms for manifold learning and concepts in differential geometry, our method decomposes the underlying probability measure into a marginalized measure.
We apply our method to the inference of density ridge and related statistics, and data augmentation to reduce overfitting.
arXiv Detail & Related papers (2020-07-27T21:14:19Z)
- Random extrapolation for primal-dual coordinate descent [61.55967255151027]
We introduce a randomly extrapolated primal-dual coordinate descent method that adapts to sparsity of the data matrix and the favorable structures of the objective function.
We show almost sure convergence of the sequence and optimal sublinear convergence rates for the primal-dual gap and objective values, in the general convex-concave case.
arXiv Detail & Related papers (2020-07-13T17:39:35Z)
- Manifold Learning via Manifold Deflation [105.7418091051558]
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method for a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z)
- Statistical Outlier Identification in Multi-robot Visual SLAM using
Expectation Maximization [18.259478519717426]
This paper introduces a novel distributed method for detecting inter-map loop closure outliers in simultaneous localization and mapping (SLAM).
The proposed algorithm does not rely on a good initialization and can handle more than two maps at a time.
arXiv Detail & Related papers (2020-02-07T06:34:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.