Manifold learning: what, how, and why
- URL: http://arxiv.org/abs/2311.03757v1
- Date: Tue, 7 Nov 2023 06:44:20 GMT
- Title: Manifold learning: what, how, and why
- Authors: Marina Meilă and Hanyu Zhang
- Abstract summary: Manifold learning (ML) is a set of methods to find the low dimensional structure of data.
The new representations and descriptors obtained by ML reveal the geometric shape of high dimensional point clouds.
This survey presents the principles underlying ML, the representative methods, as well as their statistical foundations from a practicing statistician's perspective.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Manifold learning (ML), also known as non-linear dimension reduction, is a
set of methods to find the low dimensional structure of data. Dimension
reduction for large, high dimensional data is not merely a way to reduce the
data; the new representations and descriptors obtained by ML reveal the
geometric shape of high dimensional point clouds, and allow one to visualize,
de-noise and interpret them. This survey presents the principles underlying ML,
the representative methods, as well as their statistical foundations from a
practicing statistician's perspective. It describes the trade-offs, and what
theory tells us about the parameter and algorithmic choices we make in order to
obtain reliable conclusions.
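The survey is prose-only; as a concrete illustration of the kind of pipeline it studies, the sketch below embeds a synthetic Swiss-roll point cloud with two standard manifold-learning methods from scikit-learn. The dataset, method choices, and parameters are illustrative assumptions, not the authors' recommendations.

```python
# Illustrative sketch only: nonlinear dimension reduction of a Swiss roll.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, SpectralEmbedding

X, t = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)

# Isomap approximately preserves geodesic (along-the-manifold) distances.
Y_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# Spectral embedding (Laplacian eigenmaps) uses graph-Laplacian eigenvectors.
Y_spec = SpectralEmbedding(n_components=2, n_neighbors=10,
                           random_state=0).fit_transform(X)

print(Y_iso.shape, Y_spec.shape)  # the 3-D cloud is flattened to 2-D coordinates
```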
Related papers
- Dimension reduction via score ratio matching
We propose a framework, derived from score-matching, to extend gradient-based dimension reduction to problems where gradients are unavailable.
We show that our approach outperforms standard score-matching for problems with low-dimensional structure.
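The paper's score-ratio estimator is not reproduced here; as background, the sketch below shows the classical gradient-based diagnostic it extends: eigen-decomposing a Monte Carlo estimate of E[g g^T], where g is the log-density gradient. The Gaussian target, whose score is known in closed form, is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
# A Gaussian target whose log-density varies strongly in only 2 directions.
evals = np.array([50.0, 20.0] + [1.0] * (d - 2))
Q = np.linalg.qr(rng.standard_normal((d, d)))[0]   # random orthogonal basis
precision = Q @ np.diag(evals) @ Q.T

X = rng.multivariate_normal(np.zeros(d), np.linalg.inv(precision), size=5000)
scores = -X @ precision            # Gaussian score: grad log p(x) = -P x
H = scores.T @ scores / len(X)     # Monte Carlo estimate of E[g g^T]

w, V = np.linalg.eigh(H)           # top eigenvectors span the informative subspace
print(w[::-1][:4])                 # a gap after 2 values flags 2-D structure
```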
arXiv Detail & Related papers (2024-10-25T22:21:03Z)
- Towards a mathematical understanding of learning from few examples with nonlinear feature maps
We consider the problem of data classification where the training set consists of just a few data points.
We reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities.
arXiv Detail & Related papers (2022-11-07T14:52:58Z)
- Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest for optimization with a Laplacian-composited objective; a minimal sketch of the Laplacian ingredient follows.
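As a hedged sketch of the ingredient LaptSNE leverages (not the LaptSNE objective itself), the snippet below builds a kNN graph, forms its normalized Laplacian, and reads cluster structure off the small eigenvalues; all parameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian

X, _ = make_blobs(n_samples=600, centers=3, random_state=0)
A = kneighbors_graph(X, n_neighbors=10, mode="connectivity")
A = 0.5 * (A + A.T)                # symmetrize the kNN adjacency
L = laplacian(A, normed=True)      # normalized graph Laplacian

vals = np.linalg.eigvalsh(L.toarray())
print(vals[:6])                    # expect ~3 near-zero eigenvalues for 3 blobs
```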
arXiv Detail & Related papers (2022-07-25T14:10:24Z)
- Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic Approach to Manifold Dimension Estimation
We present a new approach to manifold hypothesis checking and underlying manifold dimension estimation.
Our geometric method adapts the well-known box-counting algorithm for Minkowski dimension calculation to sparse data.
Experiments on real datasets show that the suggested approach, which combines the two methods, is powerful and effective; a sketch of plain box counting follows below.
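The paper's sparse-data modification is not reproduced here; below is a minimal sketch of the plain box-counting algorithm it starts from, applied to points sampled from a circle (true dimension 1). The sample size and range of box sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 20000)
pts = np.column_stack([np.cos(t), np.sin(t)])   # a 1-D manifold in the plane

sizes = 2.0 ** -np.arange(2, 8)                 # box side lengths eps
counts = []
for eps in sizes:
    boxes = np.floor(pts / eps)                 # grid-box index of each point
    counts.append(len(np.unique(boxes, axis=0)))

# Minkowski dimension ~ slope of log N(eps) versus log(1/eps); expect ~1 here.
slope, _ = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
print(f"estimated box-counting dimension: {slope:.2f}")
```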
arXiv Detail & Related papers (2021-07-08T15:35:54Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of the data matrix and define the contexts of data points.
Experiments on data classification and clustering across eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis testing; a sketch of the graph-and-context step follows.
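Vec2vec's exact construction is not reproduced here; the sketch below only illustrates the graph-and-context idea: build a kNN similarity graph and let short random walks supply word2vec-style contexts for each point. The walk length, fan-out, and stand-in data are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))        # stand-in data matrix

nn = NearestNeighbors(n_neighbors=6).fit(X)
_, idx = nn.kneighbors(X)                 # row i: point i plus its 5 neighbors

def random_walk(start, length=8):
    """Short walk over the kNN graph; visited nodes act as the context."""
    node, walk = start, [start]
    for _ in range(length - 1):
        node = int(rng.choice(idx[node][1:]))   # hop to a random neighbor
        walk.append(node)
    return walk

contexts = [random_walk(i) for i in range(len(X))]
print(contexts[0])                        # context nodes for data point 0
```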
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- Joint Dimensionality Reduction for Separable Embedding Estimation
Low-dimensional embeddings for data from disparate sources play critical roles in machine learning, multimedia information retrieval, and bioinformatics.
We propose a supervised dimensionality reduction method that learns linear embeddings jointly for two feature vectors representing data of different modalities or data from distinct types of entities.
Our approach compares favorably against other dimensionality reduction methods, and against a state-of-the-art method of bilinear regression for predicting gene-disease associations.
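The proposed supervised method itself is not sketched here; as a classical point of reference, canonical correlation analysis (CCA) also learns paired linear embeddings for two feature vectors so that their projections align in a shared low-dimensional space. The synthetic two-view data below is an assumption for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
Z = rng.standard_normal((300, 2))                  # shared latent factors
X = Z @ rng.standard_normal((2, 10)) + 0.1 * rng.standard_normal((300, 10))
Y = Z @ rng.standard_normal((2, 8)) + 0.1 * rng.standard_normal((300, 8))

cca = CCA(n_components=2)
Xc, Yc = cca.fit_transform(X, Y)                   # paired linear embeddings
print(np.corrcoef(Xc[:, 0], Yc[:, 0])[0, 1])       # near 1: views are aligned
```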
arXiv Detail & Related papers (2021-01-14T08:48:37Z)
- Survey: Geometric Foundations of Data Reduction
The purpose of this survey is to briefly introduce nonlinear dimensionality reduction (NLDR) in data reduction.
In 2001, the concept of Manifold Learning first appeared as an NLDR method called Laplacian Eigenmaps.
We derive each spectral manifold learning method in its matrix and operator representations, and then discuss the convergence behavior of each method in a uniform geometric language; a matrix-form sketch of Laplacian Eigenmaps follows.
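As a matrix-form sketch of Laplacian Eigenmaps (the 2001 method named above), the snippet below embeds a Swiss roll using the bottom nontrivial eigenvectors of a normalized kNN-graph Laplacian; parameters are illustrative, and the usual degree rescaling of the eigenvectors is omitted for brevity.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
W = kneighbors_graph(X, n_neighbors=10, mode="connectivity")
W = 0.5 * (W + W.T)                        # symmetric adjacency matrix
L = laplacian(W, normed=True).toarray()    # normalized graph Laplacian

vals, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
Y = vecs[:, 1:3]                           # skip the bottom trivial eigenvector
print(Y.shape)                             # (1000, 2) embedding coordinates
```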
arXiv Detail & Related papers (2020-08-16T07:59:22Z)
- Manifold Learning via Manifold Deflation
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method built on a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
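The paper's incremental estimator is more involved; the sketch below shows only the standard building block it refines: estimating a tangent space at a point by local PCA over its nearest neighbors. The neighborhood size and target dimension are assumptions.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import NearestNeighbors

X, _ = make_swiss_roll(n_samples=2000, random_state=0)
nn = NearestNeighbors(n_neighbors=25).fit(X)

def tangent_basis(i, dim=2):
    """Top principal directions of point i's centered neighborhood."""
    _, idx = nn.kneighbors(X[i : i + 1])
    nbrs = X[idx[0]] - X[idx[0]].mean(axis=0)
    _, _, Vt = np.linalg.svd(nbrs, full_matrices=False)
    return Vt[:dim]                  # rows span the estimated tangent plane

print(tangent_basis(0))              # 2 x 3 orthonormal tangent basis at point 0
```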
arXiv Detail & Related papers (2020-07-07T10:04:28Z)
- Deep Dimension Reduction for Supervised Representation Learning
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
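The deep, nonparametric method itself is not reproduced here; as classical background, the sketch below runs sliced inverse regression (SIR), the textbook linear sufficient dimension reduction method, on synthetic single-index data. The data-generating model and the number of slices are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4000, 8
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[0] = 1.0                          # the single sufficient direction
y = np.tanh(X @ beta) + 0.1 * rng.standard_normal(n)

Z = (X - X.mean(0)) / X.std(0)         # standardize the predictors
slices = np.array_split(np.argsort(y), 10)
means = np.array([Z[s].mean(axis=0) for s in slices])

# Leading eigenvector of the slice-mean covariance estimates span(beta).
M = means.T @ means / len(slices)
w, V = np.linalg.eigh(M)
print(np.abs(V[:, -1]))                # weight concentrates on coordinate 0
```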
arXiv Detail & Related papers (2020-06-10T14:47:43Z)
- Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes a drawback of existing methods, which seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
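TS-NMF itself operates on 2D samples without vectorization; as hedged background only, the sketch below implements classical semi-NMF with the multiplicative updates of Ding et al., factoring a mixed-sign matrix X ~ F G^T with only G constrained to be nonnegative. The data and rank are assumptions.

```python
import numpy as np

def pos(A):
    return (np.abs(A) + A) / 2         # elementwise positive part

def neg(A):
    return (np.abs(A) - A) / 2         # elementwise negative part

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))     # mixed-sign data matrix
k = 5
G = np.abs(rng.standard_normal((200, k)))    # nonnegative factor

for _ in range(200):
    F = X @ G @ np.linalg.pinv(G.T @ G)      # unconstrained factor, closed form
    XtF, FtF = X.T @ F, F.T @ F
    G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                 (neg(XtF) + G @ pos(FtF) + 1e-12))

print(np.linalg.norm(X - F @ G.T) / np.linalg.norm(X))   # relative fit error
```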
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.