Manifold learning: what, how, and why
- URL: http://arxiv.org/abs/2311.03757v1
- Date: Tue, 7 Nov 2023 06:44:20 GMT
- Title: Manifold learning: what, how, and why
- Authors: Marina Meilă and Hanyu Zhang
- Abstract summary: Manifold learning (ML) is a set of methods to find the low dimensional structure of data.
The new representations and descriptors obtained by ML reveal the geometric shape of high dimensional point clouds.
This survey presents the principles underlying ML, the representative methods, as well as their statistical foundations from a practicing statistician's perspective.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Manifold learning (ML), also known as non-linear dimension reduction, is a
set of methods to find the low dimensional structure of data. Dimension
reduction for large, high dimensional data is not merely a way to reduce the
data; the new representations and descriptors obtained by ML reveal the
geometric shape of high dimensional point clouds, and allow one to visualize,
de-noise and interpret them. This survey presents the principles underlying ML,
the representative methods, as well as their statistical foundations from a
practicing statistician's perspective. It describes the trade-offs, and what
theory tells us about the parameter and algorithmic choices we make in order to
obtain reliable conclusions.
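The survey is prose-only; as a concrete illustration of the kind of pipeline it studies, the sketch below embeds a synthetic Swiss-roll point cloud with two standard manifold-learning methods from scikit-learn. The dataset, method choices, and parameters are illustrative assumptions, not the authors' recommendations.

```python
# Illustrative sketch only: nonlinear dimension reduction of a Swiss roll.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, SpectralEmbedding

X, t = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)

# Isomap approximately preserves geodesic (along-the-manifold) distances.
Y_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# Spectral embedding (Laplacian eigenmaps) uses graph-Laplacian eigenvectors.
Y_spec = SpectralEmbedding(n_components=2, n_neighbors=10,
                           random_state=0).fit_transform(X)

print(Y_iso.shape, Y_spec.shape)  # the 3-D cloud is flattened to 2-D coordinates
```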
Related papers
- Dimension reduction via score ratio matching
We propose a framework, derived from score-matching, to extend gradient-based dimension reduction to problems where gradients are unavailable.
We show that our approach outperforms standard score-matching for problems with low-dimensional structure.
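The paper's score-ratio estimator is not reproduced here; as background, the sketch below shows the classical gradient-based diagnostic it extends: eigen-decomposing a Monte Carlo estimate of E[g g^T], where g is the log-density gradient. The Gaussian target, whose score is known in closed form, is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
# A Gaussian target whose log-density varies strongly in only 2 directions.
evals = np.array([50.0, 20.0] + [1.0] * (d - 2))
Q = np.linalg.qr(rng.standard_normal((d, d)))[0]   # random orthogonal basis
precision = Q @ np.diag(evals) @ Q.T

X = rng.multivariate_normal(np.zeros(d), np.linalg.inv(precision), size=5000)
scores = -X @ precision            # Gaussian score: grad log p(x) = -P x
H = scores.T @ scores / len(X)     # Monte Carlo estimate of E[g g^T]

w, V = np.linalg.eigh(H)           # top eigenvectors span the informative subspace
print(w[::-1][:4])                 # a gap after 2 values flags 2-D structure
```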
arXiv Detail & Related papers (2024-10-25T22:21:03Z)
- Towards a mathematical understanding of learning from few examples with nonlinear feature maps
We consider the problem of data classification where the training set consists of just a few data points.
We reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities.
arXiv Detail & Related papers (2022-11-07T14:52:58Z)
- Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest for optimization with a Laplacian-composited objective; a minimal sketch of the Laplacian ingredient follows.
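As a hedged sketch of the ingredient LaptSNE leverages (not the LaptSNE objective itself), the snippet below builds a kNN graph, forms its normalized Laplacian, and reads cluster structure off the small eigenvalues; all parameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian

X, _ = make_blobs(n_samples=600, centers=3, random_state=0)
A = kneighbors_graph(X, n_neighbors=10, mode="connectivity")
A = 0.5 * (A + A.T)                # symmetrize the kNN adjacency
L = laplacian(A, normed=True)      # normalized graph Laplacian

vals = np.linalg.eigvalsh(L.toarray())
print(vals[:6])                    # expect ~3 near-zero eigenvalues for 3 blobs
```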
arXiv Detail & Related papers (2022-07-25T14:10:24Z)
- Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic Approach to Manifold Dimension Estimation
We present a new approach to manifold hypothesis checking and underlying manifold dimension estimation.
Our geometric method adapts the well-known box-counting algorithm for Minkowski dimension calculation to sparse data.
Experiments on real datasets show that the suggested approach, which combines the two methods, is powerful and effective; a sketch of plain box counting follows below.
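The paper's sparse-data modification is not reproduced here; below is a minimal sketch of the plain box-counting algorithm it starts from, applied to points sampled from a circle (true dimension 1). The sample size and range of box sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 20000)
pts = np.column_stack([np.cos(t), np.sin(t)])   # a 1-D manifold in the plane

sizes = 2.0 ** -np.arange(2, 8)                 # box side lengths eps
counts = []
for eps in sizes:
    boxes = np.floor(pts / eps)                 # grid-box index of each point
    counts.append(len(np.unique(boxes, axis=0)))

# Minkowski dimension ~ slope of log N(eps) versus log(1/eps); expect ~1 here.
slope, _ = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
print(f"estimated box-counting dimension: {slope:.2f}")
```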
arXiv Detail & Related papers (2021-07-08T15:35:54Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of the data matrix and define the contexts of data points.
Experiments on data classification and clustering across eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis testing; a sketch of the graph-and-context step follows.
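Vec2vec's exact construction is not reproduced here; the sketch below only illustrates the graph-and-context idea: build a kNN similarity graph and let short random walks supply word2vec-style contexts for each point. The walk length, fan-out, and stand-in data are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))        # stand-in data matrix

nn = NearestNeighbors(n_neighbors=6).fit(X)
_, idx = nn.kneighbors(X)                 # row i: point i plus its 5 neighbors

def random_walk(start, length=8):
    """Short walk over the kNN graph; visited nodes act as the context."""
    node, walk = start, [start]
    for _ in range(length - 1):
        node = int(rng.choice(idx[node][1:]))   # hop to a random neighbor
        walk.append(node)
    return walk

contexts = [random_walk(i) for i in range(len(X))]
print(contexts[0])                        # context nodes for data point 0
```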
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- Joint Dimensionality Reduction for Separable Embedding Estimation
Low-dimensional embeddings for data from disparate sources play critical roles in machine learning, multimedia information retrieval, and bioinformatics.
We propose a supervised dimensionality reduction method that learns linear embeddings jointly for two feature vectors representing data of different modalities or data from distinct types of entities.
Our approach compares favorably against other dimensionality reduction methods, and against a state-of-the-art method of bilinear regression for predicting gene-disease associations.
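The proposed supervised method itself is not sketched here; as a classical point of reference, canonical correlation analysis (CCA) also learns paired linear embeddings for two feature vectors so that their projections align in a shared low-dimensional space. The synthetic two-view data below is an assumption for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
Z = rng.standard_normal((300, 2))                  # shared latent factors
X = Z @ rng.standard_normal((2, 10)) + 0.1 * rng.standard_normal((300, 10))
Y = Z @ rng.standard_normal((2, 8)) + 0.1 * rng.standard_normal((300, 8))

cca = CCA(n_components=2)
Xc, Yc = cca.fit_transform(X, Y)                   # paired linear embeddings
print(np.corrcoef(Xc[:, 0], Yc[:, 0])[0, 1])       # near 1: views are aligned
```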
arXiv Detail & Related papers (2021-01-14T08:48:37Z)
- Survey: Geometric Foundations of Data Reduction
The purpose of this survey is to briefly introduce nonlinear dimensionality reduction (NLDR) in data reduction.
In 2001, the concept of Manifold Learning first appeared as an NLDR method called Laplacian Eigenmaps.
We derive each spectral manifold learning method in its matrix and operator representations, and then discuss the convergence behavior of each method in a uniform geometric language; a matrix-form sketch of Laplacian Eigenmaps follows.
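As a matrix-form sketch of Laplacian Eigenmaps (the 2001 method named above), the snippet below embeds a Swiss roll using the bottom nontrivial eigenvectors of a normalized kNN-graph Laplacian; parameters are illustrative, and the usual degree rescaling of the eigenvectors is omitted for brevity.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
W = kneighbors_graph(X, n_neighbors=10, mode="connectivity")
W = 0.5 * (W + W.T)                        # symmetric adjacency matrix
L = laplacian(W, normed=True).toarray()    # normalized graph Laplacian

vals, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
Y = vecs[:, 1:3]                           # skip the bottom trivial eigenvector
print(Y.shape)                             # (1000, 2) embedding coordinates
```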
arXiv Detail & Related papers (2020-08-16T07:59:22Z)
- Manifold Learning via Manifold Deflation
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method built on a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
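The paper's incremental estimator is more involved; the sketch below shows only the standard building block it refines: estimating a tangent space at a point by local PCA over its nearest neighbors. The neighborhood size and target dimension are assumptions.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import NearestNeighbors

X, _ = make_swiss_roll(n_samples=2000, random_state=0)
nn = NearestNeighbors(n_neighbors=25).fit(X)

def tangent_basis(i, dim=2):
    """Top principal directions of point i's centered neighborhood."""
    _, idx = nn.kneighbors(X[i : i + 1])
    nbrs = X[idx[0]] - X[idx[0]].mean(axis=0)
    _, _, Vt = np.linalg.svd(nbrs, full_matrices=False)
    return Vt[:dim]                  # rows span the estimated tangent plane

print(tangent_basis(0))              # 2 x 3 orthonormal tangent basis at point 0
```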
arXiv Detail & Related papers (2020-07-07T10:04:28Z)
- Deep Dimension Reduction for Supervised Representation Learning
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
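The deep, nonparametric method itself is not reproduced here; as classical background, the sketch below runs sliced inverse regression (SIR), the textbook linear sufficient dimension reduction method, on synthetic single-index data. The data-generating model and the number of slices are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4000, 8
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[0] = 1.0                          # the single sufficient direction
y = np.tanh(X @ beta) + 0.1 * rng.standard_normal(n)

Z = (X - X.mean(0)) / X.std(0)         # standardize the predictors
slices = np.array_split(np.argsort(y), 10)
means = np.array([Z[s].mean(axis=0) for s in slices])

# Leading eigenvector of the slice-mean covariance estimates span(beta).
M = means.T @ means / len(slices)
w, V = np.linalg.eigh(M)
print(np.abs(V[:, -1]))                # weight concentrates on coordinate 0
```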
arXiv Detail & Related papers (2020-06-10T14:47:43Z)
- Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes a drawback of existing methods, which seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
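TS-NMF itself operates on 2D samples without vectorization; as hedged background only, the sketch below implements classical semi-NMF with the multiplicative updates of Ding et al., factoring a mixed-sign matrix X ~ F G^T with only G constrained to be nonnegative. The data and rank are assumptions.

```python
import numpy as np

def pos(A):
    return (np.abs(A) + A) / 2         # elementwise positive part

def neg(A):
    return (np.abs(A) - A) / 2         # elementwise negative part

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))     # mixed-sign data matrix
k = 5
G = np.abs(rng.standard_normal((200, k)))    # nonnegative factor

for _ in range(200):
    F = X @ G @ np.linalg.pinv(G.T @ G)      # unconstrained factor, closed form
    XtF, FtF = X.T @ F, F.T @ F
    G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                 (neg(XtF) + G @ pos(FtF) + 1e-12))

print(np.linalg.norm(X - F @ G.T) / np.linalg.norm(X))   # relative fit error
```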
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.