Modern Dimension Reduction
- URL: http://arxiv.org/abs/2103.06885v1
- Date: Thu, 11 Mar 2021 14:54:33 GMT
- Title: Modern Dimension Reduction
- Authors: Philip D. Waggoner
- Abstract summary: This Element offers readers a suite of modern unsupervised dimension reduction techniques along with hundreds of lines of R code.
I introduce and walk readers through application of the following techniques: locally linear embedding, t-distributed stochastic neighbor embedding, uniform manifold approximation and projection, self-organizing maps, and deep autoencoders.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data are not only ubiquitous in society, but are increasingly complex both in
size and dimensionality. Dimension reduction offers researchers and scholars
the ability to make such complex, high dimensional data spaces simpler and more
manageable. This Element offers readers a suite of modern unsupervised
dimension reduction techniques along with hundreds of lines of R code, to
efficiently represent the original high dimensional data space in a simplified,
lower dimensional subspace. Launching from the earliest dimension reduction
technique, principal components analysis, and using real social science data, I
introduce and walk readers through application of the following techniques:
locally linear embedding, t-distributed stochastic neighbor embedding (t-SNE),
uniform manifold approximation and projection, self-organizing maps, and deep
autoencoders. The result is a well-stocked toolbox of unsupervised algorithms
for tackling the complexities of high dimensional data so common in modern
society. All code is publicly accessible on GitHub.
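To give a flavor of the workflow the Element describes, the sketch below (a minimal illustration, not the author's published code, which lives on GitHub) runs the launching-point technique, principal components analysis, alongside t-SNE on a small built-in R data set; it assumes the Rtsne package is installed.

```r
# Minimal illustrative sketch, not the Element's published code.
# PCA (the launching point) and t-SNE on a small built-in data set;
# assumes the Rtsne package is installed: install.packages("Rtsne").
library(Rtsne)

X <- scale(iris[, 1:4])            # numeric features, centered and scaled

# Principal components analysis via base R
pca <- prcomp(X)
pca_scores <- pca$x[, 1:2]         # first two principal components

# t-distributed stochastic neighbor embedding (t-SNE)
set.seed(1)                        # t-SNE is stochastic; fix the seed
tsne <- Rtsne(unique(X), dims = 2, perplexity = 30)

# Compare the two low-dimensional representations side by side
par(mfrow = c(1, 2))
plot(pca_scores, main = "PCA",   xlab = "PC1",   ylab = "PC2")
plot(tsne$Y,     main = "t-SNE", xlab = "dim 1", ylab = "dim 2")
```

The same pattern of fitting, extracting the low-dimensional coordinates, and plotting carries over to the other techniques the Element covers; the specific packages the Element itself uses may differ.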
Related papers
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z) - Relative intrinsic dimensionality is intrinsic to learning [49.5738281105287]
We introduce a new notion of the intrinsic dimension of a data distribution, which precisely captures the separability properties of the data.
For this intrinsic dimension, the rule of thumb above becomes a law: high intrinsic dimension guarantees highly separable data.
We show that this relative intrinsic dimension provides both upper and lower bounds on the probability of successfully learning and generalising in a binary classification problem.
arXiv Detail & Related papers (2023-10-10T10:41:45Z) - In search of the most efficient and memory-saving visualization of high
dimensional data [0.0]
We argue that the visualization of multidimensional data is well approximated by a two-dimensional embedding of its nearest-neighbor graph.
Existing reduction methods are too slow and do not allow interactive manipulation.
We show that high-quality embeddings are produced with minimal time and memory complexity.
arXiv Detail & Related papers (2023-02-27T20:56:13Z) - A Data-dependent Approach for High Dimensional (Robust) Wasserstein
Alignment [10.374243304018794]
We propose an effective framework to compress the high dimensional geometric patterns.
Our idea is inspired by the observation that high dimensional data often has a low intrinsic dimension.
Our framework is a "data-dependent" approach whose complexity depends on the intrinsic dimension of the input data.
arXiv Detail & Related papers (2022-09-07T03:29:26Z) - Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of the sequence space.
arXiv Detail & Related papers (2022-07-20T06:38:36Z) - High-dimensional separability for one- and few-shot learning [58.8599521537]
This work is driven by a practical question: the correction of Artificial Intelligence (AI) errors.
Special external devices, called correctors, are developed; they should provide a quick, non-iterative system fix without modification of the legacy AI system.
New multi-correctors of AI systems are presented and illustrated with examples of predicting errors and learning new classes of objects by a deep convolutional neural network.
arXiv Detail & Related papers (2021-06-28T14:58:14Z) - A Local Similarity-Preserving Framework for Nonlinear Dimensionality
Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments on data classification and clustering across eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis testing.
arXiv Detail & Related papers (2021-03-10T23:10:47Z) - Learning a Deep Part-based Representation by Preserving Data
Distribution [21.13421736154956]
Unsupervised dimensionality reduction is a commonly used technique for high dimensional data recognition problems.
In this paper, a deep part-based representation is learned by preserving the data distribution; the resulting algorithm is called Distribution Preserving Network Embedding.
The experimental results on the real-world data sets show that the proposed algorithm has good performance in terms of cluster accuracy and AMI.
arXiv Detail & Related papers (2020-09-17T12:49:36Z) - Manifold Learning via Manifold Deflation [105.7418091051558]
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method for a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z) - Online high rank matrix completion [39.570686604641836]
Recent advances in matrix completion enable data imputation in full-rank matrices by exploiting low dimensional (nonlinear) latent structure.
We develop a new model for high rank matrix completion, together with batch and online methods to fit the model and out-of-sample extension.
arXiv Detail & Related papers (2020-02-20T18:31:04Z)