Scalable manifold learning by uniform landmark sampling and constrained
locally linear embedding
- URL: http://arxiv.org/abs/2401.01100v2
- Date: Fri, 5 Jan 2024 08:09:14 GMT
- Title: Scalable manifold learning by uniform landmark sampling and constrained
locally linear embedding
- Authors: Dehua Peng, Zhipeng Gui, Wenzhang Wei, Huayi Wu
- Abstract summary: We propose a scalable manifold learning (scML) method that can manipulate large-scale and high-dimensional data in an efficient manner.
We empirically validated the effectiveness of scML on synthetic datasets and real-world benchmarks of different types.
scML scales well with increasing data sizes and embedding dimensions, and exhibits promising performance in preserving the global structure.
- Score: 0.6144680854063939
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a pivotal approach in machine learning and data science, manifold learning
aims to uncover the intrinsic low-dimensional structure within complex
nonlinear manifolds in high-dimensional space. By exploiting the manifold
hypothesis, various techniques for nonlinear dimension reduction have been
developed to facilitate visualization, classification, clustering, and gaining
key insights. Although existing manifold learning methods have achieved
remarkable successes, they still suffer from extensive distortions incurred in
the global structure, which hinders the understanding of underlying patterns.
Scalability issues also limit their applicability for handling large-scale
data. Here, we propose a scalable manifold learning (scML) method that can
manipulate large-scale and high-dimensional data in an efficient manner. It
starts by seeking a set of landmarks to construct the low-dimensional skeleton
of the entire data, and then incorporates the non-landmarks into the learned
space based on the constrained locally linear embedding (CLLE). We empirically
validated the effectiveness of scML on synthetic datasets and real-world
benchmarks of different types, and applied it to analyze the single-cell
transcriptomics and detect anomalies in electrocardiogram (ECG) signals. scML
scales well with increasing data sizes and embedding dimensions, and exhibits
promising performance in preserving the global structure. The experiments
demonstrate notable robustness in embedding quality as the sample rate
decreases.
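The two-stage idea in the abstract (embed a set of landmarks first, then place the remaining points relative to them) can be illustrated with a minimal sketch. This is not the authors' scML or CLLE implementation, and every choice here is an assumption made for illustration: landmarks are drawn by plain random sampling, the landmark "skeleton" is a PCA projection, and each non-landmark is placed with ordinary sum-to-one LLE reconstruction weights over its nearest landmarks. The function name `embed_with_landmarks` and all of its parameters are hypothetical.

```python
import numpy as np


def embed_with_landmarks(X, n_landmarks=50, n_neighbors=5,
                         n_components=2, reg=1e-3, seed=0):
    """Illustrative landmark-then-interpolate embedding (not scML itself)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Stage 1 (assumption): pick landmarks uniformly at random.
    landmark_idx = rng.choice(n, size=n_landmarks, replace=False)
    L = X[landmark_idx]

    # Stand-in skeleton (assumption): PCA of the landmarks via SVD.
    L_centered = L - L.mean(axis=0)
    _, _, Vt = np.linalg.svd(L_centered, full_matrices=False)
    Y_land = L_centered @ Vt[:n_components].T

    Y = np.zeros((n, n_components))
    Y[landmark_idx] = Y_land

    # Stage 2: place each non-landmark from its nearest landmarks.
    is_landmark = np.zeros(n, dtype=bool)
    is_landmark[landmark_idx] = True
    for i in np.flatnonzero(~is_landmark):
        dists = np.linalg.norm(L - X[i], axis=1)
        nbrs = np.argsort(dists)[:n_neighbors]

        # Standard LLE weights: minimise ||x_i - sum_j w_j l_j||^2
        # subject to sum_j w_j = 1, with a small ridge for stability.
        Z = L[nbrs] - X[i]
        G = Z @ Z.T
        G += reg * (np.trace(G) + 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        w /= w.sum()

        # Reuse the same weights in the low-dimensional space.
        Y[i] = w @ Y_land[nbrs]

    return Y, landmark_idx


# Usage on a synthetic Swiss-roll-like point cloud (500 points in 3-D).
rng = np.random.default_rng(1)
t = rng.uniform(0, 3 * np.pi, 500)
X = np.column_stack([t * np.cos(t), rng.uniform(0, 5, 500), t * np.sin(t)])
Y, idx = embed_with_landmarks(X)
print(Y.shape)  # (500, 2)
```

Only the landmarks go through the (potentially expensive) embedding step; each remaining point costs one small linear solve of size `n_neighbors`, which is why schemes of this shape tend to scale with data size. A PCA skeleton will not unroll a curved manifold, so a nonlinear landmark embedding would be substituted in practice.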
Related papers
- Inductive Global and Local Manifold Approximation and Projection [5.629705943815797]
We first propose GLoMAP, a novel manifold learning method for dimensional reduction and high-dimensional data visualization.
We extend GLoMAP to its inductive version, iGLoMAP, which utilizes a deep neural network to map data to its lower-dimensional representation.
We have successfully applied both GLoMAP and iGLoMAP to the simulated and real-data settings, with competitive experiments against the state-of-the-art methods.
arXiv Detail & Related papers (2024-06-12T11:22:27Z) - Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z) - Unsupervised Anomaly Detection via Nonlinear Manifold Learning [0.0]
Anomalies are samples that significantly deviate from the rest of the data and their detection plays a major role in building machine learning models.
We introduce a robust, efficient, and interpretable methodology based on nonlinear manifold learning to detect anomalies in unsupervised settings.
arXiv Detail & Related papers (2023-06-15T18:48:10Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z) - Adaptive Hierarchical Similarity Metric Learning with Noisy Labels [138.41576366096137]
We propose an Adaptive Hierarchical Similarity Metric Learning method.
It considers two kinds of noise-insensitive information, i.e., class-wise divergence and sample-wise consistency.
Our method achieves state-of-the-art performance compared with current deep metric learning approaches.
arXiv Detail & Related papers (2021-10-29T02:12:18Z) - Rank-R FNN: A Tensor-Based Learning Model for High-Order Data
Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z) - A Multiscale Environment for Learning by Diffusion [9.619814126465206]
We introduce the Multiscale Environment for Learning by Diffusion (MELD) data model.
We show that the MELD data model precisely captures latent multiscale structure in data and facilitates its analysis.
To efficiently learn the multiscale structure observed in many real datasets, we introduce the Multiscale Learning by Unsupervised Diffusion (M-LUND) clustering algorithm.
arXiv Detail & Related papers (2021-01-31T17:46:19Z) - Invertible Manifold Learning for Dimension Reduction [44.16432765844299]
Dimension reduction (DR) aims to learn low-dimensional representations of high-dimensional data with the preservation of essential information.
We propose a novel two-stage DR method, called invertible manifold learning (inv-ML) to bridge the gap between theoretical information-lossless and practical DR.
Experiments are conducted on seven datasets with a neural network implementation of inv-ML, called i-ML-Enc.
arXiv Detail & Related papers (2020-10-07T14:22:51Z) - Visualizing the Finer Cluster Structure of Large-Scale and
High-Dimensional Data [7.400745342582259]
We propose using a generalized sigmoid function to model the distance similarity in both high- and low-dimensional spaces.
Using both simulated and real-world data sets, we show that our proposed method can generate visualization results comparable to those of uniform manifold approximation and projection.
arXiv Detail & Related papers (2020-07-17T01:36:45Z) - Manifold Learning via Manifold Deflation [105.7418091051558]
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method for a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.