SQuadMDS: a lean Stochastic Quartet MDS improving global structure
preservation in neighbor embedding like t-SNE and UMAP
- URL: http://arxiv.org/abs/2202.12087v1
- Date: Thu, 24 Feb 2022 13:14:58 GMT
- Authors: Pierre Lambert, Cyril de Bodt, Michel Verleysen, John Lee
- Abstract summary: This work introduces a force-directed approach to multidimensional scaling with a time and space complexity of O(N), where N is the number of data points.
The method can be combined with force-directed layouts from the neighbour embedding family, such as t-SNE, to produce embeddings that preserve both the global and the local structures of the data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multidimensional scaling is a statistical process that aims to embed
high-dimensional data into a lower-dimensional space; this process is often used
for data visualisation. Common multidimensional scaling algorithms tend to have
high computational complexity, making them inapplicable to large data sets. This
work introduces a stochastic, force-directed approach to multidimensional scaling
with a time and space complexity of O(N), where N is the number of data points.
The method can be combined with force-directed layouts from the neighbour
embedding family, such as t-SNE, to produce embeddings that preserve both the
global and the local structures of the data. Experiments assess the quality of
the embeddings produced by the standalone version and its hybrid extension, both
quantitatively and qualitatively, showing competitive results that outperform
state-of-the-art approaches. Code is available at
https://github.com/PierreLambert3/SQuaD-MDS-and-FItSNE-hybrid.
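To make the O(N) claim concrete, here is a minimal sketch of one way a stochastic quartet-based MDS iteration could look. It is not the authors' implementation. The abstract only states that the method is stochastic, force-directed, and linear in N. The cost used here (squared error between sum-normalised pairwise distances inside each quartet), the permutation-based quartet sampling, the random initialisation, the learning rate, and the names `quartet_loss_and_grad` and `squad_like_mds` are all assumptions made for illustration. Each iteration touches every point in at most one quartet, so the work and memory per iteration grow linearly with N.

```python
import numpy as np

# The six point pairs inside a quartet (indices into the 4 selected points).
PAIRS = np.array([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])


def quartet_loss_and_grad(X4, Y4, eps=1e-12):
    """Loss and gradient for one quartet (hypothetical helper, assumed cost).

    X4: (4, D) high-dimensional points, Y4: (4, d) low-dimensional points.
    The loss compares distances normalised by their sum within the quartet.
    """
    i, j = PAIRS[:, 0], PAIRS[:, 1]
    d_hd = np.linalg.norm(X4[i] - X4[j], axis=1)
    diff = Y4[i] - Y4[j]
    d_ld = np.maximum(np.linalg.norm(diff, axis=1), eps)

    p = d_hd / (d_hd.sum() + eps)   # relative distances, high-dim space
    s = d_ld.sum()
    r = d_ld / s                    # relative distances, low-dim space
    err = r - p
    loss = float(np.sum(err ** 2))

    # Chain rule: dL/dd_m = (2/s) * (err_m - sum_k err_k * r_k)
    dl_dd = (2.0 / s) * (err - np.dot(err, r))
    contrib = (dl_dd / d_ld)[:, None] * diff   # dL/dd_m * dd_m/dy_i

    grad = np.zeros_like(Y4)
    np.add.at(grad, i, contrib)     # each point appears in three pairs
    np.add.at(grad, j, -contrib)
    return loss, grad


def squad_like_mds(X, n_components=2, n_iter=200, lr=1.0, seed=0):
    """Plain gradient descent over random quartets (assumed schedule)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Y = rng.normal(size=(n, n_components))
    for _ in range(n_iter):
        perm = rng.permutation(n)
        # N // 4 disjoint quartets per iteration: O(N) distance evaluations.
        for q in perm[: n - n % 4].reshape(-1, 4):
            _, g = quartet_loss_and_grad(X[q], Y[q])
            Y[q] -= lr * g
    return Y


if __name__ == "__main__":
    data = np.random.default_rng(42).normal(size=(200, 10))
    embedding = squad_like_mds(data, n_iter=50)
    print(embedding.shape)
```

Because the normalised cost is invariant to the overall scale of the embedding, this sketch constrains only the shape of each quartet, not its size. A hybrid with a neighbour embedding method would presumably add that method's gradient to the update, but how the two are weighted is not specified in the abstract.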
Related papers
- MNIST-Nd: a set of naturalistic datasets to benchmark clustering across dimensions
We propose MNIST-Nd, a set of synthetic datasets that share a key property of real-world datasets.
MNIST-Nd is obtained by training mixture variational autoencoders with 2 to 64 latent dimensions on MNIST.
Preliminary common clustering algorithm benchmarks on MNIST-Nd suggest that Leiden is the most robust for growing dimensions.
Date: 2024-10-21T15:51:30Z
- Fast and Scalable Semi-Supervised Learning for Multi-View Subspace Clustering
FSSMSC is a novel solution to the high computational complexity commonly found in existing approaches.
The method generates a consensus anchor graph across all views, representing each data point as a sparse linear combination of chosen landmarks.
The effectiveness and efficiency of FSSMSC are validated through extensive experiments on multiple benchmark datasets of varying scales.
Date: 2024-08-11T06:54:00Z
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
Date: 2024-02-03T19:00:19Z
- SIGMA: Scale-Invariant Global Sparse Shape Matching
We propose a novel mixed-integer programming (MIP) formulation for generating precise sparse correspondences for non-rigid shapes.
We show state-of-the-art results for sparse non-rigid matching on several challenging 3D datasets.
Date: 2023-08-16T14:25:30Z
- Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with Laplacian-composited objective.
Date: 2022-07-25T14:10:24Z
- Deep Recursive Embedding for High-Dimensional Data
We propose to combine deep neural networks (DNN) with mathematics-guided embedding rules for high-dimensional data embedding.
We introduce a generic deep embedding network (DEN) framework, which is able to learn a parametric mapping from high-dimensional space to low-dimensional space.
Date: 2021-10-31T23:22:33Z
- Index $t$-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings
This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved.
The proposed algorithm has the same complexity as the original $t$-SNE to embed new items, and a lower one when considering the embedding of a dataset sliced into sub-pieces.
Date: 2021-09-22T06:45:37Z
- Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering.
The proposed SSCAG is competitive against the state-of-the-art approaches.
Date: 2021-04-24T08:09:27Z
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments of data classification and clustering on eight real datasets show that Vec2vec is better than several classical dimensionality reduction methods in the statistical hypothesis test.
Date: 2021-03-10T23:10:47Z
- Dense Non-Rigid Structure from Motion: A Manifold Viewpoint
Non-Rigid Structure-from-Motion (NRSfM) problem aims to recover 3D geometry of a deforming object from its 2D feature correspondences across multiple frames.
We show that our approach significantly improves accuracy, scalability, and robustness against noise.
Date: 2020-06-15T09:15:54Z
- Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
Date: 2020-05-19T05:54:14Z
This list is automatically generated from the titles and abstracts of the papers in this site.