Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization
- URL: http://arxiv.org/abs/2207.12214v1
- Date: Mon, 25 Jul 2022 14:10:24 GMT
- Title: Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization
- Authors: Yan Sun, Yi Han, Jicong Fan
- Abstract summary: We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with a Laplacian-composited objective.
- Score: 20.43471678277403
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Dimensionality reduction techniques aim at representing high-dimensional data
in low-dimensional spaces to extract hidden and useful information or
facilitate visual understanding and interpretation of the data. However, few of
them take into consideration the potential cluster information contained
implicitly in the high-dimensional data. In this paper, we propose LaptSNE, a
new graph-layout nonlinear dimensionality reduction method based on t-SNE, one
of the best techniques for visualizing high-dimensional data as 2D scatter
plots. Specifically, LaptSNE leverages the eigenvalue information of the graph
Laplacian to shrink the potential clusters in the low-dimensional embedding
when learning to preserve the local and global structure from high-dimensional
space to low-dimensional space. It is nontrivial to solve the proposed model
because the eigenvalues of the normalized symmetric Laplacian are functions of
the decision variable. We provide a majorization-minimization algorithm with a
convergence guarantee to solve the optimization problem of LaptSNE and show how
to calculate the gradient analytically, which may be of broad interest when
considering optimization with a Laplacian-composited objective. We evaluate our
method by a formal comparison with state-of-the-art methods, both visually and
via established quantitative measurements. The results demonstrate the
superiority of our method over baselines such as t-SNE and UMAP. We also extend
our method to spectral clustering and establish an accurate and parameter-free
clustering algorithm, which provides high reliability and convenience in real
applications.
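To make the mechanism concrete, the following is a minimal, hypothetical sketch (not the authors' code) of the kind of cluster-contractive penalty the abstract describes: build a similarity graph on the current low-dimensional embedding, form its normalized symmetric Laplacian, and penalize the sum of its k smallest eigenvalues so that roughly k clusters contract. The Gaussian kernel, the bandwidth sigma, and all function names here are illustrative assumptions; the paper's actual graph construction and objective may differ.

    import numpy as np
    from scipy.linalg import eigh

    def normalized_laplacian(W):
        """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
        d = np.maximum(W.sum(axis=1), 1e-12)
        d_inv_sqrt = 1.0 / np.sqrt(d)
        return np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    def cluster_contractive_penalty(Y, k, sigma=1.0):
        """Sum of the k smallest eigenvalues of the normalized Laplacian of a
        Gaussian similarity graph on the embedding Y (n x 2). By the Ky Fan
        theorem this equals min over orthonormal V of trace(V^T L V), and it
        is small when Y contains about k well-separated clusters."""
        sq = np.sum(Y**2, axis=1)
        D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
        W = np.exp(-D2 / (2.0 * sigma**2))
        np.fill_diagonal(W, 0.0)  # no self-loops
        eigvals, eigvecs = eigh(normalized_laplacian(W))  # ascending order
        return eigvals[:k].sum(), eigvecs[:, :k]

A majorization-minimization scheme in this spirit fixes V at the eigenvectors returned for the previous iterate, so the surrogate trace(V^T L(Y) V) becomes an explicit differentiable function of Y; that surrogate, weighted and added to the t-SNE KL objective, can be minimized by gradient steps before V is refreshed. This also illustrates why the problem is nontrivial: the eigenvalues themselves depend on the decision variable Y.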
Related papers
- Dimension reduction via score ratio matching [0.9012198585960441]
We propose a framework, derived from score-matching, to extend gradient-based dimension reduction to problems where gradients are unavailable.
We show that our approach outperforms standard score-matching for problems with low-dimensional structure.
arXiv Detail & Related papers (2024-10-25T22:21:03Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Linearly-scalable learning of smooth low-dimensional patterns with permutation-aided entropic dimension reduction [0.0]
In many data science applications, the objective is to extract appropriately-ordered smooth low-dimensional data patterns from high-dimensional data sets.
We show that when selecting the Euclidean smoothness as a pattern quality criterion, both of these problems can be efficiently solved numerically.
arXiv Detail & Related papers (2023-06-17T08:03:24Z)
- Topology-Preserving Dimensionality Reduction via Interleaving Optimization [10.097180927318703]
We show how optimization seeking to minimize the interleaving distance can be incorporated into dimensionality reduction algorithms.
We demonstrate the utility of this framework to data visualization.
arXiv Detail & Related papers (2022-01-31T06:11:17Z)
- ExClus: Explainable Clustering on Low-dimensional Data Representations [9.496898312608307]
Dimensionality reduction and clustering techniques are frequently used to analyze complex data sets, but their results are often not easy to interpret.
We consider how to support users in interpreting apparent cluster structure on scatter plots where the axes are not directly interpretable.
We propose a new method to compute an interpretable clustering automatically, where the explanation is in the original high-dimensional space and the clustering is coherent in the low-dimensional projection.
arXiv Detail & Related papers (2021-11-04T21:24:01Z)
- Improving Metric Dimensionality Reduction with Distributed Topology [68.8204255655161]
DIPOLE is a dimensionality-reduction post-processing step that corrects an initial embedding by minimizing a loss functional with both a local, metric term and a global, topological term.
We observe that DIPOLE outperforms popular methods like UMAP, t-SNE, and Isomap on a number of popular datasets.
arXiv Detail & Related papers (2021-06-14T17:19:44Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments on data classification and clustering over eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis testing.
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT); a minimal Gaussian-sketch illustration appears after this list.
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
- Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
arXiv Detail & Related papers (2020-06-10T14:47:43Z)
- Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
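For the sketching entry above (Effective Dimension Adaptive Sketching Methods), here is a minimal, self-contained Gaussian-embedding illustration; this is an assumed generic sketch-and-solve scheme for L2-regularized least squares, not the paper's effective-dimension-adaptive algorithm, and the SRHT variant is omitted.

    import numpy as np

    def sketched_ridge(A, b, lam, m, seed=None):
        """Approximate argmin_x ||A x - b||^2 + lam * ||x||^2 by compressing
        the n rows of (A, b) with an m x n Gaussian sketch S (m << n) and
        solving the small sketched problem exactly."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        S = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))  # E[S^T S] = I_n
        SA, Sb = S @ A, S @ b
        # Normal equations of the sketched problem: d x d, cheap for small d.
        return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)

    # Quick check against the exact regularized solution (lam = 1).
    rng = np.random.default_rng(0)
    A = rng.normal(size=(20000, 50))
    b = A @ rng.normal(size=50) + 0.1 * rng.normal(size=20000)
    x_exact = np.linalg.solve(A.T @ A + np.eye(50), A.T @ b)
    x_sketch = sketched_ridge(A, b, lam=1.0, m=500, seed=1)
    print(np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))

With a dense Gaussian sketch, forming SA costs O(mnd), which is exactly why structured embeddings such as the SRHT are attractive at scale.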
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.