Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization
- URL: http://arxiv.org/abs/2207.12214v1
- Date: Mon, 25 Jul 2022 14:10:24 GMT
- Title: Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization
- Authors: Yan Sun, Yi Han, Jicong Fan
- Abstract summary: We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with a Laplacian-composited objective.
- Score: 20.43471678277403
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Dimensionality reduction techniques aim at representing high-dimensional data
in low-dimensional spaces to extract hidden and useful information or
facilitate visual understanding and interpretation of the data. However, few of
them take into consideration the potential cluster information contained
implicitly in the high-dimensional data. In this paper, we propose LaptSNE, a
new graph-layout nonlinear dimensionality reduction method based on t-SNE, one
of the best techniques for visualizing high-dimensional data as 2D scatter
plots. Specifically, LaptSNE leverages the eigenvalue information of the graph
Laplacian to shrink the potential clusters in the low-dimensional embedding
when learning to preserve the local and global structure from high-dimensional
space to low-dimensional space. It is nontrivial to solve the proposed model
because the eigenvalues of the normalized symmetric Laplacian are functions of
the decision variable. We provide a majorization-minimization algorithm with a
convergence guarantee to solve the optimization problem of LaptSNE and show how
to calculate the gradient analytically, which may be of broad interest when
considering optimization with a Laplacian-composited objective. We evaluate our
method by a formal comparison with state-of-the-art methods, both visually and
via established quantitative measurements. The results demonstrate the
superiority of our method over baselines such as t-SNE and UMAP. We also extend
our method to spectral clustering and establish an accurate and parameter-free
clustering algorithm, which provides high reliability and convenience in real
applications.
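To make the mechanism concrete, the following is a minimal, hypothetical sketch (not the authors' code) of the kind of cluster-contractive penalty the abstract describes: build a similarity graph on the current low-dimensional embedding, form its normalized symmetric Laplacian, and penalize the sum of its k smallest eigenvalues so that roughly k clusters contract. The Gaussian kernel, the bandwidth sigma, and all function names here are illustrative assumptions; the paper's actual graph construction and objective may differ.

    import numpy as np
    from scipy.linalg import eigh

    def normalized_laplacian(W):
        """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
        d = np.maximum(W.sum(axis=1), 1e-12)
        d_inv_sqrt = 1.0 / np.sqrt(d)
        return np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    def cluster_contractive_penalty(Y, k, sigma=1.0):
        """Sum of the k smallest eigenvalues of the normalized Laplacian of a
        Gaussian similarity graph on the embedding Y (n x 2). By the Ky Fan
        theorem this equals min over orthonormal V of trace(V^T L V), and it
        is small when Y contains about k well-separated clusters."""
        sq = np.sum(Y**2, axis=1)
        D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
        W = np.exp(-D2 / (2.0 * sigma**2))
        np.fill_diagonal(W, 0.0)  # no self-loops
        eigvals, eigvecs = eigh(normalized_laplacian(W))  # ascending order
        return eigvals[:k].sum(), eigvecs[:, :k]

A majorization-minimization scheme in this spirit fixes V at the eigenvectors returned for the previous iterate, so the surrogate trace(V^T L(Y) V) becomes an explicit differentiable function of Y; that surrogate, weighted and added to the t-SNE KL objective, can be minimized by gradient steps before V is refreshed. This also illustrates why the problem is nontrivial: the eigenvalues themselves depend on the decision variable Y.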
Related papers
- Dimension reduction via score ratio matching [0.9012198585960441]
We propose a framework, derived from score-matching, to extend gradient-based dimension reduction to problems where gradients are unavailable.
We show that our approach outperforms standard score-matching for problems with low-dimensional structure.
arXiv Detail & Related papers (2024-10-25T22:21:03Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Linearly-scalable learning of smooth low-dimensional patterns with permutation-aided entropic dimension reduction [0.0]
In many data science applications, the objective is to extract appropriately-ordered smooth low-dimensional data patterns from high-dimensional data sets.
We show that when selecting the Euclidean smoothness as a pattern quality criterion, both of these problems can be efficiently solved numerically.
arXiv Detail & Related papers (2023-06-17T08:03:24Z)
- Topology-Preserving Dimensionality Reduction via Interleaving Optimization [10.097180927318703]
We show how optimization seeking to minimize the interleaving distance can be incorporated into dimensionality reduction algorithms.
We demonstrate the utility of this framework to data visualization.
arXiv Detail & Related papers (2022-01-31T06:11:17Z)
- ExClus: Explainable Clustering on Low-dimensional Data Representations [9.496898312608307]
Dimensionality reduction and clustering techniques are frequently used to analyze complex data sets, but their results are often not easy to interpret.
We consider how to support users in interpreting apparent cluster structure on scatter plots where the axes are not directly interpretable.
We propose a new method to compute an interpretable clustering automatically, where the explanation is in the original high-dimensional space and the clustering is coherent in the low-dimensional projection.
arXiv Detail & Related papers (2021-11-04T21:24:01Z)
- Improving Metric Dimensionality Reduction with Distributed Topology [68.8204255655161]
DIPOLE is a dimensionality-reduction post-processing step that corrects an initial embedding by minimizing a loss functional with both a local, metric term and a global, topological term.
We observe that DIPOLE outperforms popular methods like UMAP, t-SNE, and Isomap on a number of popular datasets.
arXiv Detail & Related papers (2021-06-14T17:19:44Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments on data classification and clustering over eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis testing.
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT); a minimal Gaussian-sketch illustration appears after this list.
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
- Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
arXiv Detail & Related papers (2020-06-10T14:47:43Z)
- Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
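For the sketching entry above (Effective Dimension Adaptive Sketching Methods), here is a minimal, self-contained Gaussian-embedding illustration; this is an assumed generic sketch-and-solve scheme for L2-regularized least squares, not the paper's effective-dimension-adaptive algorithm, and the SRHT variant is omitted.

    import numpy as np

    def sketched_ridge(A, b, lam, m, seed=None):
        """Approximate argmin_x ||A x - b||^2 + lam * ||x||^2 by compressing
        the n rows of (A, b) with an m x n Gaussian sketch S (m << n) and
        solving the small sketched problem exactly."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        S = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))  # E[S^T S] = I_n
        SA, Sb = S @ A, S @ b
        # Normal equations of the sketched problem: d x d, cheap for small d.
        return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)

    # Quick check against the exact regularized solution (lam = 1).
    rng = np.random.default_rng(0)
    A = rng.normal(size=(20000, 50))
    b = A @ rng.normal(size=50) + 0.1 * rng.normal(size=20000)
    x_exact = np.linalg.solve(A.T @ A + np.eye(50), A.T @ b)
    x_sketch = sketched_ridge(A, b, lam=1.0, m=500, seed=1)
    print(np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))

With a dense Gaussian sketch, forming SA costs O(mnd), which is exactly why structured embeddings such as the SRHT are attractive at scale.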
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.