Visual Cluster Separation Using High-Dimensional Sharpened
Dimensionality Reduction
- URL: http://arxiv.org/abs/2110.00317v1
- Date: Fri, 1 Oct 2021 11:13:51 GMT
- Title: Visual Cluster Separation Using High-Dimensional Sharpened
Dimensionality Reduction
- Authors: Youngjoo Kim, Alexandru C. Telea, Scott C. Trager, Jos B. T. M.
Roerdink
- Abstract summary: `High-Dimensional Sharpened DR' (HD-SDR) is tested on both synthetic and real-world data sets.
Our method achieves good quality (measured by quality metrics) and scales computationally well with large high-dimensional data.
To illustrate its concrete applications, we further apply HD-SDR on a recent astronomical catalog.
- Score: 65.80631307271705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Applying dimensionality reduction (DR) to large, high-dimensional data sets
can be challenging when distinguishing the underlying high-dimensional data
clusters in a 2D projection for exploratory analysis. We address this problem
by first sharpening the clusters in the original high-dimensional data prior to
the DR step using Local Gradient Clustering (LGC). We then project the
sharpened data from the high-dimensional space to 2D by a user-selected DR
method. The sharpening step aids this method to preserve cluster separation in
the resulting 2D projection. With our method, end-users can label each distinct
cluster to further analyze an otherwise unlabeled data set. Our
`High-Dimensional Sharpened DR' (HD-SDR) method, tested on both synthetic and
real-world data sets, benefits DR methods with poor cluster separation,
yielding better visual cluster separation than the same DR methods without
sharpening. Our method achieves good quality (measured by quality metrics) and
scales computationally well with large high-dimensional data. To illustrate its
concrete applications, we further apply HD-SDR on a recent astronomical
catalog.
Related papers
- Out-of-Core Dimensionality Reduction for Large Data via Out-of-Sample Extensions [8.368145000145594]
Dimensionality reduction (DR) is a well-established approach for the visualization of high-dimensional data sets.
We propose the use of out-of-sample extensions to perform DR on large data sets.
We provide an evaluation of the projection quality of five common DR algorithms.
arXiv Detail & Related papers (2024-08-07T23:30:53Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization [20.43471678277403]
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with a Laplacian-composited objective.
arXiv Detail & Related papers (2022-07-25T14:10:24Z)
- Human Motion Detection Using Sharpened Dimensionality Reduction and Clustering [1.1172382217477126]
We propose clustering methods to easily label the 2D projections of high-dimensional data.
We test our pipeline of SDR and the clustering methods on a range of synthetic and real-world datasets.
We conclude that clustering SDR results yields better labeling results than clustering plain DR, and that k-means is the recommended clustering method for SDR.
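A minimal version of that recommended last step, labeling a 2-D (S)DR projection with k-means, might look like the following; the synthetic blobs and the bare Lloyd's iteration are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def kmeans(P, k=2, n_iter=50):
    """Minimal Lloyd's k-means over projected 2-D points P (n x 2)."""
    C = P[:k].copy()  # init: first k points as centroids
    for _ in range(n_iter):
        labels = ((P[:, None] - C[None]) ** 2).sum(-1).argmin(axis=1)
        # recompute centroids, keeping the old one if a cluster empties
        C = np.array([P[labels == j].mean(axis=0) if np.any(labels == j)
                      else C[j] for j in range(k)])
    return labels

# two well-separated 2-D blobs standing in for a sharpened projection
rng = np.random.default_rng(1)
P = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(3.0, 0.3, (50, 2))])
labels = kmeans(P)
```

On a sharpened projection the clusters are compact and well separated, which is exactly the regime where this simple assignment works reliably.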
arXiv Detail & Related papers (2022-02-23T18:18:25Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments of data classification and clustering on eight real datasets show that Vec2vec is better than several classical dimensionality reduction methods in the statistical hypothesis test.
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks [81.64530401885476]
We propose a self-supervised LiDAR odometry method, dubbed SelfVoxeLO, to tackle these two difficulties.
Specifically, we propose a 3D convolution network to process the raw LiDAR data directly, which extracts features that better encode the 3D geometric patterns.
We evaluate our method's performances on two large-scale datasets, i.e., KITTI and Apollo-SouthBay.
arXiv Detail & Related papers (2020-10-19T09:23:39Z) - Model-based Clustering using Automatic Differentiation: Confronting
Misspecification and High-Dimensional Data [6.053629733936546]
We study two practically important cases of model-based clustering using Gaussian Mixture Models.
We show that EM has better clustering performance, measured by Adjusted Rand Index, compared to Gradient Descent in cases of misspecification.
We propose a new penalty term for the likelihood based on the Kullback-Leibler divergence between pairs of fitted components.
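That penalty relies on the closed-form KL divergence between two Gaussians; a sketch of that building block could look like this (the exact weighting of the penalty within the likelihood is not reproduced here):

```python
import numpy as np

def gauss_kl(m0, S0, m1, S1):
    """Closed-form KL( N(m0,S0) || N(m1,S1) ) between two Gaussians,
    usable as a pairwise overlap penalty between fitted GMM components."""
    d = m0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# identical components have zero divergence; separated means have more
m0, S0 = np.zeros(2), np.eye(2)
m1, S1 = np.array([2.0, 0.0]), np.eye(2)
```

Penalizing small pairwise KL values discourages the optimizer from fitting two heavily overlapping components to the same cluster.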
arXiv Detail & Related papers (2020-07-08T10:56:05Z) - Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.