Visualizing the Finer Cluster Structure of Large-Scale and
High-Dimensional Data
- URL: http://arxiv.org/abs/2007.08711v1
- Date: Fri, 17 Jul 2020 01:36:45 GMT
- Title: Visualizing the Finer Cluster Structure of Large-Scale and
High-Dimensional Data
- Authors: Yu Liang, Arin Chaudhuri, and Haoyu Wang
- Abstract summary: We propose using a generalized sigmoid function to model the distance similarity in both high- and low-dimensional spaces.
Using both simulated and real-world data sets, we show that our proposed method can generate visualization results comparable to those of uniform manifold approximation and projection.
- Score: 7.400745342582259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dimension reduction and visualization of high-dimensional data have become
very important research topics because of the rapid growth of large databases
in data science. In this paper, we propose using a generalized sigmoid function
to model the distance similarity in both high- and low-dimensional spaces. In
particular, the parameter b is introduced to the generalized sigmoid function
in low-dimensional space, so that we can adjust the heaviness of the function
tail by changing the value of b. Using both simulated and real-world data sets,
we show that our proposed method can generate visualization results comparable
to those of uniform manifold approximation and projection (UMAP), which is a
newly developed manifold learning technique with fast running speed, better
global structure, and scalability to massive data sets. In addition, according
to the purpose of the study and the data structure, we can decrease or increase
the value of b to either reveal the finer cluster structure of the data or
maintain the neighborhood continuity of the embedding for better visualization.
Finally, we use domain knowledge to demonstrate that the finer subclusters
revealed with small values of b are meaningful.
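The abstract does not state the functional form of the generalized sigmoid, so the following is only a minimal sketch of how a parameter b can control tail heaviness. It assumes the UMAP-style low-dimensional kernel 1 / (1 + a * d^(2b)) as a stand-in; the function name `low_dim_similarity`, the parameter `a`, and the sample distances are illustrative assumptions, not the authors' definitions.

```python
import numpy as np


def low_dim_similarity(d, a=1.0, b=1.0):
    """Similarity of two embedded points at distance d.

    Assumption: UMAP-style kernel 1 / (1 + a * d**(2*b)), used here as a
    stand-in because the abstract does not give the paper's exact formula.
    """
    d = np.asarray(d, dtype=float)
    return 1.0 / (1.0 + a * d ** (2.0 * b))


# Hypothetical distances chosen only to show near versus far behavior.
distances = np.array([0.5, 1.0, 5.0, 20.0])

for b in (0.5, 1.0, 2.0):
    # Smaller b makes the similarity decay more slowly at large distances,
    # which is what a heavier tail means for this kernel.
    print(f"b={b}: {low_dim_similarity(distances, b=b)}")
```

Under this assumed form, the similarity at large distances decays roughly as d^(-2b), so lowering b leaves more similarity mass on distant pairs. That matches the abstract's description of b as a knob for the heaviness of the tail.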
Related papers
- Topology-aware Reinforcement Feature Space Reconstruction for Graph Data [22.5530178427691]
Reconstructing a good feature space is essential to augment the AI power of data, improve model generalization, and increase the availability of downstream ML models.
We use topology-aware reinforcement learning to automate and optimize feature space reconstruction for graph data.
Our approach combines the extraction of core subgraphs to capture essential structural information with a graph neural network (GNN) to encode topological features and reduce computing complexity.
arXiv Detail & Related papers (2024-11-08T18:01:05Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Scalable manifold learning by uniform landmark sampling and constrained locally linear embedding [0.6144680854063939]
We propose a scalable manifold learning (scML) method that can manipulate large-scale and high-dimensional data in an efficient manner.
We empirically validated the effectiveness of scML on synthetic datasets and real-world benchmarks of different types.
scML scales well with increasing data sizes and embedding dimensions, and exhibits promising performance in preserving the global structure.
arXiv Detail & Related papers (2024-01-02T08:43:06Z)
- T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities [69.16656086708291]
Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces.
We propose a new model comprising a view-wise sampling algorithm to focus on local structure learning.
The model can be scaled to generate high-resolution data while unifying multiple modalities.
arXiv Detail & Related papers (2023-05-24T03:32:03Z)
- VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
- Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization [20.43471678277403]
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with Laplacian-composited objectives.
arXiv Detail & Related papers (2022-07-25T14:10:24Z)
- A geometric framework for outlier detection in high-dimensional data [0.0]
Outlier or anomaly detection is an important task in data analysis.
We provide a framework that exploits the metric structure of a data set.
We show that exploiting this structure significantly improves the detection of outlying observations in high-dimensional data.
arXiv Detail & Related papers (2022-07-01T12:07:51Z)
- Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering.
The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z)
- Mix Dimension in Poincaré Geometry for 3D Skeleton-based Action Recognition [57.98278794950759]
Graph Convolutional Networks (GCNs) have already demonstrated their powerful ability to model irregular data.
We present a novel spatial-temporal GCN architecture which is defined via the Poincaré geometry.
We evaluate our method on two of the currently largest-scale 3D datasets.
arXiv Detail & Related papers (2020-07-30T18:23:18Z)
- Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.