Towards a comprehensive visualization of structure in data
- URL: http://arxiv.org/abs/2111.15506v2
- Date: Wed, 1 Dec 2021 07:50:38 GMT
- Title: Towards a comprehensive visualization of structure in data
- Authors: Joan Garriga and Frederic Bartumeus
- Abstract summary: We show that a simplified parameter setup with a single control parameter, namely the perplexity, can effectively balance local and global data structure visualization.
We also designed a chunk&mix protocol to efficiently parallelize t-SNE and explore data structure across a much wider range of scales.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dimensional data reduction methods are fundamental to explore and visualize
large data sets. Basic requirements for unsupervised data exploration are
simplicity, flexibility and scalability. However, current methods show complex
parameterizations and strong computational limitations when exploring large
data structures across scales. Here, we focus on the t-SNE algorithm and show
that a simplified parameter setup with a single control parameter, namely the
perplexity, can effectively balance local and global data structure
visualization. We also designed a chunk&mix protocol to efficiently
parallelize t-SNE and explore data structure across a much wider range of scales
than currently available. Our parallel version of BH-tSNE, namely pt-SNE,
converges to a good global embedding, comparable to state-of-the-art solutions,
though the chunk&mix protocol adds a little noise and decreases the accuracy at
the local scale. Nonetheless, we show that simple post-processing can
efficiently restore local scale visualization, without any loss of precision at
the global scales. We expect the same approach to apply to faster embedding
algorithms other than BH-tSNE, such as FIt-SNE or UMAP, thus extending the
state-of-the-art and leading to more comprehensive data structure visualization
and analysis.
Related papers
- FedNE: Surrogate-Assisted Federated Neighbor Embedding for Dimensionality Reduction [47.336599393600046]
FedNE is a novel approach that integrates the FedAvg framework with the contrastive NE technique.
We conduct comprehensive experiments on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-09-17T19:23:24Z)
- Fast and Scalable Semi-Supervised Learning for Multi-View Subspace Clustering [13.638434337947302]
FSSMSC is a novel solution to the high computational complexity commonly found in existing approaches.
The method generates a consensus anchor graph across all views, representing each data point as a sparse linear combination of chosen landmarks.
The effectiveness and efficiency of FSSMSC are validated through extensive experiments on multiple benchmark datasets of varying scales.
arXiv Detail & Related papers (2024-08-11T06:54:00Z)
- Efficient Multi-View Graph Clustering with Local and Global Structure Preservation [59.49018175496533]
We propose a novel anchor-based multi-view graph clustering framework termed Efficient Multi-View Graph Clustering with Local and Global Structure Preservation (EMVGC-LG).
Specifically, EMVGC-LG jointly optimizes anchor construction and graph learning to enhance the clustering quality.
In addition, EMVGC-LG inherits the linear complexity of existing AMVGC methods with respect to the number of samples.
arXiv Detail & Related papers (2023-08-31T12:12:30Z)
- Adaptively-weighted Integral Space for Fast Multiview Clustering [54.177846260063966]
We propose an Adaptively-weighted Integral Space for Fast Multiview Clustering (AIMC) with nearly linear complexity.
Specifically, view generation models are designed to reconstruct the view observations from the latent integral space.
Experiments conducted on several real-world datasets confirm the superiority of the proposed AIMC method.
arXiv Detail & Related papers (2022-08-25T05:47:39Z)
- Design of Compressed Sensing Systems via Density-Evolution Framework for Structure Recovery in Graphical Models [10.667885727418705]
It has been shown that learning the structure of Bayesian networks from observational data is an NP-Hard problem.
We propose a novel density-evolution based framework for optimizing compressed linear measurement systems.
We show that the structure of GBN can indeed be recovered from resulting compressed measurements.
arXiv Detail & Related papers (2022-03-17T22:16:38Z)
- ExClus: Explainable Clustering on Low-dimensional Data Representations [9.496898312608307]
Dimensionality reduction and clustering techniques are frequently used to analyze complex data sets, but their results are often not easy to interpret.
We consider how to support users in interpreting apparent cluster structure on scatter plots where the axes are not directly interpretable.
We propose a new method to compute an interpretable clustering automatically, where the explanation is in the original high-dimensional space and the clustering is coherent in the low-dimensional projection.
arXiv Detail & Related papers (2021-11-04T21:24:01Z)
- Clustered Federated Learning via Generalized Total Variation Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization.
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
arXiv Detail & Related papers (2021-05-26T18:07:19Z)
- Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering.
The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Visualizing the Finer Cluster Structure of Large-Scale and High-Dimensional Data [7.400745342582259]
We propose using a generalized sigmoid function to model the distance similarity in both high- and low-dimensional spaces.
Using both simulated and real-world data sets, we show that our proposed method can generate visualization results comparable to those of uniform manifold approximation and projection.
arXiv Detail & Related papers (2020-07-17T01:36:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.