DRtool: An Interactive Tool for Analyzing High-Dimensional Clusterings
- URL: http://arxiv.org/abs/2509.04603v2
- Date: Thu, 11 Sep 2025 18:37:27 GMT
- Title: DRtool: An Interactive Tool for Analyzing High-Dimensional Clusterings
- Authors: Justin Lin, Julia Fukuyama,
- Abstract summary: DRtool is an interactive tool that enables analysts to better understand and diagnose their dimension reduction results.<n>It uses various analytical plots to provide a multi-faceted perspective on results to determine legitimacy.
- Score: 1.3965477771846408
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Technological advances have spurred an increase in data complexity and dimensionality. We are now in an era in which data sets containing thousands of features are commonplace. To digest and analyze such high-dimensional data, dimension reduction techniques have been developed and advanced along with computational power. Of these techniques, nonlinear methods are most commonly employed because of their ability to construct visually interpretable embeddings. Unlike linear methods, these methods non-uniformly stretch and shrink space to create a visual impression of the high-dimensional data. Since capturing high-dimensional structures in a significantly lower number of dimensions requires drastic manipulation of space, nonlinear dimension reduction methods are known to occasionally produce false structures, especially in noisy settings. In an effort to deal with this phenomenon, we developed an interactive tool that enables analysts to better understand and diagnose their dimension reduction results. It uses various analytical plots to provide a multi-faceted perspective on results to determine legitimacy. The tool is available via an R package named DRtool.
Related papers
- An Incremental Non-Linear Manifold Approximation Method [0.0]
This research develops an incremental non-linear dimension reduction method using the Geometric Multi-Resolution Analysis (GMRA) framework for streaming data.<n>The proposed method enables real-time data analysis and visualization by incrementally updating the cluster map, basis PCA vectors, and wavelet coefficients.
arXiv Detail & Related papers (2025-04-12T03:54:05Z) - A Novel Approach for Intrinsic Dimension Estimation [0.0]
The real-life data have a complex and non-linear structure due to their nature.<n>Finding the nearly optimal representation of the dataset in a lower-dimensional space offers an applicable mechanism for improving the success of machine learning tasks.<n>We propose a highly efficient and robust intrinsic dimension estimation approach.
arXiv Detail & Related papers (2025-03-12T15:42:39Z) - Noisy Data Visualization using Functional Data Analysis [14.255424476694946]
We propose a new data visualization method called Functional Information Geometry (FIG) for dynamical processes.
We experimentally demonstrate that the resulting method outperforms a variant of EIG designed for visualization.
We then use our method to visualize EEG brain measurements of sleep activity.
arXiv Detail & Related papers (2024-06-05T15:53:25Z) - Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.<n>In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.<n>This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z) - DimenFix: A novel meta-dimensionality reduction method for feature
preservation [64.0476282000118]
We propose a novel meta-method, DimenFix, which can be operated upon any base dimensionality reduction method that involves a gradient-descent-like process.
By allowing users to define the importance of different features, which is considered in dimensionality reduction, DimenFix creates new possibilities to visualize and understand a given dataset.
arXiv Detail & Related papers (2022-11-30T05:35:22Z) - Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutive pressure acts on a low-dimensional manifold despite the high-dimensionality of sequences' space.
arXiv Detail & Related papers (2022-07-20T06:38:36Z) - A geometric framework for outlier detection in high-dimensional data [0.0]
Outlier or anomaly detection is an important task in data analysis.
We provide a framework that exploits the metric structure of a data set.
We show that exploiting this structure significantly improves the detection of outlying observations in high-dimensional data.
arXiv Detail & Related papers (2022-07-01T12:07:51Z) - The Intrinsic Dimension of Images and Its Impact on Learning [60.811039723427676]
It is widely believed that natural image data exhibits low-dimensional structure despite the high dimensionality of conventional pixel representations.
In this work, we apply dimension estimation tools to popular datasets and investigate the role of low-dimensional structure in deep learning.
arXiv Detail & Related papers (2021-04-18T16:29:23Z) - Modern Dimension Reduction [0.0]
This Element offers readers a suite of modern unsupervised dimension reduction techniques along with hundreds of lines of R code.
I introduce readers through application of the following techniques: locally linear embedding, t-distributed neighbor embedding, uniform manifold approximation and projection, self-organizing maps, and deep autoencoders.
arXiv Detail & Related papers (2021-03-11T14:54:33Z) - A Local Similarity-Preserving Framework for Nonlinear Dimensionality
Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments of data classification and clustering on eight real datasets show that Vec2vec is better than several classical dimensionality reduction methods in the statistical hypothesis test.
arXiv Detail & Related papers (2021-03-10T23:10:47Z) - Manifold Learning via Manifold Deflation [105.7418091051558]
dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional Manifolds.
This paper presents an embedding method for a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z) - NCVis: Noise Contrastive Approach for Scalable Visualization [79.44177623781043]
NCVis is a high-performance dimensionality reduction method built on a sound statistical basis of noise contrastive estimation.
We show that NCVis outperforms state-of-the-art techniques in terms of speed while preserving the representation quality of other methods.
arXiv Detail & Related papers (2020-01-30T15:43:50Z) - Multi-Objective Genetic Programming for Manifold Learning: Balancing
Quality and Dimensionality [4.4181317696554325]
State-of-the-art manifold learning algorithms are opaque in how they perform this transformation.
We introduce a multi-objective approach that automatically balances the competing objectives of manifold quality and dimensionality.
Our proposed approach is competitive with a range of baseline and state-of-the-art manifold learning methods.
arXiv Detail & Related papers (2020-01-05T23:24:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.