Supervised Dimensionality Reduction and Visualization using
Centroid-encoder
- URL: http://arxiv.org/abs/2002.11934v2
- Date: Fri, 28 Feb 2020 23:22:24 GMT
- Title: Supervised Dimensionality Reduction and Visualization using Centroid-encoder
- Authors: Tomojit Ghosh and Michael Kirby
- Abstract summary: The Centroid-Encoder (CE) method is similar to an autoencoder but incorporates label information to keep objects of a class close together in the reduced visualization space.
CE exploits nonlinearity and labels to encode high variance in low dimensions while capturing the global structure of the data.
We show that when the data variance is spread across multiple modalities, centroid-encoder extracts a significant amount of information from the data in low dimensional space.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visualizing high-dimensional data is an essential task in Data Science and
Machine Learning. The Centroid-Encoder (CE) method is similar to the
autoencoder but incorporates label information to keep objects of a class close
together in the reduced visualization space. CE exploits nonlinearity and
labels to encode high variance in low dimensions while capturing the global
structure of the data. We present a detailed analysis of the method using a
wide variety of data sets and compare it with other supervised dimension
reduction techniques, including NCA, nonlinear NCA, t-distributed NCA,
t-distributed MCML, supervised UMAP, supervised PCA, Colored Maximum Variance
Unfolding, supervised Isomap, Parametric Embedding, supervised Neighbor
Retrieval Visualizer, and Multiple Relational Embedding. We empirically show
that centroid-encoder outperforms most of these techniques. We also show that
when the data variance is spread across multiple modalities, centroid-encoder
extracts a significant amount of information from the data in low dimensional
space. This key feature establishes its value as a tool for data
visualization.
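The core idea of the abstract can be illustrated with a toy sketch: where a plain autoencoder reconstructs each input point, a centroid-encoder reconstructs the centroid of that point's class, pulling same-class samples together in the bottleneck. The linear encoder/decoder, the toy data, and all weight shapes below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

# Toy data: 6 points in 4-D, two classes (labels 0 and 1).
X = np.array([[1.0, 0.0, 0.2, 0.1],
              [0.9, 0.1, 0.3, 0.0],
              [1.1, 0.0, 0.1, 0.2],
              [0.0, 1.0, 0.8, 0.9],
              [0.1, 0.9, 0.9, 1.0],
              [0.0, 1.1, 1.0, 0.8]])
y = np.array([0, 0, 0, 1, 1, 1])

# Per-class centroids computed in the input space.
centroids = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])

def centroid_encoder_loss(X, y, centroids, W_enc, W_dec):
    """Centroid-encoder objective: reconstruct each point's *class
    centroid* rather than the point itself (an ordinary autoencoder
    would use X as the reconstruction target)."""
    Z = X @ W_enc           # encode: 4-D -> 2-D visualization space
    X_hat = Z @ W_dec       # decode: 2-D -> 4-D
    targets = centroids[y]  # each sample's class centroid
    return np.mean(np.sum((X_hat - targets) ** 2, axis=1))

rng = np.random.default_rng(0)
W_enc = 0.1 * rng.normal(size=(4, 2))   # hypothetical linear encoder
W_dec = 0.1 * rng.normal(size=(2, 4))   # hypothetical linear decoder
loss = centroid_encoder_loss(X, y, centroids, W_enc, W_dec)
```

Minimizing this loss (the paper uses a nonlinear multilayer network rather than the linear maps above) drives same-class points toward a common target, which is what keeps classes compact in the 2-D visualization.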
Related papers
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Datacube segmentation via Deep Spectral Clustering [76.48544221010424]
Extended Vision techniques often pose a challenge in their interpretation.
The huge dimensionality of datacube spectra makes their statistical interpretation a complex task.
In this paper, we explore the possibility of applying unsupervised clustering methods in encoded space.
A statistical dimensional reduction is performed by an ad hoc trained (Variational) AutoEncoder, while the clustering process is performed by a (learnable) iterative K-Means clustering algorithm.
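The two-stage pipeline summarized above (dimension reduction by a trained autoencoder, then iterative K-Means in the encoded space) can be sketched as follows. As assumptions for illustration, a PCA projection stands in for the trained (Variational) AutoEncoder, and the "spectra" are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for datacube spectra: 60 samples, 10-D, generated
# around two distinct templates.
templates = rng.normal(size=(2, 10))
X = np.vstack([t + 0.05 * rng.normal(size=(30, 10)) for t in templates])

# Stand-in encoder: project onto the top-2 principal components
# (the paper trains a (Variational) AutoEncoder for this step).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T

def kmeans(Z, k, iters=20, seed=0):
    """Plain iterative K-Means run in the encoded space."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        # Assign each encoded point to its nearest center.
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old one if a cluster empties.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

labels = kmeans(Z, k=2)
```

Clustering in the low-dimensional encoded space rather than on the raw spectra is what makes the distance computations tractable when the ambient dimensionality is large.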
arXiv Detail & Related papers (2024-01-31T09:31:28Z)
- Asymmetric double-winged multi-view clustering network for exploring Diverse and Consistent Information [28.300395619444796]
In unsupervised scenarios, deep contrastive multi-view clustering (DCMVC) is becoming a hot research spot.
We propose a novel multi-view clustering network termed CodingNet to explore the diverse and consistent information simultaneously.
Our framework's efficacy is validated through extensive experiments on six widely used benchmark datasets.
arXiv Detail & Related papers (2023-09-01T14:13:22Z)
- ShaRP: Shape-Regularized Multidimensional Projections [71.30697308446064]
We present a novel projection technique - ShaRP - that provides users explicit control over the visual signature of the created scatterplot.
ShaRP scales well with dimensionality and dataset size, and generically handles any quantitative dataset.
arXiv Detail & Related papers (2023-06-01T11:16:58Z)
- Semi-Supervised Manifold Learning with Complexity Decoupled Chart Autoencoders [45.29194877564103]
This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class labels.
We discuss the approximation power of such networks and derive a bound that essentially depends on the intrinsic dimension of the data manifold rather than the dimension of ambient space.
arXiv Detail & Related papers (2022-08-22T19:58:03Z)
- Unsupervised Machine Learning for Exploratory Data Analysis of Exoplanet Transmission Spectra [68.8204255655161]
We focus on unsupervised techniques for analyzing spectral data from transiting exoplanets.
We show that there is a high degree of correlation in the spectral data, which calls for appropriate low-dimensional representations.
We uncover interesting structures in the principal component basis, namely, well-defined branches corresponding to different chemical regimes.
arXiv Detail & Related papers (2022-01-07T22:26:33Z)
- Scalable semi-supervised dimensionality reduction with GPU-accelerated EmbedSOM [0.0]
BlosSOM is a high-performance semi-supervised dimensionality reduction software for interactive user-steerable visualization of high-dimensional datasets.
We show the application of BlosSOM on realistic datasets, where it helps to produce high-quality visualizations that incorporate user-specified layout and focus on certain features.
arXiv Detail & Related papers (2022-01-03T15:06:22Z)
- Visual Cluster Separation Using High-Dimensional Sharpened Dimensionality Reduction [65.80631307271705]
'High-Dimensional Sharpened DR' (HD-SDR) is tested on both synthetic and real-world data sets.
Our method achieves good quality (measured by quality metrics) and scales computationally well with large high-dimensional data.
To illustrate its concrete applications, we further apply HD-SDR on a recent astronomical catalog.
arXiv Detail & Related papers (2021-10-01T11:13:51Z)
- Supervised Visualization for Data Exploration [9.742277703732187]
We describe a novel supervised visualization technique based on random forest proximities and diffusion-based dimensionality reduction.
Our approach is robust to noise and parameter tuning, thus making it simple to use while producing reliable visualizations for data exploration.
arXiv Detail & Related papers (2020-06-15T19:10:17Z)
- Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.