Supervised Dimensionality Reduction and Visualization using
Centroid-encoder
- URL: http://arxiv.org/abs/2002.11934v2
- Date: Fri, 28 Feb 2020 23:22:24 GMT
- Title: Supervised Dimensionality Reduction and Visualization using Centroid-encoder
- Authors: Tomojit Ghosh and Michael Kirby
- Abstract summary: The Centroid-Encoder (CE) method is similar to an autoencoder but incorporates label information to keep objects of a class close together in the reduced visualization space.
CE exploits nonlinearity and labels to encode high variance in low dimensions while capturing the global structure of the data.
We show that when the data variance is spread across multiple modalities, centroid-encoder extracts a significant amount of information from the data in low dimensional space.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visualizing high-dimensional data is an essential task in Data Science and
Machine Learning. The Centroid-Encoder (CE) method is similar to the
autoencoder but incorporates label information to keep objects of a class close
together in the reduced visualization space. CE exploits nonlinearity and
labels to encode high variance in low dimensions while capturing the global
structure of the data. We present a detailed analysis of the method using a
wide variety of data sets and compare it with other supervised dimension
reduction techniques, including NCA, nonlinear NCA, t-distributed NCA,
t-distributed MCML, supervised UMAP, supervised PCA, Colored Maximum Variance
Unfolding, supervised Isomap, Parametric Embedding, supervised Neighbor
Retrieval Visualizer, and Multiple Relational Embedding. We empirically show
that centroid-encoder outperforms most of these techniques. We also show that
when the data variance is spread across multiple modalities, centroid-encoder
extracts a significant amount of information from the data in low dimensional
space. This key feature establishes its value as a tool for data
visualization.
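The core idea of the abstract can be illustrated with a toy sketch: where a plain autoencoder reconstructs each input point, a centroid-encoder reconstructs the centroid of that point's class, pulling same-class samples together in the bottleneck. The linear encoder/decoder, the toy data, and all weight shapes below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

# Toy data: 6 points in 4-D, two classes (labels 0 and 1).
X = np.array([[1.0, 0.0, 0.2, 0.1],
              [0.9, 0.1, 0.3, 0.0],
              [1.1, 0.0, 0.1, 0.2],
              [0.0, 1.0, 0.8, 0.9],
              [0.1, 0.9, 0.9, 1.0],
              [0.0, 1.1, 1.0, 0.8]])
y = np.array([0, 0, 0, 1, 1, 1])

# Per-class centroids computed in the input space.
centroids = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])

def centroid_encoder_loss(X, y, centroids, W_enc, W_dec):
    """Centroid-encoder objective: reconstruct each point's *class
    centroid* rather than the point itself (an ordinary autoencoder
    would use X as the reconstruction target)."""
    Z = X @ W_enc           # encode: 4-D -> 2-D visualization space
    X_hat = Z @ W_dec       # decode: 2-D -> 4-D
    targets = centroids[y]  # each sample's class centroid
    return np.mean(np.sum((X_hat - targets) ** 2, axis=1))

rng = np.random.default_rng(0)
W_enc = 0.1 * rng.normal(size=(4, 2))   # hypothetical linear encoder
W_dec = 0.1 * rng.normal(size=(2, 4))   # hypothetical linear decoder
loss = centroid_encoder_loss(X, y, centroids, W_enc, W_dec)
```

Minimizing this loss (the paper uses a nonlinear multilayer network rather than the linear maps above) drives same-class points toward a common target, which is what keeps classes compact in the 2-D visualization.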
Related papers
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Datacube segmentation via Deep Spectral Clustering [76.48544221010424]
Extended Vision techniques often pose a challenge in their interpretation.
The huge dimensionality of datacube spectra makes their statistical interpretation a complex task.
In this paper, we explore the possibility of applying unsupervised clustering methods in encoded space.
A statistical dimensional reduction is performed by an ad hoc trained (Variational) AutoEncoder, while the clustering process is performed by a (learnable) iterative K-Means clustering algorithm.
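The two-stage pipeline summarized above (dimension reduction by a trained autoencoder, then iterative K-Means in the encoded space) can be sketched as follows. As assumptions for illustration, a PCA projection stands in for the trained (Variational) AutoEncoder, and the "spectra" are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for datacube spectra: 60 samples, 10-D, generated
# around two distinct templates.
templates = rng.normal(size=(2, 10))
X = np.vstack([t + 0.05 * rng.normal(size=(30, 10)) for t in templates])

# Stand-in encoder: project onto the top-2 principal components
# (the paper trains a (Variational) AutoEncoder for this step).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T

def kmeans(Z, k, iters=20, seed=0):
    """Plain iterative K-Means run in the encoded space."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        # Assign each encoded point to its nearest center.
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old one if a cluster empties.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

labels = kmeans(Z, k=2)
```

Clustering in the low-dimensional encoded space rather than on the raw spectra is what makes the distance computations tractable when the ambient dimensionality is large.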
arXiv Detail & Related papers (2024-01-31T09:31:28Z)
- Asymmetric double-winged multi-view clustering network for exploring Diverse and Consistent Information [28.300395619444796]
In unsupervised scenarios, deep contrastive multi-view clustering (DCMVC) is becoming a hot research spot.
We propose a novel multi-view clustering network termed CodingNet to explore the diverse and consistent information simultaneously.
Our framework's efficacy is validated through extensive experiments on six widely used benchmark datasets.
arXiv Detail & Related papers (2023-09-01T14:13:22Z)
- ShaRP: Shape-Regularized Multidimensional Projections [71.30697308446064]
We present a novel projection technique - ShaRP - that provides users explicit control over the visual signature of the created scatterplot.
ShaRP scales well with dimensionality and dataset size, and generically handles any quantitative dataset.
arXiv Detail & Related papers (2023-06-01T11:16:58Z)
- Semi-Supervised Manifold Learning with Complexity Decoupled Chart Autoencoders [45.29194877564103]
This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class labels.
We discuss the approximation power of such networks and derive a bound that essentially depends on the intrinsic dimension of the data manifold rather than the dimension of ambient space.
arXiv Detail & Related papers (2022-08-22T19:58:03Z)
- Unsupervised Machine Learning for Exploratory Data Analysis of Exoplanet Transmission Spectra [68.8204255655161]
We focus on unsupervised techniques for analyzing spectral data from transiting exoplanets.
We show that there is a high degree of correlation in the spectral data, which calls for appropriate low-dimensional representations.
We uncover interesting structures in the principal component basis, namely, well-defined branches corresponding to different chemical regimes.
arXiv Detail & Related papers (2022-01-07T22:26:33Z)
- Scalable semi-supervised dimensionality reduction with GPU-accelerated EmbedSOM [0.0]
BlosSOM is a high-performance semi-supervised dimensionality reduction software for interactive user-steerable visualization of high-dimensional datasets.
We show the application of BlosSOM on realistic datasets, where it helps to produce high-quality visualizations that incorporate user-specified layout and focus on certain features.
arXiv Detail & Related papers (2022-01-03T15:06:22Z)
- Visual Cluster Separation Using High-Dimensional Sharpened Dimensionality Reduction [65.80631307271705]
'High-Dimensional Sharpened DR' (HD-SDR) is tested on both synthetic and real-world data sets.
Our method achieves good quality (measured by quality metrics) and scales computationally well with large high-dimensional data.
To illustrate its concrete applications, we further apply HD-SDR on a recent astronomical catalog.
arXiv Detail & Related papers (2021-10-01T11:13:51Z)
- Supervised Visualization for Data Exploration [9.742277703732187]
We describe a novel supervised visualization technique based on random forest proximities and diffusion-based dimensionality reduction.
Our approach is robust to noise and parameter tuning, thus making it simple to use while producing reliable visualizations for data exploration.
arXiv Detail & Related papers (2020-06-15T19:10:17Z)
- Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.