Contrastive analysis for scatter plot-based representations of
dimensionality reduction
- URL: http://arxiv.org/abs/2101.12044v1
- Date: Tue, 26 Jan 2021 01:16:31 GMT
- Title: Contrastive analysis for scatter plot-based representations of
dimensionality reduction
- Authors: Wilson E. Marcílio-Jr, Danilo M. Eler, Rogério E. Garcia
- Abstract summary: This paper introduces a methodology to explore multidimensional datasets and interpret clusters' formation.
We also introduce a bipartite graph to visually interpret and explore the relationships among the statistical variables used to understand how the attributes influenced cluster formation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploring multidimensional datasets is a ubiquitous task for those
working with data, and interpreting clusters is one of its main parts. These
multidimensional datasets are usually encoded using scatter plot
representations, where spatial proximity encodes similarity among data samples.
In the literature, techniques try to explain the scatter plot organization by
visualizing the importance of the features for cluster definition through
interaction and layout enrichment strategies. However, the approaches used to
interpret dimensionality reduction usually do not differentiate clusters well,
which hampers analyses whose focus is to understand the differences among
clusters. This paper introduces a methodology to visually explore
multidimensional datasets and interpret cluster formation based on contrastive
analysis. We also introduce a bipartite graph to visually interpret and explore
the relationships among the statistical variables used to understand how the
attributes influenced cluster formation. Our methodology is validated through
case studies: we explore a multivariate dataset of patients with vertebral
problems and two document collections, one related to news articles and the
other to tweets about COVID-19 symptoms. Finally, we also validate our approach
through quantitative results to demonstrate that it is robust enough to support
multidimensional analysis.
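The abstract describes the methodology only at a high level and does not pin down a specific contrastive technique, so the following is a minimal sketch assuming contrastive PCA (cPCA), a common contrastive-analysis method; the data `X`, the cluster labels, and the contrast strength `alpha` are illustrative placeholders, not the authors' implementation.

```python
# Hypothetical sketch: contrast one cluster against the remaining samples with
# cPCA and rank original attributes by their loading on the first contrastive axis.
import numpy as np

def contrastive_directions(target, background, alpha=1.0, n_components=2):
    """Directions with high variance in the target cluster and low variance
    in the background (all remaining samples)."""
    t = target - target.mean(axis=0)
    b = background - background.mean(axis=0)
    cov_t = t.T @ t / (len(t) - 1)
    cov_b = b.T @ b / (len(b) - 1)
    # Symmetric contrastive covariance; its top eigenvectors are the contrastive axes.
    eigvals, eigvecs = np.linalg.eigh(cov_t - alpha * cov_b)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]                     # shape: (n_features, n_components)

# Illustrative usage with placeholder data and placeholder cluster labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                    # placeholder multidimensional data
labels = rng.integers(0, 3, size=300)            # placeholder cluster labels
W = contrastive_directions(X[labels == 0], X[labels != 0], alpha=2.0)
attribute_rank = np.argsort(np.abs(W[:, 0]))[::-1]
print(attribute_rank)                            # attributes ordered by contrastive importance
```

The resulting per-attribute loadings contrast what varies within the selected cluster against the rest of the data, which corresponds loosely to the cluster-focused attribute analysis described above.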
Related papers
- Comparing the information content of probabilistic representation spaces [3.7277730514654555]
Probabilistic representation spaces convey information about a dataset, and to understand the effects of factors such as training loss and network architecture, we seek to compare the information content of such spaces.
Here, instead of building upon point-based measures of comparison, we build upon classic methods from literature on hard clustering.
We propose a practical method of estimation that is based on fingerprinting a representation space with a sample of the dataset and is applicable when the communicated information is only a handful of bits.
arXiv Detail & Related papers (2024-05-31T17:33:07Z) - Self Supervised Correlation-based Permutations for Multi-View Clustering [7.972599673048582]
We propose an end-to-end deep learning-based MVC framework for general data.
Our approach involves learning meaningful fused data representations with a novel permutation-based canonical correlation objective.
We demonstrate the effectiveness of our model using ten MVC benchmark datasets.
arXiv Detail & Related papers (2024-02-26T08:08:30Z) - Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z) - Datacube segmentation via Deep Spectral Clustering [76.48544221010424]
Extended Vision techniques often pose a challenge in their interpretation.
The huge dimensionality of data cube spectra makes their statistical interpretation a complex task.
In this paper, we explore the possibility of applying unsupervised clustering methods in encoded space.
A statistical dimensional reduction is performed by an ad hoc trained (Variational) AutoEncoder, while the clustering process is performed by a (learnable) iterative K-Means clustering algorithm.
arXiv Detail & Related papers (2024-01-31T09:31:28Z) - One for all: A novel Dual-space Co-training baseline for Large-scale
Multi-View Clustering [42.92751228313385]
We propose a novel multi-view clustering model, named Dual-space Co-training Large-scale Multi-view Clustering (DSCMC).
The main objective of our approach is to enhance the clustering performance by leveraging co-training in two distinct spaces.
Our algorithm has approximately linear computational complexity, which guarantees its applicability to large-scale datasets.
arXiv Detail & Related papers (2024-01-28T16:30:13Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering
Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - Metric Distribution to Vector: Constructing Data Representation via
Broad-Scale Discrepancies [15.40538348604094]
We present a novel embedding strategy named $\mathbf{MetricDistribution2vec}$ to extract distribution characteristics into the vectorial representation of each data sample.
We demonstrate the application and effectiveness of our representation method in the supervised prediction tasks on extensive real-world structural graph datasets.
arXiv Detail & Related papers (2022-10-02T03:18:30Z) - ACTIVE:Augmentation-Free Graph Contrastive Learning for Partial
Multi-View Clustering [52.491074276133325]
We propose an augmentation-free graph contrastive learning framework to solve the problem of partial multi-view clustering.
The proposed approach elevates instance-level contrastive learning and missing data inference to the cluster-level, effectively mitigating the impact of individual missing data on clustering.
arXiv Detail & Related papers (2022-03-01T02:32:25Z) - Effective and Efficient Graph Learning for Multi-view Clustering [173.8313827799077]
We propose an effective and efficient graph learning model for multi-view clustering.
Our method exploits the view-similarity between graphs of different views by minimizing the tensor Schatten p-norm.
Our proposed algorithm is time-economical, obtains stable results, and scales well with the data size.
arXiv Detail & Related papers (2021-08-15T13:14:28Z) - Explaining dimensionality reduction results using Shapley values [0.0]
Dimensionality reduction (DR) techniques have been consistently supporting high-dimensional data analysis in various applications.
Current literature approaches designed to interpret DR techniques do not explain the features' contributions well since they focus only on the low-dimensional representation or do not consider the relationship among features.
This paper presents ClusterShapley to address these problems, using Shapley values to generate explanations of dimensionality reduction techniques and to interpret these algorithms through a cluster-oriented analysis (an illustrative sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-03-09T19:28:10Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
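As referenced in the ClusterShapley entry above, the following is a hedged sketch of cluster-oriented Shapley explanations for a dimensionality reduction result; the surrogate classifier, the PCA projection, and the K-Means labels are illustrative assumptions, not the cited paper's pipeline.

```python
# Hypothetical sketch: explain, with Shapley values, which original attributes
# drive cluster membership in a 2-D projection (in the spirit of ClusterShapley;
# the concrete pipeline below is an assumption, not the cited paper's method).
import numpy as np
import shap
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                          # placeholder multidimensional data
embedding = PCA(n_components=2).fit_transform(X)       # scatter-plot projection
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedding)

# Surrogate classifier mapping the original attributes to the projection clusters.
clf = RandomForestClassifier(random_state=0).fit(X, clusters)

# Kernel SHAP over the probability of belonging to one chosen cluster; a
# single-output function keeps the returned array shape (n_samples, n_features).
prob_cluster0 = lambda Z: clf.predict_proba(Z)[:, 0]
explainer = shap.KernelExplainer(prob_cluster0, X[:50])
shap_values = explainer.shap_values(X[:20])            # explanations for 20 samples
print(np.abs(shap_values).mean(axis=0))                # mean |SHAP| per attribute, cluster 0
```

Large mean absolute SHAP values point to the attributes that most influence membership in the chosen cluster, which is the kind of cluster-oriented explanation that entry describes.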
This list is automatically generated from the titles and abstracts of the papers on this site.