CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability
in Visual Clustering
- URL: http://arxiv.org/abs/2308.00284v2
- Date: Fri, 11 Aug 2023 04:43:16 GMT
- Title: CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability
in Visual Clustering
- Authors: Hyeon Jeon, Ghulam Jilani Quadri, Hyunwook Lee, Paul Rosen, Danielle
Albers Szafir, and Jinwook Seo
- Abstract summary: We study perceptual variability in conducting visual clustering, which we call Cluster Ambiguity.
We introduce CLAMS, a data-driven visual quality measure for automatically predicting cluster ambiguity in monochrome scatterplots.
- Score: 23.625877882403227
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Visual clustering is a common perceptual task in scatterplots that supports
diverse analytics tasks (e.g., cluster identification). However, even with the
same scatterplot, the ways of perceiving clusters (i.e., conducting visual
clustering) can differ due to the differences among individuals and ambiguous
cluster boundaries. Although such perceptual variability casts doubt on the
reliability of data analysis based on visual clustering, we lack a systematic
way to efficiently assess this variability. In this research, we study
perceptual variability in conducting visual clustering, which we call Cluster
Ambiguity. To this end, we introduce CLAMS, a data-driven visual quality
measure for automatically predicting cluster ambiguity in monochrome
scatterplots. We first conduct a qualitative study to identify key factors that
affect the visual separation of clusters (e.g., proximity or size difference
between clusters). Based on study findings, we deploy a regression module that
estimates the human-judged separability of two clusters. Then, CLAMS predicts
cluster ambiguity by analyzing the aggregated results of all pairwise
separability between clusters that are generated by the module. CLAMS
outperforms widely-used clustering techniques in predicting ground truth
cluster ambiguity. Meanwhile, CLAMS exhibits performance on par with human
annotators. We conclude our work by presenting two applications for optimizing
and benchmarking data mining techniques using CLAMS. The interactive demo of
CLAMS is available at clusterambiguity.dev.
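The pipeline described in the abstract (score every pair of clusters for separability, then aggregate the pairwise scores into one ambiguity value) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the scores are aggregated, so the sketch assumes a mean of binary entropies, where a pair with separability near 0.5 counts as most ambiguous. The function `toy_separability` is a hypothetical stand-in for the regression module, and representing each cluster as a `(x, y, radius)` tuple is likewise an assumption made for illustration.

```python
import math
from itertools import combinations


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) outcome: 0 at p = 0 or 1, 1 at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def toy_separability(a, b):
    """Hypothetical stand-in for the regression module (assumption).

    Each cluster is an (x, y, radius) tuple. The score is a logistic
    function of the gap between the two cluster boundaries, so it lies
    in (0, 1) and equals 0.5 when the clusters just touch.
    """
    (ax, ay, ar), (bx, by, br) = a, b
    gap = math.hypot(ax - bx, ay - by) - (ar + br)
    return 1.0 / (1.0 + math.exp(-gap))


def cluster_ambiguity(clusters, separability=toy_separability):
    """Aggregate pairwise separability into a single ambiguity score.

    Assumed aggregation: the mean binary entropy over all cluster pairs.
    """
    pairs = list(combinations(clusters, 2))
    if not pairs:
        return 0.0  # fewer than two clusters: nothing to confuse
    return sum(binary_entropy(separability(a, b)) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    touching = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
    far_apart = [(0.0, 0.0, 1.0), (100.0, 0.0, 1.0)]
    print(cluster_ambiguity(touching))
    print(cluster_ambiguity(far_apart))
```

Under these assumptions, two clusters that just touch give the highest ambiguity, while clusters that are far apart give a score near zero. Any learned separability model with outputs in [0, 1] could be passed in through the `separability` parameter.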
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Self Supervised Correlation-based Permutations for Multi-View Clustering [7.972599673048582]
We propose an end-to-end deep learning-based MVC framework for general data.
Our approach involves learning meaningful fused data representations with a novel permutation-based canonical correlation objective.
We demonstrate the effectiveness of our model using ten MVC benchmark datasets.
arXiv Detail & Related papers (2024-02-26T08:08:30Z)
- Enhancing cluster analysis via topological manifold learning [0.3823356975862006]
We show that inferring the topological structure of a dataset before clustering can considerably enhance cluster detection.
We combine the manifold learning method UMAP, which infers the topological structure, with the density-based clustering method DBSCAN.
arXiv Detail & Related papers (2022-07-01T15:53:39Z)
- ACTIVE: Augmentation-Free Graph Contrastive Learning for Partial
Multi-View Clustering [52.491074276133325]
We propose an augmentation-free graph contrastive learning framework to solve the problem of partial multi-view clustering.
The proposed approach elevates instance-level contrastive learning and missing data inference to the cluster-level, effectively mitigating the impact of individual missing data on clustering.
arXiv Detail & Related papers (2022-03-01T02:32:25Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning.
For the OOS nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- Learning Statistical Representation with Joint Deep Embedded Clustering [2.1267423178232407]
StatDEC is an unsupervised framework for joint statistical representation learning and clustering.
Our experiments show that using these representations, one can considerably improve results on imbalanced image clustering across a variety of image datasets.
arXiv Detail & Related papers (2021-09-11T09:26:52Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Clustering-friendly Representation Learning via Instance Discrimination
and Feature Decorrelation [0.0]
We propose a clustering-friendly representation learning method using instance discrimination and feature decorrelation.
In evaluations of image clustering using CIFAR-10 and ImageNet-10, our method achieves accuracy of 81.5% and 95.4%, respectively.
arXiv Detail & Related papers (2021-05-31T22:59:31Z)
- Graph Contrastive Clustering [131.67881457114316]
We propose a novel graph contrastive learning framework and apply it to the clustering task, yielding the Graph Contrastive Clustering (GCC) method.
Specifically, on the one hand, the graph Laplacian based contrastive loss is proposed to learn more discriminative and clustering-friendly features.
On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments.
arXiv Detail & Related papers (2021-04-03T15:32:49Z)
- Multiscale Clustering of Hyperspectral Images Through Spectral-Spatial
Diffusion Geometry [9.619814126465206]
Clustering algorithms partition a dataset into groups of similar points.
The primary contribution of this article is the Multiscale Spatially-Regularized Diffusion Learning (M-SRDL) clustering algorithm.
We show that incorporating spatial regularization into a multiscale clustering framework corresponds to smoother and more coherent clusters when applied to HSI data.
arXiv Detail & Related papers (2021-03-29T17:24:28Z)
- LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)