Related papers: HPSCAN: Human Perception-Based Scattered Data Clustering

HPSCAN: Human Perception-Based Scattered Data Clustering

URL: http://arxiv.org/abs/2304.14185v4
Date: Thu, 30 Jan 2025 07:12:45 GMT
Title: HPSCAN: Human Perception-Based Scattered Data Clustering
Authors: Sebastian Hartwig, Christian van Onzenoodt, Dominik Engel, Pedro Hermosilla, Timo Ropinski,
Abstract summary: We propose HPSCAN, a learning strategy that operates directly on scattered data.<n>We report on how we collected our dataset, report statistics of the resulting annotations, and investigate the perceptual agreement of cluster separation for real-world data.
Score: 15.217526822318428
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cluster separation is a task typically tackled by widely used clustering techniques, such as k-means or DBSCAN. However, these algorithms are based on non-perceptual metrics, and our experiments demonstrate that their output does not reflect human cluster perception. To bridge the gap between human cluster perception and machine-computed clusters, we propose HPSCAN, a learning strategy that operates directly on scattered data. To learn perceptual cluster separation on such data, we crowdsourced the labeling of 7,320 bivariate (scatterplot) datasets to 384 human participants. We train our HPSCAN model on these human-annotated data. Instead of rendering these data as scatterplot images, we used their x and y point coordinates as input to a modified PointNet++ architecture, enabling direct inference on point clouds. In this work, we provide details on how we collected our dataset, report statistics of the resulting annotations, and investigate the perceptual agreement of cluster separation for real-world data. We also report the training and evaluation protocol for HPSCAN and introduce a novel metric, that measures the accuracy between a clustering technique and a group of human annotators. We explore predicting point-wise human agreement to detect ambiguities. Finally, we compare our approach to ten established clustering techniques and demonstrate that HPSCAN is capable of generalizing to unseen and out-of-scope data.

Related papers

THESAURUS: Contrastive Graph Clustering by Swapping Fused Gromov-Wasserstein Couplings [9.805171821491207]
We present conTrastive grapH clustEring by SwApping fUsed gRomov-wasserstein coUplingS (THESAURUS) Our method introduces semantic prototypes to provide contextual information, and employs a cross-view assignment prediction pretext task. It utilizes Gromov-Wasserstein Optimal Transport (GW-OT) along with the proposed prototype graph to thoroughly exploit cluster information in the graph structure.
arXiv Detail & Related papers (2024-12-16T08:33:56Z)
Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance. DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator. Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering. In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework. In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering [23.625877882403227]
We study perceptual variability in conducting visual clustering, which we call Cluster Ambiguity. We introduce CLAMS, a data-driven visual quality measure for automatically predicting cluster ambiguity in monochrome scatterplots.
arXiv Detail & Related papers (2023-08-01T04:46:35Z)
Interpretable Deep Clustering for Tabular Data [7.972599673048582]
Clustering is a fundamental learning task widely used in data analysis. We propose a new deep-learning framework that predicts interpretable cluster assignments at the instance and cluster levels. We show that the proposed method can reliably predict cluster assignments in biological, text, image, and physics datasets.
arXiv Detail & Related papers (2023-06-07T21:08:09Z)
Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation. Specifically, we construct distance matrix between data points by Butterworth filter. To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
Linking data separation, visual separation, and classifier performance using pseudo-labeling by contrastive learning [125.99533416395765]
We argue that the performance of the final classifier depends on the data separation present in the latent space and visual separation present in the projection. We demonstrate our results by the classification of five real-world challenging image datasets of human intestinal parasites with only 1% supervised samples.
arXiv Detail & Related papers (2023-02-06T10:01:38Z)
Human Motion Detection Using Sharpened Dimensionality Reduction and Clustering [1.1172382217477126]
We propose clustering methods to easily label the 2D projections of high-dimensional data. We test our pipeline of SDR and the clustering methods on a range of synthetic and real-world datasets. We conclude that clustering SDR results yields better labeling results than clustering plain DR, and that k-means is the recommended clustering method for SDR.
arXiv Detail & Related papers (2022-02-23T18:18:25Z)
Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC) In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss, are designed for node representation learning. For the OOS nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
Clustering Plotted Data by Image Segmentation [12.443102864446223]
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data. Our approach, Visual Clustering, has several advantages over traditional clustering algorithms.
arXiv Detail & Related papers (2021-10-06T06:19:30Z)
Learning Statistical Representation with Joint Deep Embedded Clustering [2.1267423178232407]
StatDEC is an unsupervised framework for joint statistical representation learning and clustering. Our experiments show that using these representations, one can considerably improve results on imbalanced image clustering across a variety of image datasets.
arXiv Detail & Related papers (2021-09-11T09:26:52Z)
Integrating Auxiliary Information in Self-supervised Learning [94.11964997622435]
We first observe that the auxiliary information may bring us useful information about data structures. We present to construct data clusters according to the auxiliary information. We show that Cl-InfoNCE may be a better approach to leverage the data clustering information.
arXiv Detail & Related papers (2021-06-05T11:01:15Z)
You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation. We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one. By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
Unsupervised Visual Representation Learning by Online Constrained K-Means [44.38989920488318]
Cluster discrimination is an effective pretext task for unsupervised representation learning. We propose a novel clustering-based pretext task with online textbfConstrained textbfK-mtextbfeans (textbfCoKe) Our online assignment method has a theoretical guarantee to approach the global optimum.
arXiv Detail & Related papers (2021-05-24T20:38:32Z)
Predictive K-means with local models [0.028675177318965035]
Predictive clustering seeks to obtain the best of the two worlds. We present two new algorithms using this technique and show on a variety of data sets that they are competitive for prediction performance.
arXiv Detail & Related papers (2020-12-16T10:49:36Z)
Dynamic Clustering in Federated Learning [15.37652170495055]
We propose a three-phased data clustering algorithm, namely: generative adversarial network-based clustering, cluster calibration, and cluster division. Our algorithm improves the performance of forecasting models, including cellular network handover, by 43%.
arXiv Detail & Related papers (2020-12-07T15:30:07Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset. Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation. We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.