ClusterNet: A Perception-Based Clustering Model for Scattered Data
- URL: http://arxiv.org/abs/2304.14185v3
- Date: Wed, 6 Mar 2024 07:41:06 GMT
- Title: ClusterNet: A Perception-Based Clustering Model for Scattered Data
- Authors: Sebastian Hartwig, Christian van Onzenoodt, Dominik Engel, Pedro
Hermosilla, Timo Ropinski
- Abstract summary: Cluster separation in scatterplots is a task that is typically tackled by widely used clustering techniques.
We propose a learning strategy which directly operates on scattered data.
We train ClusterNet, a point-based deep learning model, to reflect human perception of cluster separability.
- Score: 16.326062082938215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visualizations for scattered data help users understand certain attributes of their data by supporting tasks such as correlation estimation, outlier detection, and cluster separation. In this paper, we focus on the latter task and develop a technique that is aligned with human perception and can be used to understand how human subjects perceive clusterings in scattered data, and possibly to optimize scatterplots for better understanding. Cluster separation in scatterplots is typically tackled by widely used clustering techniques, such as k-means or DBSCAN. However, as these algorithms rely on non-perceptual metrics, our experiments show that their output does not reflect human cluster perception. We propose a learning strategy that operates directly on scattered data. To learn perceptual cluster separation, we crowdsourced a large-scale dataset consisting of 7,320 point-wise cluster affiliations for bivariate data, labeled by 384 human crowd workers. Based on this data, we trained ClusterNet, a point-based deep learning model, to reflect human perception of cluster separability. To train ClusterNet on human-annotated data, we use a PointNet++ architecture, which enables inference on point clouds directly. In this work, we provide details on how we collected our dataset, report statistics of the resulting annotations, and investigate perceptual agreement on cluster separation for real-world data. We further report the training and evaluation protocol of ClusterNet and introduce a novel metric that measures the accuracy of a clustering technique with respect to a group of human annotators. Finally, we compare our approach against existing state-of-the-art clustering techniques and show that ClusterNet generalizes to unseen and out-of-scope data.
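As a rough illustration of the comparison described in the abstract, the minimal sketch below clusters bivariate scattered data with k-means and DBSCAN (the baselines named above) and scores each result against several human labelings. The agreement measure (mean Adjusted Rand Index over annotators), the synthetic data, and the annotator labels are assumptions made for this example; the paper's own metric and crowdsourced dataset are not reproduced here.

```python
# Minimal sketch (not the authors' code): compare k-means and DBSCAN output
# against a set of human annotations. The agreement measure below (mean
# Adjusted Rand Index over annotators) is a stand-in for the paper's metric,
# and the annotations are hypothetical.
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import adjusted_rand_score


def mean_annotator_agreement(labels, annotations):
    """Average agreement between one clustering and several human labelings."""
    return float(np.mean([adjusted_rand_score(a, labels) for a in annotations]))


# Synthetic bivariate data: two elongated, slightly overlapping point groups,
# a shape where algorithmic and perceived cluster boundaries often disagree.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=[2.0, 0.3], size=(200, 2)),
    rng.normal(loc=[0.0, 1.2], scale=[2.0, 0.3], size=(200, 2)),
])

# Hypothetical point-wise annotations from three crowd workers (the real
# dataset contains 7,320 point-wise labelings from 384 workers).
annotations = [np.repeat([0, 1], 200) for _ in range(3)]

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.4, min_samples=5).fit_predict(X)

print("k-means agreement:", mean_annotator_agreement(kmeans_labels, annotations))
print("DBSCAN agreement: ", mean_annotator_agreement(dbscan_labels, annotations))
```

In the paper itself, ClusterNet replaces the classic algorithms in such a comparison with a PointNet++-based model trained directly on the crowdsourced point-wise labels.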
Related papers
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed to enhance cohesion within the same cluster and separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Interpretable Deep Clustering for Tabular Data [7.972599673048582]
Clustering is a fundamental learning task widely used in data analysis.
We propose a new deep-learning framework that predicts interpretable cluster assignments at the instance and cluster levels.
We show that the proposed method can reliably predict cluster assignments in biological, text, image, and physics datasets.
arXiv Detail & Related papers (2023-06-07T21:08:09Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- Clustering Plotted Data by Image Segmentation [12.443102864446223]
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data.
In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data.
Our approach, Visual Clustering, has several advantages over traditional clustering algorithms.
arXiv Detail & Related papers (2021-10-06T06:19:30Z)
- Learning Statistical Representation with Joint Deep Embedded Clustering [2.1267423178232407]
StatDEC is an unsupervised framework for joint statistical representation learning and clustering.
Our experiments show that using these representations, one can considerably improve results on imbalanced image clustering across a variety of image datasets.
arXiv Detail & Related papers (2021-09-11T09:26:52Z)
- Integrating Auxiliary Information in Self-supervised Learning [94.11964997622435]
We first observe that the auxiliary information may provide useful information about the data structure.
We propose to construct data clusters according to the auxiliary information.
We show that Cl-InfoNCE may be a better approach to leverage the data clustering information.
arXiv Detail & Related papers (2021-06-05T11:01:15Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Unsupervised Visual Representation Learning by Online Constrained K-Means [44.38989920488318]
Cluster discrimination is an effective pretext task for unsupervised representation learning.
We propose a novel clustering-based pretext task with online Constrained K-means (CoKe).
Our online assignment method has a theoretical guarantee to approach the global optimum.
arXiv Detail & Related papers (2021-05-24T20:38:32Z)
- Dynamic Clustering in Federated Learning [15.37652170495055]
We propose a three-phased data clustering algorithm, namely: generative adversarial network-based clustering, cluster calibration, and cluster division.
Our algorithm improves the performance of forecasting models, including cellular network handover, by 43%.
arXiv Detail & Related papers (2020-12-07T15:30:07Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.