Interactive Steering of Hierarchical Clustering
- URL: http://arxiv.org/abs/2009.09618v1
- Date: Mon, 21 Sep 2020 05:26:07 GMT
- Title: Interactive Steering of Hierarchical Clustering
- Authors: Weikai Yang, Xiting Wang, Jie Lu, Wenwen Dou, Shixia Liu
- Abstract summary: We present an interactive steering method to visually supervise constrained hierarchical clustering by utilizing both public knowledge (e.g., Wikipedia) and private knowledge from users.
The novelty of our approach includes 1) automatically constructing constraints for hierarchical clustering using knowledge (knowledge-driven) and intrinsic data distribution (data-driven), and 2) enabling the interactive steering of clustering through a visual interface (user-driven).
To clearly convey the hierarchical clustering results, an uncertainty-aware tree visualization has been developed to enable users to quickly locate the most uncertain sub-hierarchies.
- Score: 30.371250297444703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical clustering is an important technique to organize big data for
exploratory data analysis. However, existing one-size-fits-all hierarchical
clustering methods often fail to meet the diverse needs of different users. To
address this challenge, we present an interactive steering method to visually
supervise constrained hierarchical clustering by utilizing both public
knowledge (e.g., Wikipedia) and private knowledge from users. The novelty of
our approach includes 1) automatically constructing constraints for
hierarchical clustering using knowledge (knowledge-driven) and intrinsic data
distribution (data-driven), and 2) enabling the interactive steering of
clustering through a visual interface (user-driven). Our method first maps each
data item to the most relevant items in a knowledge base. An initial constraint
tree is then extracted using the ant colony optimization algorithm. The
algorithm balances the tree width and depth and covers the data items with high
confidence. Given the constraint tree, the data items are hierarchically
clustered using evolutionary Bayesian rose tree. To clearly convey the
hierarchical clustering results, an uncertainty-aware tree visualization has
been developed to enable users to quickly locate the most uncertain
sub-hierarchies and interactively improve them. The quantitative evaluation and
case study demonstrate that the proposed approach facilitates the building of
customized clustering trees in an efficient and effective manner.
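As a rough illustration of the pipeline's first step (mapping each data item to the most relevant items in a knowledge base), the sketch below ranks knowledge-base entries by cosine similarity of bag-of-words term counts. The abstract does not say how the mapping is computed, so the representation, the similarity measure, the hypothetical function names (`tf_vector`, `cosine`, `map_to_knowledge_base`), the parameter `k`, and the toy entries are all assumptions for illustration only.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector over lowercased, whitespace-separated tokens (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors; 0.0 if either is empty."""
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_to_knowledge_base(items, kb_entries, k=2):
    """Hypothetical helper: for each data item, return the names of the k most
    similar knowledge-base entries.

    kb_entries maps an entry name (e.g. an assumed Wikipedia-style category)
    to descriptive text.
    """
    kb_vectors = {name: tf_vector(text) for name, text in kb_entries.items()}
    mapping = {}
    for item in items:
        item_vec = tf_vector(item)
        ranked = sorted(
            kb_vectors,
            key=lambda name: cosine(item_vec, kb_vectors[name]),
            reverse=True,
        )
        mapping[item] = ranked[:k]
    return mapping


if __name__ == "__main__":
    kb = {
        "Sports": "football basketball tennis match team",
        "Politics": "election vote government party",
        "Technology": "software computer chip internet",
    }
    print(map_to_knowledge_base(["team wins football match", "government calls election"], kb, k=1))
    # {'team wins football match': ['Sports'], 'government calls election': ['Politics']}
```

In a real system the bag-of-words vectors would likely be replaced by richer text embeddings; the ranking-by-similarity structure is the part this sketch is meant to show.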
Related papers
- Order Is All You Need for Categorical Data Clustering [29.264630563297466]
Categorical data composed of nominal valued attributes are ubiquitous in knowledge discovery and data mining tasks.
Due to the lack of a well-defined metric space, categorical data distributions are difficult to understand intuitively.
This paper introduces the new finding that the order relation among attribute values is the decisive factor in clustering accuracy.
Date: 2024-11-19T08:23:25Z
- scTree: Discovering Cellular Hierarchies in the Presence of Batch Effects in scRNA-seq Data [12.01555110624794]
scTree corrects for batch effects while simultaneously learning a tree-structured data representation.
We show empirically on seven datasets that scTree discovers the underlying clusters of the data.
Date: 2024-06-27T16:16:55Z
- Cluster-based Graph Collaborative Filtering [55.929052969825825]
Graph Convolution Networks (GCNs) have succeeded in learning user and item representations for recommendation systems.
Most existing GCN-based methods overlook the multiple interests of users while performing high-order graph convolution.
We propose a novel GCN-based recommendation model, termed Cluster-based Graph Collaborative Filtering (ClusterGCF).
Date: 2024-04-16T07:05:16Z
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To provide feedback for its actions, a clustering-oriented reward function is proposed to enhance cohesion within clusters and separation between different clusters.
Date: 2023-08-13T18:12:28Z
- Using Decision Trees for Interpretable Supervised Clustering [0.0]
Supervised clustering aims at forming clusters of labelled data with high probability densities.
We are particularly interested in finding clusters of data of a given class and describing the clusters with a set of comprehensive rules.
Date: 2023-07-16T17:12:45Z
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well-established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
Date: 2023-05-24T11:05:12Z
- Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
Date: 2022-10-09T02:31:32Z
- Hierarchical clustering by aggregating representatives in sub-minimum-spanning-trees [5.877624540482919]
We propose a novel hierarchical clustering algorithm that effectively detects representative points while building the clustering dendrogram.
According to our analysis, the proposed algorithm has O(n log n) time complexity and O(log n) space complexity, indicating that it scales to massive data.
Date: 2021-11-11T07:36:55Z
- Path Based Hierarchical Clustering on Knowledge Graphs [1.713291434132985]
We present a novel approach for inducing a hierarchy of subject clusters.
Our method first constructs a tag hierarchy before assigning subjects to clusters on this hierarchy.
We quantitatively demonstrate our method's ability to induce a coherent cluster hierarchy on three real-world datasets.
Date: 2021-09-27T16:42:43Z
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
Date: 2021-04-14T18:15:27Z
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
Date: 2020-10-22T15:58:35Z
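The merging rule summarized in the related paper "Hierarchical clustering with dot products recovers hidden tree structure" (merge the pair of clusters with the maximum average dot product, rather than the minimum distance) can be sketched naively with NumPy. This is a readability-first illustration, not the authors' implementation: the function name `dot_product_hac` and its merge-list output format are assumptions, and it recomputes every pairwise average at each step, so it is only suitable for small inputs.

```python
import numpy as np


def dot_product_hac(X):
    """Naive agglomerative clustering that, at each step, merges the two clusters
    whose members have the largest average pairwise dot product.

    Returns the merge sequence as (cluster_a, cluster_b, score) tuples, where each
    cluster is a frozenset of row indices into X.
    """
    X = np.asarray(X, dtype=float)
    gram = X @ X.T  # gram[i, j] = <x_i, x_j>
    clusters = [frozenset([i]) for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best = None  # (score, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                rows, cols = sorted(clusters[a]), sorted(clusters[b])
                # Average dot product over all cross-cluster pairs of points.
                score = gram[np.ix_(rows, cols)].mean()
                if best is None or score > best[0]:
                    best = (score, a, b)
        score, a, b = best
        merges.append((clusters[a], clusters[b], float(score)))
        merged = clusters[a] | clusters[b]
        # Drop the two merged clusters and append their union.
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges


if __name__ == "__main__":
    points = [[1.0, 0.0], [0.9, 0.0], [0.0, 1.0], [0.0, 0.8]]
    for left, right, score in dot_product_hac(points):
        print(sorted(left), sorted(right), round(score, 3))
    # prints:
    # [0] [1] 0.9
    # [2] [3] 0.8
    # [0, 1] [2, 3] 0.0
```

A practical version would cache cross-cluster dot-product sums and update them after each merge instead of rescanning all pairs; the merge criterion itself is unchanged.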