Tree Index: A New Cluster Evaluation Technique
- URL: http://arxiv.org/abs/2003.10841v1
- Date: Tue, 24 Mar 2020 13:41:12 GMT
- Title: Tree Index: A New Cluster Evaluation Technique
- Authors: A. H. Beg, Md Zahidul Islam, Vladimir Estivill-Castro
- Abstract summary: We introduce a cluster evaluation technique called Tree Index.
Our Tree Index finds margins amongst clusters that make the clustering easy to learn, without the complications of Minimum Description Length.
We show that, on the clustering results (obtained by various techniques) on a brain dataset, Tree Index discriminates between reasonable and non-sensible clusters.
- Score: 2.790947019327459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a cluster evaluation technique called Tree Index. Our Tree Index
algorithm aims at describing the structural information of the clustering
rather than the quantitative format of cluster-quality indexes (where the
representation power of clustering is some cumulative error similar to vector
quantization). Our Tree Index finds margins amongst clusters that make the
clustering easy to learn, without the complications of Minimum Description
Length. Our Tree Index produces a decision tree from the clustered data set,
using the cluster identifiers as labels. It combines the entropy of each leaf
with its depth. Intuitively, a shorter tree with pure leaves generalizes the
data well (the clusters are easy to learn because they are well separated), so
the labels correspond to meaningful clusters. If the clustering algorithm does
not separate the data well, trees learned from its results will be large and
overly detailed. We show that, on the
clustering results (obtained by various techniques) on a brain dataset, Tree
Index discriminates between reasonable and non-sensible clusters. We confirm
the effectiveness of Tree Index through graphical visualizations: Tree Index
ranks the sensible solutions higher than the non-sensible solutions, while
existing cluster-quality indexes fail to do so.
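As a rough illustration of the idea above, the following sketch trains a decision tree on the cluster identifiers and combines leaf entropy with leaf depth. It assumes scikit-learn and scipy, uses a simple unweighted sum of the two terms, and is not the paper's exact formula.

```python
import numpy as np
from scipy.stats import entropy
from sklearn.tree import DecisionTreeClassifier

def tree_index(X, cluster_labels):
    """Score a clustering by learning a decision tree over its cluster labels."""
    tree = DecisionTreeClassifier(random_state=0).fit(X, cluster_labels)
    t = tree.tree_
    leaves = np.where(t.children_left == -1)[0]

    # Depth of every node, computed by walking down from the root.
    depth = np.zeros(t.node_count, dtype=int)
    stack = [(0, 0)]
    while stack:
        node, d = stack.pop()
        depth[node] = d
        if t.children_left[node] != -1:
            stack.append((t.children_left[node], d + 1))
            stack.append((t.children_right[node], d + 1))

    # Accumulate a sample-weighted penalty: impure leaves (high entropy) and
    # deep leaves (long trees) are both penalised, so the negated penalty is
    # highest for short trees with pure leaves, i.e. well-separated clusters.
    penalty = 0.0
    total = t.n_node_samples[0]
    for leaf in leaves:
        class_counts = t.value[leaf].ravel()
        weight = t.n_node_samples[leaf] / total
        penalty += weight * (entropy(class_counts / class_counts.sum(), base=2)
                             + depth[leaf])
    return -penalty  # higher is better
```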
Related papers
- ABCDE: Application-Based Cluster Diff Evals [49.1574468325115]
It aims to be practical: it allows items to have associated importance values that are application-specific, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items.
The approach to measuring the delta in the clustering quality is novel: instead of trying to construct an expensive ground truth up front and evaluating each clustering with respect to it, ABCDE samples questions for judgement on the basis of the actual diffs between the clusterings.
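As a purely illustrative sketch (not code from the ABCDE paper), importance-weighted diffs between two clusterings of the same items could be enumerated as below and judgement questions then sampled in proportion to the weights; the function name and the product weighting are assumptions.

```python
import itertools

def pairwise_diffs(clustering_a, clustering_b, importance):
    """Item pairs grouped together in exactly one of the two clusterings.

    clustering_a and clustering_b map item -> cluster id over the same items,
    and importance maps item -> application-specific weight.  Each diff carries
    a weight, so human judgement can be spent on the most important diffs.
    """
    diffs = []
    for x, y in itertools.combinations(sorted(clustering_a), 2):
        together_a = clustering_a[x] == clustering_a[y]
        together_b = clustering_b[x] == clustering_b[y]
        if together_a != together_b:
            diffs.append(((x, y), importance[x] * importance[y]))
    return diffs
```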
arXiv Detail & Related papers (2024-07-31T08:29:35Z)
- Kernel KMeans clustering splits for end-to-end unsupervised decision trees [16.539007424389254]
We present a novel end-to-end trained unsupervised binary tree for clustering: Kauri.
This method performs a greedy maximisation of the kernel KMeans objective without requiring the definition of centroids.
For other (non-linear) kernels, Kauri often outperforms the concatenation of kernel KMeans and a CART decision tree.
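For intuition, the kernel KMeans objective really can be evaluated from the kernel matrix alone, with no centroids; the numpy sketch below shows this centroid-free within-cluster scatter and is not Kauri's actual tree-growing code.

```python
import numpy as np

def kernel_kmeans_scatter(K, labels):
    """Within-cluster scatter of a partition, computed from the kernel matrix only.

    For each cluster c:  sum_{i in c} K_ii  -  (1 / |c|) * sum_{i,j in c} K_ij.
    A greedy tree construction can reduce this quantity split by split without
    ever materialising centroids.
    """
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        Kc = K[np.ix_(idx, idx)]
        total += np.trace(Kc) - Kc.sum() / len(idx)
    return total  # lower scatter corresponds to a higher kernel KMeans objective
```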
arXiv Detail & Related papers (2024-02-19T15:39:39Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified in a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed to enhance cohesion within the same cluster and separation between different clusters.
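As a toy illustration only (not the paper's actual reward), a clustering-oriented signal of this kind can be written as mean within-cluster similarity minus mean between-cluster similarity over node embeddings; the cosine-similarity formulation is an assumption.

```python
import numpy as np

def clustering_reward(embeddings, labels):
    """Reward that grows with within-cluster cohesion and shrinks with
    between-cluster similarity, using cosine similarity of node embeddings."""
    labels = np.asarray(labels)
    Z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    S = Z @ Z.T                                   # pairwise cosine similarities
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)                 # ignore self-similarity
    diff = labels[:, None] != labels[None, :]
    return S[same].mean() - S[diff].mean()
```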
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well-established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
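A minimal numpy sketch of the recommended linkage rule: repeatedly merge the pair of clusters whose members have the largest average pairwise dot product (rather than, say, the smallest distance).

```python
import numpy as np

def dot_product_agglomerative(X):
    """Return the merge sequence of average-dot-product agglomerative clustering."""
    G = X @ X.T                                   # all pairwise dot products
    clusters = {i: [i] for i in range(len(X))}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best_score, best_pair = -np.inf, None
        for a in range(len(keys)):
            for b in range(a + 1, len(keys)):
                ia, ib = clusters[keys[a]], clusters[keys[b]]
                avg = G[np.ix_(ia, ib)].mean()    # average dot product between the two clusters
                if avg > best_score:
                    best_score, best_pair = avg, (keys[a], keys[b])
        a, b = best_pair
        merges.append((a, b, best_score))         # which clusters merged, and at what level
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges
```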
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- End-to-End Learning to Index and Search in Large Output Spaces [95.16066833532396]
Extreme multi-label classification (XMC) is a popular framework for solving real-world problems.
In this paper, we propose a novel method which relaxes the tree-based index to a specialized weighted graph-based index.
ELIAS achieves state-of-the-art performance on several large-scale extreme classification benchmarks with millions of labels.
arXiv Detail & Related papers (2022-10-16T01:34:17Z)
- How to Find a Good Explanation for Clustering? [7.951746797489421]
Moshkovitz, Dasgupta, Rashtchian, and Frost [ICML 2020] proposed an elegant model of explainable $k$-means and $k$-median clustering.
We study two natural algorithmic questions about explainable clustering.
Our rigorous algorithmic analysis sheds some light on the influence of parameters like the input size, dimension of the data, the number of outliers, the number of clusters, and the approximation ratio, on the computational complexity of explainable clustering.
arXiv Detail & Related papers (2021-12-13T11:48:38Z)
- Hierarchical clustering by aggregating representatives in sub-minimum-spanning-trees [5.877624540482919]
We propose a novel hierarchical clustering algorithm that effectively detects representative points while building the clustering dendrogram.
Under our analysis, the proposed algorithm has O(n log n) time complexity and O(log n) space complexity, indicating its scalability for handling massive data.
arXiv Detail & Related papers (2021-11-11T07:36:55Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
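A rough numpy sketch of a contrastive loss guided by (possibly inaccurate) clustering labels, in the spirit of the description above; SCAGC's actual loss over augmented graph views is not reproduced here, and the temperature value is an assumption.

```python
import numpy as np
from scipy.special import logsumexp

def label_guided_contrastive_loss(Z, labels, tau=0.5):
    """Treat nodes that share a (possibly noisy) cluster label as positives
    and all remaining nodes as negatives."""
    labels = np.asarray(labels)
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    sim = Z @ Z.T / tau
    np.fill_diagonal(sim, -np.inf)                # a node is never its own positive
    log_prob = sim - logsumexp(sim, axis=1, keepdims=True)
    pos = labels[:, None] == labels[None, :]
    np.fill_diagonal(pos, False)
    n_pos = np.maximum(pos.sum(axis=1), 1)        # avoid division by zero for singletons
    per_node = -np.where(pos, log_prob, 0.0).sum(axis=1) / n_pos
    return per_node.mean()
```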
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- BETULA: Numerically Stable CF-Trees for BIRCH Clustering [0.76146285961466]
BIRCH is a widely known clustering approach that has influenced much subsequent research and commercial products.
We introduce a replacement cluster feature that does not suffer from numerical instability, is not much more expensive to maintain, and makes many computations simpler and hence more efficient.
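The sketch below shows one numerically stable cluster-feature design in this spirit: count, mean and sum of squared deviations maintained with a Welford-style update, so variance is never the difference of two large sums. The exact feature used by BETULA may differ in detail.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class StableClusterFeature:
    """Cluster feature that avoids the cancellation in SS / N - (LS / N)**2."""
    n: int = 0
    mean: np.ndarray = None
    ssd: float = 0.0                       # sum of squared deviations from the mean

    def insert(self, x):
        """Absorb one point into the feature with a Welford-style update."""
        x = np.asarray(x, dtype=float)
        if self.n == 0:
            self.n, self.mean = 1, x.copy()
            return
        self.n += 1
        delta = x - self.mean
        self.mean = self.mean + delta / self.n
        self.ssd += float(delta @ (x - self.mean))

    def variance(self):
        return self.ssd / self.n if self.n else 0.0
```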
arXiv Detail & Related papers (2020-06-23T10:20:36Z)
- Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163]
We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner.
We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost.
We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves.
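An illustrative sketch of such a threshold tree (not the paper's exact algorithm or its guarantees): starting from k reference centers, recursively choose the axis-aligned cut that separates the remaining centers while sending the fewest points away from their own center, until each leaf holds a single center.

```python
import numpy as np

def threshold_tree(X, assign, centers, center_ids=None):
    """Axis-aligned threshold tree with one leaf per reference center.

    X, assign and centers are numpy arrays; assign[i] is the index of X[i]'s center.
    """
    X, assign = np.asarray(X), np.asarray(assign)
    if center_ids is None:
        center_ids = np.arange(len(centers))
    if len(center_ids) == 1:
        return {"leaf": int(center_ids[0])}          # one center left: this leaf is a cluster

    best = None
    for dim in range(X.shape[1]):
        coords = np.unique(centers[center_ids, dim])
        for theta in (coords[:-1] + coords[1:]) / 2.0:   # cuts that split the remaining centers
            point_left = X[:, dim] <= theta
            own_center_left = centers[assign, dim] <= theta
            mistakes = int(np.sum(point_left != own_center_left))
            if best is None or mistakes < best[0]:
                best = (mistakes, dim, float(theta))

    _, dim, theta = best
    left = X[:, dim] <= theta
    left_centers = center_ids[centers[center_ids, dim] <= theta]
    right_centers = center_ids[centers[center_ids, dim] > theta]
    return {
        "dim": dim, "theta": theta,
        "left": threshold_tree(X[left], assign[left], centers, left_centers),
        "right": threshold_tree(X[~left], assign[~left], centers, right_centers),
    }
```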
arXiv Detail & Related papers (2020-02-28T04:21:53Z)