Tree Index: A New Cluster Evaluation Technique
- URL: http://arxiv.org/abs/2003.10841v1
- Date: Tue, 24 Mar 2020 13:41:12 GMT
- Title: Tree Index: A New Cluster Evaluation Technique
- Authors: A. H. Beg, Md Zahidul Islam, Vladimir Estivill-Castro
- Abstract summary: We introduce a cluster evaluation technique called Tree Index.
Our Tree Index finds margins amongst clusters that make the clustering easy to learn, without the complications of Minimum Description Length.
We show that, on the clustering results (obtained by various techniques) on a brain dataset, Tree Index discriminates between reasonable and non-sensible clusters.
- Score: 2.790947019327459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a cluster evaluation technique called Tree Index. Our Tree Index
algorithm aims at describing the structural information of the clustering
rather than the quantitative format of cluster-quality indexes (where the
representation power of clustering is some cumulative error similar to vector
quantization). Our Tree Index finds margins amongst clusters that make the
clustering easy to learn, without the complications of Minimum Description
Length. Our Tree Index produces a decision tree from the clustered data set,
using the cluster identifiers as labels. It combines the entropy of each leaf
with its depth. Intuitively, a shorter tree with pure leaves generalizes the
data well (the clusters are easy to learn because they are well separated), so
the labels correspond to meaningful clusters. If the clustering algorithm does
not separate the data well, trees learned from its results will be large and
overly detailed. We show that, on the
clustering results (obtained by various techniques) on a brain dataset, Tree
Index discriminates between reasonable and non-sensible clusters. We confirm
the effectiveness of Tree Index through graphical visualizations: Tree Index
ranks the sensible solutions higher than the non-sensible solutions, while
existing cluster-quality indexes fail to do so.
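As a rough illustration of the idea above, the following sketch trains a decision tree on the cluster identifiers and combines leaf entropy with leaf depth. It assumes scikit-learn and scipy, uses a simple unweighted sum of the two terms, and is not the paper's exact formula.

```python
import numpy as np
from scipy.stats import entropy
from sklearn.tree import DecisionTreeClassifier

def tree_index(X, cluster_labels):
    """Score a clustering by learning a decision tree over its cluster labels."""
    tree = DecisionTreeClassifier(random_state=0).fit(X, cluster_labels)
    t = tree.tree_
    leaves = np.where(t.children_left == -1)[0]

    # Depth of every node, computed by walking down from the root.
    depth = np.zeros(t.node_count, dtype=int)
    stack = [(0, 0)]
    while stack:
        node, d = stack.pop()
        depth[node] = d
        if t.children_left[node] != -1:
            stack.append((t.children_left[node], d + 1))
            stack.append((t.children_right[node], d + 1))

    # Accumulate a sample-weighted penalty: impure leaves (high entropy) and
    # deep leaves (long trees) are both penalised, so the negated penalty is
    # highest for short trees with pure leaves, i.e. well-separated clusters.
    penalty = 0.0
    total = t.n_node_samples[0]
    for leaf in leaves:
        class_counts = t.value[leaf].ravel()
        weight = t.n_node_samples[leaf] / total
        penalty += weight * (entropy(class_counts / class_counts.sum(), base=2)
                             + depth[leaf])
    return -penalty  # higher is better
```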
Related papers
- ABCDE: Application-Based Cluster Diff Evals [49.1574468325115]
It aims to be practical: it allows items to have associated importance values that are application-specific, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items.
The approach to measuring the delta in the clustering quality is novel: instead of trying to construct an expensive ground truth up front and evaluating each clustering with respect to it, ABCDE samples questions for judgement on the basis of the actual diffs between the clusterings.
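As a purely illustrative sketch (not code from the ABCDE paper), importance-weighted diffs between two clusterings of the same items could be enumerated as below and judgement questions then sampled in proportion to the weights; the function name and the product weighting are assumptions.

```python
import itertools

def pairwise_diffs(clustering_a, clustering_b, importance):
    """Item pairs grouped together in exactly one of the two clusterings.

    clustering_a and clustering_b map item -> cluster id over the same items,
    and importance maps item -> application-specific weight.  Each diff carries
    a weight, so human judgement can be spent on the most important diffs.
    """
    diffs = []
    for x, y in itertools.combinations(sorted(clustering_a), 2):
        together_a = clustering_a[x] == clustering_a[y]
        together_b = clustering_b[x] == clustering_b[y]
        if together_a != together_b:
            diffs.append(((x, y), importance[x] * importance[y]))
    return diffs
```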
arXiv Detail & Related papers (2024-07-31T08:29:35Z)
- Kernel KMeans clustering splits for end-to-end unsupervised decision trees [16.539007424389254]
We present a novel end-to-end trained unsupervised binary tree for clustering: Kauri.
This method performs a greedy maximisation of the kernel KMeans objective without requiring the definition of centroids.
For other (non-linear) kernels, Kauri often outperforms the concatenation of kernel KMeans and a CART decision tree.
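For intuition, the kernel KMeans objective really can be evaluated from the kernel matrix alone, with no centroids; the numpy sketch below shows this centroid-free within-cluster scatter and is not Kauri's actual tree-growing code.

```python
import numpy as np

def kernel_kmeans_scatter(K, labels):
    """Within-cluster scatter of a partition, computed from the kernel matrix only.

    For each cluster c:  sum_{i in c} K_ii  -  (1 / |c|) * sum_{i,j in c} K_ij.
    A greedy tree construction can reduce this quantity split by split without
    ever materialising centroids.
    """
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        Kc = K[np.ix_(idx, idx)]
        total += np.trace(Kc) - Kc.sum() / len(idx)
    return total  # lower scatter corresponds to a higher kernel KMeans objective
```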
arXiv Detail & Related papers (2024-02-19T15:39:39Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified in a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed to enhance cohesion within the same cluster and separation between different clusters.
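As a toy illustration only (not the paper's actual reward), a clustering-oriented signal of this kind can be written as mean within-cluster similarity minus mean between-cluster similarity over node embeddings; the cosine-similarity formulation is an assumption.

```python
import numpy as np

def clustering_reward(embeddings, labels):
    """Reward that grows with within-cluster cohesion and shrinks with
    between-cluster similarity, using cosine similarity of node embeddings."""
    labels = np.asarray(labels)
    Z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    S = Z @ Z.T                                   # pairwise cosine similarities
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)                 # ignore self-similarity
    diff = labels[:, None] != labels[None, :]
    return S[same].mean() - S[diff].mean()
```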
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well-established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
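A minimal numpy sketch of the recommended linkage rule: repeatedly merge the pair of clusters whose members have the largest average pairwise dot product (rather than, say, the smallest distance).

```python
import numpy as np

def dot_product_agglomerative(X):
    """Return the merge sequence of average-dot-product agglomerative clustering."""
    G = X @ X.T                                   # all pairwise dot products
    clusters = {i: [i] for i in range(len(X))}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best_score, best_pair = -np.inf, None
        for a in range(len(keys)):
            for b in range(a + 1, len(keys)):
                ia, ib = clusters[keys[a]], clusters[keys[b]]
                avg = G[np.ix_(ia, ib)].mean()    # average dot product between the two clusters
                if avg > best_score:
                    best_score, best_pair = avg, (keys[a], keys[b])
        a, b = best_pair
        merges.append((a, b, best_score))         # which clusters merged, and at what level
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges
```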
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- End-to-End Learning to Index and Search in Large Output Spaces [95.16066833532396]
Extreme multi-label classification (XMC) is a popular framework for solving real-world problems.
In this paper, we propose a novel method which relaxes the tree-based index to a specialized weighted graph-based index.
ELIAS achieves state-of-the-art performance on several large-scale extreme classification benchmarks with millions of labels.
arXiv Detail & Related papers (2022-10-16T01:34:17Z)
- How to Find a Good Explanation for Clustering? [7.951746797489421]
Moshkovitz, Dasgupta, Rashtchian, and Frost [ICML 2020] proposed an elegant model of explainable $k$-means and $k$-median clustering.
We study two natural algorithmic questions about explainable clustering.
Our rigorous algorithmic analysis sheds some light on the influence of parameters like the input size, dimension of the data, the number of outliers, the number of clusters, and the approximation ratio, on the computational complexity of explainable clustering.
arXiv Detail & Related papers (2021-12-13T11:48:38Z)
- Hierarchical clustering by aggregating representatives in sub-minimum-spanning-trees [5.877624540482919]
We propose a novel hierarchical clustering algorithm that effectively detects representative points while building the clustering dendrogram.
Under our analysis, the proposed algorithm has O(n log n) time complexity and O(log n) space complexity, indicating its scalability for handling massive data.
arXiv Detail & Related papers (2021-11-11T07:36:55Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
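A rough numpy sketch of a contrastive loss guided by (possibly inaccurate) clustering labels, in the spirit of the description above; SCAGC's actual loss over augmented graph views is not reproduced here, and the temperature value is an assumption.

```python
import numpy as np
from scipy.special import logsumexp

def label_guided_contrastive_loss(Z, labels, tau=0.5):
    """Treat nodes that share a (possibly noisy) cluster label as positives
    and all remaining nodes as negatives."""
    labels = np.asarray(labels)
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    sim = Z @ Z.T / tau
    np.fill_diagonal(sim, -np.inf)                # a node is never its own positive
    log_prob = sim - logsumexp(sim, axis=1, keepdims=True)
    pos = labels[:, None] == labels[None, :]
    np.fill_diagonal(pos, False)
    n_pos = np.maximum(pos.sum(axis=1), 1)        # avoid division by zero for singletons
    per_node = -np.where(pos, log_prob, 0.0).sum(axis=1) / n_pos
    return per_node.mean()
```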
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- BETULA: Numerically Stable CF-Trees for BIRCH Clustering [0.76146285961466]
BIRCH is a widely known clustering approach that has influenced much subsequent research and commercial products.
We introduce a replacement cluster feature that does not suffer from numerical instability, is not much more expensive to maintain, and makes many computations simpler and hence more efficient.
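The sketch below shows one numerically stable cluster-feature design in this spirit: count, mean and sum of squared deviations maintained with a Welford-style update, so variance is never the difference of two large sums. The exact feature used by BETULA may differ in detail.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class StableClusterFeature:
    """Cluster feature that avoids the cancellation in SS / N - (LS / N)**2."""
    n: int = 0
    mean: np.ndarray = None
    ssd: float = 0.0                       # sum of squared deviations from the mean

    def insert(self, x):
        """Absorb one point into the feature with a Welford-style update."""
        x = np.asarray(x, dtype=float)
        if self.n == 0:
            self.n, self.mean = 1, x.copy()
            return
        self.n += 1
        delta = x - self.mean
        self.mean = self.mean + delta / self.n
        self.ssd += float(delta @ (x - self.mean))

    def variance(self):
        return self.ssd / self.n if self.n else 0.0
```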
arXiv Detail & Related papers (2020-06-23T10:20:36Z)
- Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163]
We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner.
We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost.
We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves.
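An illustrative sketch of such a threshold tree (not the paper's exact algorithm or its guarantees): starting from k reference centers, recursively choose the axis-aligned cut that separates the remaining centers while sending the fewest points away from their own center, until each leaf holds a single center.

```python
import numpy as np

def threshold_tree(X, assign, centers, center_ids=None):
    """Axis-aligned threshold tree with one leaf per reference center.

    X, assign and centers are numpy arrays; assign[i] is the index of X[i]'s center.
    """
    X, assign = np.asarray(X), np.asarray(assign)
    if center_ids is None:
        center_ids = np.arange(len(centers))
    if len(center_ids) == 1:
        return {"leaf": int(center_ids[0])}          # one center left: this leaf is a cluster

    best = None
    for dim in range(X.shape[1]):
        coords = np.unique(centers[center_ids, dim])
        for theta in (coords[:-1] + coords[1:]) / 2.0:   # cuts that split the remaining centers
            point_left = X[:, dim] <= theta
            own_center_left = centers[assign, dim] <= theta
            mistakes = int(np.sum(point_left != own_center_left))
            if best is None or mistakes < best[0]:
                best = (mistakes, dim, float(theta))

    _, dim, theta = best
    left = X[:, dim] <= theta
    left_centers = center_ids[centers[center_ids, dim] <= theta]
    right_centers = center_ids[centers[center_ids, dim] > theta]
    return {
        "dim": dim, "theta": theta,
        "left": threshold_tree(X[left], assign[left], centers, left_centers),
        "right": threshold_tree(X[~left], assign[~left], centers, right_centers),
    }
```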
arXiv Detail & Related papers (2020-02-28T04:21:53Z)