ExKMC: Expanding Explainable $k$-Means Clustering
- URL: http://arxiv.org/abs/2006.02399v2
- Date: Thu, 2 Jul 2020 01:24:51 GMT
- Title: ExKMC: Expanding Explainable $k$-Means Clustering
- Authors: Nave Frost, Michal Moshkovitz, Cyrus Rashtchian
- Abstract summary: We study algorithms for $k$-means clustering, focusing on a trade-off between explainability and accuracy.
We develop a new explainable $k$-means clustering algorithm, ExKMC, that takes an additional parameter $k' \geq k$ and outputs a decision tree with $k'$ leaves.
ExKMC produces a low cost clustering, outperforming both standard decision tree methods and other algorithms for explainable clustering.
- Score: 19.702066333512548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the popularity of explainable AI, there is limited work on effective
methods for unsupervised learning. We study algorithms for $k$-means
clustering, focusing on a trade-off between explainability and accuracy.
Following prior work, we use a small decision tree to partition a dataset into
$k$ clusters. This enables us to explain each cluster assignment by a short
sequence of single-feature thresholds. While larger trees produce more accurate
clusterings, they also require more complex explanations. To allow flexibility,
we develop a new explainable $k$-means clustering algorithm, ExKMC, that takes
an additional parameter $k' \geq k$ and outputs a decision tree with $k'$
leaves. We use a new surrogate cost to efficiently expand the tree and to label
the leaves with one of $k$ clusters. We prove that as $k'$ increases, the
surrogate cost is non-increasing, and hence, we trade explainability for
accuracy. Empirically, we validate that ExKMC produces a low cost clustering,
outperforming both standard decision tree methods and other algorithms for
explainable clustering. An implementation of ExKMC is available at
https://github.com/navefr/ExKMC.
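The linked repository's README shows usage along roughly the following lines; this is a hedged sketch assuming that API (a `Tree` class taking `k` and `max_leaves`, with `fit_predict` and `plot` methods), and details may differ across versions.

```python
# Usage sketch of the ExKMC package, assuming the API shown in the repo README.
from sklearn.datasets import make_blobs
from ExKMC.Tree import Tree

# Synthetic data with 3 well-separated clusters.
X, _ = make_blobs(n_samples=1000, centers=3, n_features=5, random_state=0)

# k = 3 clusters, explained by a threshold tree with up to k' = 6 leaves.
tree = Tree(k=3, max_leaves=6)
labels = tree.fit_predict(X)  # builds the tree, returns one cluster label per point

# Each assignment is explained by the single-feature thresholds along the
# root-to-leaf path; the tree can also be rendered (may require graphviz).
tree.plot("exkmc_tree")
```

Per the abstract's guarantee, raising `max_leaves` (i.e., $k'$) never increases the surrogate cost, so a longer explanation buys a lower-cost clustering.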
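The surrogate cost itself admits a compact description: each leaf of the threshold tree is labeled with whichever of the $k$ reference centers is cheapest for the points in that leaf. The sketch below uses illustrative names (`surrogate_cost`, `leaf_ids`), not the library's API. Splitting a leaf lets each child pick its own best center, which is the intuition for the cost being non-increasing in $k'$.

```python
# Minimal NumPy sketch of the surrogate cost described in the abstract.
# `surrogate_cost` and its argument names are illustrative, not ExKMC's API.
import numpy as np

def surrogate_cost(X, leaf_ids, centers):
    """Sum, over leaves, of the cheapest cost of assigning the entire leaf
    to a single one of the k reference centers.

    X        : (n, d) data matrix
    leaf_ids : (n,) index of the threshold-tree leaf containing each point
    centers  : (k, d) reference centers (e.g., from standard k-means)
    """
    total = 0.0
    for leaf in np.unique(leaf_ids):
        pts = X[leaf_ids == leaf]
        # Cost of assigning every point in this leaf to each candidate center.
        costs = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=(0, 2))
        total += costs.min()  # label the leaf with its best center
    return total
```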
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - GBMST: An Efficient Minimum Spanning Tree Clustering Based on Granular-Ball Computing [78.92205914422925]
We propose a clustering algorithm that combines multi-granularity Granular-Balls with a minimum spanning tree (MST).
We construct coarse-grained granular-balls, and then use the granular-balls and MST to implement a clustering method based on "large-scale priority".
Experimental results on several data sets demonstrate the power of the algorithm.
arXiv Detail & Related papers (2023-03-02T09:04:35Z) - How to Find a Good Explanation for Clustering? [7.951746797489421]
Moshkovitz, Dasgupta, Rashtchian, and Frost [ICML 2020] proposed an elegant model of explainable $k$-means and $k$-median clustering.
We study two natural algorithmic questions about explainable clustering.
Our rigorous algorithmic analysis sheds some light on how parameters such as the input size, the dimension of the data, the number of outliers, the number of clusters, and the approximation ratio influence the computational complexity of explainable clustering.
arXiv Detail & Related papers (2021-12-13T11:48:38Z) - Explainable k-means. Don't be greedy, plant bigger trees! [12.68470213641421]
We provide a new bi-criteria $\tilde{O}(\log^2 k)$ competitive algorithm for explainable $k$-means clustering.
Explainable $k$-means was recently introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020).
arXiv Detail & Related papers (2021-11-04T23:15:17Z) - Near-Optimal Explainable $k$-Means for All Dimensions [13.673697350508965]
We show an efficient algorithm that finds an explainable clustering whose $k$-means cost is at most $k^{1 - 2/d}\,\mathrm{poly}(d \log k)$ times the optimal cost.
We get an improved bound of $k^{1 - 2/d}\,\mathrm{polylog}(k)$, which we show is optimal for every choice of $k, d \ge 2$ up to a poly-logarithmic factor in $k$.
arXiv Detail & Related papers (2021-06-29T16:59:03Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making a few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z) - On the price of explainability for some clustering problems [1.52292571922932]
We provide upper and lower bounds for a natural model where explainability is achieved via decision trees.
For the $k$-means and $k$-medians problems our upper bounds improve those obtained by [Moshkovitz et al., ICML 2020] for low dimensions.
Another contribution is a simple and efficient algorithm for building explainable clusterings for the $k$-means problem.
arXiv Detail & Related papers (2021-01-05T15:08:25Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163]
We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner.
We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost.
We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves.
arXiv Detail & Related papers (2020-02-28T04:21:53Z)