Towards Practical Explainability with Cluster Descriptors
- URL: http://arxiv.org/abs/2210.10662v2
- Date: Thu, 20 Oct 2022 16:41:53 GMT
- Title: Towards Practical Explainability with Cluster Descriptors
- Authors: Xiaoyuan Liu, Ilya Tyagin, Hayato Ushijima-Mwesigwa, Indradeep Ghosh,
Ilya Safro
- Abstract summary: We study the problem of making the clusters more explainable by investigating the cluster descriptors.
The goal is to find a representative set of tags for each cluster, referred to as the cluster descriptors, with the constraint that these descriptors are pairwise disjoint.
We propose a novel explainability model that reinforces the previous models in such a way that tags that do not contribute to explainability and do not sufficiently distinguish between clusters are not added to the optimal descriptors.
- Score: 3.899688920770429
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of machine learning, improving its explainability
has become a crucial research goal. We study the problem of making the clusters
more explainable by investigating the cluster descriptors. Given a set of
objects $S$, a clustering $\pi$ of these objects, and a set of tags $T$ that
were not used by the clustering algorithm, each object in $S$ is associated
with a subset of $T$. The goal is to find a representative set of tags for
each cluster, referred to as the cluster descriptors, subject to the
constraint that the descriptors are pairwise disjoint, while minimizing the
total size of all the descriptors. In general, this problem is NP-hard.
We propose a novel explainability model that reinforces the previous models in
such a way that tags that do not contribute to explainability and do not
sufficiently distinguish between clusters are not added to the optimal
descriptors. The proposed model is formulated as a quadratic unconstrained
binary optimization problem which makes it suitable for solving on modern
optimization hardware accelerators. We experimentally demonstrate how the
proposed explainability model can be solved on specialized hardware for
accelerating combinatorial optimization, the Fujitsu Digital Annealer, using
real-life Twitter and PubMed datasets as use cases.
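The QUBO formulation described in the abstract can be sketched on a toy instance. The cluster names, tags, coverage scores, and penalty weight below are hypothetical illustrations, not the paper's exact model, and exhaustive enumeration stands in for a hardware annealer:

```python
# Toy QUBO sketch of disjoint cluster descriptors (illustrative simplification).
# Binary variable x[(c, t)] = 1 means tag t is in cluster c's descriptor.
import itertools

clusters = ["c0", "c1"]
tags = ["t0", "t1", "t2", "t3"]
# Hypothetical coverage scores: how well tag t describes cluster c.
coverage = {
    ("c0", "t0"): 3, ("c0", "t1"): 0, ("c0", "t2"): 0, ("c0", "t3"): 0,
    ("c1", "t0"): 2, ("c1", "t1"): 0, ("c1", "t2"): 3, ("c1", "t3"): 0,
}
P = 10  # penalty weight enforcing pairwise-disjoint descriptors
variables = list(itertools.product(clusters, tags))

def energy(assign):
    """QUBO energy: descriptor-size cost minus coverage reward, plus penalty."""
    e = 0
    for v in variables:
        if assign[v]:
            e += 1 - coverage[v]  # linear terms: +1 per selected tag, -coverage
    for t in tags:  # quadratic penalty when the same tag appears in two clusters
        for c1, c2 in itertools.combinations(clusters, 2):
            e += P * assign[(c1, t)] * assign[(c2, t)]
    return e

# Exhaustive solve over 2^8 states; annealing hardware replaces this at scale.
best = min(
    (dict(zip(variables, bits))
     for bits in itertools.product([0, 1], repeat=len(variables))),
    key=energy,
)
descriptors = {
    c: sorted(t for (cc, t) in variables if cc == c and best[(cc, t)])
    for c in clusters
}
print(descriptors)  # {'c0': ['t0'], 'c1': ['t2']}
```

Here tag `t0` describes both clusters, but the quadratic penalty forces it into at most one descriptor, so `c1` falls back to its next-best tag `t2` — the disjointness behavior the model encodes.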
Related papers
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- How to Find a Good Explanation for Clustering? [7.951746797489421]
Moshkovitz, Dasgupta, Rashtchian, and Frost [ICML 2020] proposed an elegant model of explainable $k$-means and $k$-median clustering.
We study two natural algorithmic questions about explainable clustering.
Our rigorous algorithmic analysis sheds some light on the influence of parameters like the input size, dimension of the data, the number of outliers, the number of clusters, and the approximation ratio, on the computational complexity of explainable clustering.
arXiv Detail & Related papers (2021-12-13T11:48:38Z)
- Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model \textit{does not} exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
- Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
- Deep Descriptive Clustering [24.237000220172906]
This paper explores a novel setting for performing clustering on complex data while simultaneously generating explanations using interpretable tags.
We form good clusters by maximizing, as the clustering objective, the mutual information between the empirical distribution of the inputs and the induced clustering labels.
Experimental results on public data demonstrate that our model outperforms competitive baselines in clustering performance.
arXiv Detail & Related papers (2021-05-24T21:40:16Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Exact Recovery of Mangled Clusters with Same-Cluster Queries [20.03712152278538]
We study the cluster recovery problem in the semi-supervised active clustering framework.
We design an algorithm that, given $n$ points to be partitioned into $k$ clusters, uses $O(k^3 \ln k \ln n)$ oracle queries and $\tilde{O}(kn + k^3)$ time to recover the clustering with zero misclassification error.
arXiv Detail & Related papers (2020-06-08T15:27:58Z)
- Unsupervised Person Re-identification via Softened Similarity Learning [122.70472387837542]
Person re-identification (re-ID) is an important topic in computer vision.
This paper studies the unsupervised setting of re-ID, which does not require any labeled information.
Experiments on two image-based and video-based datasets demonstrate state-of-the-art performance.
arXiv Detail & Related papers (2020-04-07T17:16:41Z)
- Efficient Algorithms for Generating Provably Near-Optimal Cluster Descriptors for Explainability [36.11663695534294]
We study the problem of making clusters more interpretable by extending a recent approach of constructing succinct representations for clusters.
We develop approximation algorithms with provable performance guarantees for the problem.
We also show applications to explain clusters from datasets, including clusters of genomic sequences that represent different threat levels.
arXiv Detail & Related papers (2020-02-06T19:49:54Z)
- Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.