Explainable Clustering via Exemplars: Complexity and Efficient
Approximation Algorithms
- URL: http://arxiv.org/abs/2209.09670v1
- Date: Tue, 20 Sep 2022 12:09:51 GMT
- Title: Explainable Clustering via Exemplars: Complexity and Efficient
Approximation Algorithms
- Authors: Ian Davidson, Michael Livanos, Antoine Gourru, Peter Walker, Julien
Velcin and S. S. Ravi
- Abstract summary: We propose an explainable-by-design clustering approach that finds clusters and exemplars to explain each cluster.
The use of exemplars for understanding is supported by the exemplar-based school of concept definition in psychology.
We show that finding a small set of exemplars to explain even a single cluster is computationally intractable.
- Score: 30.369731369945296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explainable AI (XAI) is an important developing area but remains relatively
understudied for clustering. We propose an explainable-by-design clustering
approach that not only finds clusters but also exemplars to explain each
cluster. The use of exemplars for understanding is supported by the
exemplar-based school of concept definition in psychology. We show that finding
a small set of exemplars to explain even a single cluster is computationally
intractable; hence, the overall problem is challenging. We develop an
approximation algorithm that provides provable performance guarantees with
respect to clustering quality as well as the number of exemplars used. This
basic algorithm explains all the instances in every cluster whilst another
approximation algorithm uses a bounded number of exemplars to allow simpler
explanations and provably covers a large fraction of all the instances.
Experimental results show that our work is useful in domains involving
difficult to understand deep embeddings of images and text.
Related papers
- Cluster Specific Representation Learning [1.6727186769396276]
Despite its widespread application, there is no established definition of a good'' representation.
We propose a downstream-agnostic formulation: when inherent clusters exist in the data, the representations should be specific to each cluster.
Under this idea, we develop a meta-algorithm that jointly learns cluster-specific representations and cluster assignments.
arXiv Detail & Related papers (2024-12-04T16:59:37Z) - Revisiting Self-Supervised Heterogeneous Graph Learning from Spectral Clustering Perspective [52.662463893268225]
Self-supervised heterogeneous graph learning (SHGL) has shown promising potential in diverse scenarios.
Existing SHGL methods encounter two significant limitations.
We introduce a novel framework enhanced by rank and dual consistency constraints.
arXiv Detail & Related papers (2024-12-01T09:33:20Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that provide utility when the data is "easy"
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z) - KnAC: an approach for enhancing cluster analysis with background
knowledge and explanations [0.20999222360659603]
We present Knowledge Augmented Clustering (KnAC), which main goal is to confront expert-based labelling with automated clustering.
KnAC can serve as an augmentation of an arbitrary clustering algorithm, making the approach robust and model-agnostic.
arXiv Detail & Related papers (2021-12-16T10:13:47Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Weighted Sparse Subspace Representation: A Unified Framework for
Subspace Clustering, Constrained Clustering, and Active Learning [0.3553493344868413]
We first propose a novel spectral-based subspace clustering algorithm that seeks to represent each point as a sparse convex combination of a few nearby points.
We then extend the algorithm to constrained clustering and active learning settings.
Our motivation for developing such a framework stems from the fact that typically either a small amount of labelled data is available in advance; or it is possible to label some points at a cost.
arXiv Detail & Related papers (2021-06-08T13:39:43Z) - Graph Contrastive Clustering [131.67881457114316]
We propose a novel graph contrastive learning framework, which is then applied to the clustering task and we come up with the Graph Constrastive Clustering(GCC) method.
Specifically, on the one hand, the graph Laplacian based contrastive loss is proposed to learn more discriminative and clustering-friendly features.
On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments.
arXiv Detail & Related papers (2021-04-03T15:32:49Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Simple and Scalable Sparse k-means Clustering via Feature Ranking [14.839931533868176]
We propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms.
Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings.
arXiv Detail & Related papers (2020-02-20T02:41:02Z) - Efficient Algorithms for Generating Provably Near-Optimal Cluster
Descriptors for Explainability [36.11663695534294]
We study the problem of making clusters more interpretable by extending a recent approach of constructing succinct representations for clusters.
We develop approximation algorithms with provable performance guarantees for the problem.
We also show applications to explain clusters from datasets, including clusters of genomic sequences that represent different threat levels.
arXiv Detail & Related papers (2020-02-06T19:49:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.