Algorithm-Agnostic Explainability for Unsupervised Clustering
- URL: http://arxiv.org/abs/2105.08053v1
- Date: Mon, 17 May 2021 17:58:55 GMT
- Title: Algorithm-Agnostic Explainability for Unsupervised Clustering
- Authors: Charles A. Ellis, Mohammad S.E. Sendi, Sergey M. Plis, Robyn L.
Miller, and Vince D. Calhoun
- Abstract summary: We present two novel algorithm-agnostic explainability methods, global permutation percent change (G2PC) feature importance and local perturbation percent change (L2PC) feature importance.
We demonstrate the utility of the methods for explaining five popular clustering algorithms on low-dimensional, ground-truth synthetic datasets.
- Score: 19.375627480270627
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Supervised machine learning explainability has greatly expanded in recent
years. However, the field of unsupervised clustering explainability has lagged
behind. Here we demonstrate, to the best of our knowledge for the first time,
how model-agnostic methods for supervised machine learning explainability can
be adapted to provide algorithm-agnostic unsupervised clustering
explainability. We present two novel algorithm-agnostic explainability methods,
global permutation percent change (G2PC) feature importance and local
perturbation percent change (L2PC) feature importance, that can provide insight
into many clustering methods on a global level by identifying the relative
importance of features to a clustering algorithm and on a local level by
identifying the relative importance of features to the clustering of individual
samples. We demonstrate the utility of the methods for explaining five popular
clustering algorithms on low-dimensional, ground-truth synthetic datasets and
on high-dimensional functional network connectivity (FNC) data extracted from a
resting state functional magnetic resonance imaging (rs-fMRI) dataset of 151
subjects with schizophrenia (SZ) and 160 healthy controls (HC). Our proposed
explainability methods robustly identify the relative importance of features
across multiple clustering methods and could facilitate new insights into many
applications. We hope that this study will greatly accelerate the development
of the field of clustering explainability.
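The abstract's G2PC idea can be sketched as follows: cluster the data, then for each feature permute its values across samples and measure the percent of samples whose cluster assignment changes. This is a hypothetical illustration based only on the abstract, not the authors' released code; the clustering model (`KMeans`), dataset, and repeat count are all assumptions for demonstration.

```python
# Hypothetical G2PC-style sketch (assumption: follows the abstract's idea of
# permuting each feature and measuring the percent change in cluster
# assignments; the paper's exact procedure may differ).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic two-cluster data in which only feature 0 is informative.
X = np.vstack([
    np.column_stack([rng.normal(-3, 1, 200), rng.normal(0, 1, 200)]),
    np.column_stack([rng.normal(3, 1, 200), rng.normal(0, 1, 200)]),
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
baseline = km.predict(X)

def g2pc(model, X, labels, n_repeats=20, rng=rng):
    """Global permutation percent change: for each feature, permute it
    across all samples and report the mean fraction of cluster
    assignments that change relative to the baseline clustering."""
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        changes = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            changes.append(np.mean(model.predict(Xp) != labels))
        importances[j] = np.mean(changes)
    return importances

imp = g2pc(km, X, baseline)
# The informative feature 0 should show a far larger percent change
# than the noise feature 1.
```

The local L2PC variant described in the abstract would apply the same percent-change logic per sample, repeatedly perturbing one feature of a single sample and tracking how often that sample's assignment flips.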
Related papers
- Counterfactual Explanations for Clustering Models [11.40145394568897]
Clustering algorithms rely on complex optimisation processes that may be difficult to comprehend.
We propose a new, model-agnostic technique for explaining clustering algorithms with counterfactual statements.
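One simple way to realize a counterfactual statement for a clustering model is to find the smallest change to a sample that flips its cluster assignment. This is a generic hypothetical sketch, not the cited paper's technique; the line search toward the target centroid and the `KMeans` model are illustrative assumptions.

```python
# Hypothetical counterfactual sketch for a clustering model (assumption:
# not the cited paper's method) — move a sample along the segment toward
# the target cluster's centroid until its assignment flips.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

def counterfactual(model, x, target):
    """Line-search from x toward the target cluster's centroid for the
    smallest step alpha at which the predicted cluster becomes target."""
    c = model.cluster_centers_[target]
    for alpha in np.linspace(0.0, 1.0, 101):
        x_cf = (1 - alpha) * x + alpha * c
        if model.predict(x_cf.reshape(1, -1))[0] == target:
            return x_cf, alpha
    return c, 1.0

x = X[0]                               # a sample from one cluster
src = km.predict(x.reshape(1, -1))[0]  # its current assignment
x_cf, alpha = counterfactual(km, x, target=1 - src)
# x_cf is the counterfactual; alpha measures how far x had to move.
```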
arXiv Detail & Related papers (2024-09-19T10:05:58Z)
- NeurCAM: Interpretable Neural Clustering via Additive Models [3.4437947384641037]
Interpretable clustering algorithms aim to group similar data points while explaining the obtained groups.
We introduce the Neural Clustering Additive Model (NeurCAM), a novel approach to the interpretable clustering problem.
Our approach significantly outperforms other interpretable clustering approaches when clustering on text data.
arXiv Detail & Related papers (2024-08-23T20:32:57Z)
- Multi-View Clustering via Semi-non-negative Tensor Factorization [120.87318230985653]
We develop a novel multi-view clustering method based on semi-non-negative tensor factorization (Semi-NTF).
Our model directly considers the between-view relationship and exploits the between-view complementary information.
In addition, we provide an optimization algorithm for the proposed method and prove mathematically that the algorithm always converges to the stationary KKT point.
arXiv Detail & Related papers (2023-03-29T14:54:19Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- A Modular Framework for Centrality and Clustering in Complex Networks [0.6423239719448168]
In this paper, we study two important such network analysis techniques, namely, centrality and clustering.
An information-flow based model is adopted for clustering, which itself builds upon an information theoretic measure for computing centrality.
Our clustering naturally inherits the flexibility to accommodate edge directionality, as well as different interpretations and interplay between edge weights and node degrees.
arXiv Detail & Related papers (2021-11-23T03:01:29Z)
- Deep Attention-guided Graph Clustering with Dual Self-supervision [49.040136530379094]
We propose a novel method, namely deep attention-guided graph clustering with dual self-supervision (DAGC).
We develop a dual self-supervision solution consisting of a soft self-supervision strategy with a triplet Kullback-Leibler divergence loss and a hard self-supervision strategy with a pseudo supervision loss.
Our method consistently outperforms state-of-the-art methods on six benchmark datasets.
arXiv Detail & Related papers (2021-11-10T06:53:03Z)
- Fast and Interpretable Consensus Clustering via Minipatch Learning [0.0]
We develop IMPACC: Interpretable MiniPatch Adaptive Consensus Clustering.
We develop adaptive sampling schemes for observations, which result in both improved reliability and computational savings.
Results show that our approach yields more accurate and interpretable cluster solutions.
arXiv Detail & Related papers (2021-10-05T22:39:28Z)
- Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- A semi-supervised sparse K-Means algorithm [3.04585143845864]
An unsupervised sparse clustering method can be employed in order to detect the subgroup of features necessary for clustering.
A semi-supervised method can use the labelled data to create constraints and enhance the clustering solution.
We show that the algorithm matches the high performance of other semi-supervised algorithms while preserving the ability to distinguish informative from uninformative features.
arXiv Detail & Related papers (2020-03-16T02:05:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.