Cluster Explanation via Polyhedral Descriptions
- URL: http://arxiv.org/abs/2210.08798v1
- Date: Mon, 17 Oct 2022 07:26:44 GMT
- Title: Cluster Explanation via Polyhedral Descriptions
- Authors: Connor Lawless, Oktay Gunluk
- Abstract summary: Clustering is an unsupervised learning problem that aims to partition unlabelled data points into groups with similar features.
Traditional clustering algorithms provide limited insight into the groups they find as their main focus is accuracy and not the interpretability of the group assignments.
We introduce a new approach to explain clusters by constructing polyhedra around each cluster while minimizing either the complexity of the resulting polyhedra or the number of features used in the description.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering is an unsupervised learning problem that aims to partition
unlabelled data points into groups with similar features. Traditional
clustering algorithms provide limited insight into the groups they find as
their main focus is accuracy and not the interpretability of the group
assignments. This has spurred a recent line of work on explainable machine
learning for clustering. In this paper we focus on the cluster description
problem where, given a dataset and its partition into clusters, the task is to
explain the clusters. We introduce a new approach to explain clusters by
constructing polyhedra around each cluster while minimizing either the
complexity of the resulting polyhedra or the number of features used in the
description. We formulate the cluster description problem as an integer program
and present a column generation approach to search over an exponential number
of candidate half-spaces that can be used to build the polyhedra. To deal with
large datasets, we introduce a novel grouping scheme that first forms smaller
groups of data points and then builds the polyhedra around the grouped data, a
strategy which out-performs simply sub-sampling data. Compared to state of the
art cluster description algorithms, our approach is able to achieve competitive
interpretability with improved description accuracy.
Related papers
- Fair Minimum Representation Clustering via Integer Programming [0.6906005491572401]
Clustering is an unsupervised learning task that aims to partition data into a set of clusters.
In this paper, we study the k-means and k-medians clustering problems with the additional constraint that each group must have a minimum level of representation.
We present an alternating minimization algorithm, called MiniReL, that directly incorporates the fairness constraints.
arXiv Detail & Related papers (2024-09-04T00:13:40Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Using Decision Trees for Interpretable Supervised Clustering [0.0]
supervised clustering aims at forming clusters of labelled data with high probability densities.
We are particularly interested in finding clusters of data of a given class and describing the clusters with the set of comprehensive rules.
arXiv Detail & Related papers (2023-07-16T17:12:45Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z) - DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep
Neural Networks [53.88811980967342]
This paper presents a Deep Clustering via Ensembles (DeepCluE) approach.
It bridges the gap between deep clustering and ensemble clustering by harnessing the power of multiple layers in deep neural networks.
Experimental results on six image datasets confirm the advantages of DeepCluE over the state-of-the-art deep clustering approaches.
arXiv Detail & Related papers (2022-06-01T09:51:38Z) - Fast and explainable clustering based on sorting [0.0]
We introduce a fast and explainable clustering method called CLASSIX.
The algorithm is controlled by two scalar parameters, namely a distance parameter for the aggregation and another parameter controlling the minimal cluster size.
Our experiments demonstrate that CLASSIX competes with state-of-the-art clustering algorithms.
arXiv Detail & Related papers (2022-02-03T08:24:21Z) - Interpretable Clustering via Multi-Polytope Machines [12.69310440882225]
We propose a novel approach for interpretable clustering that both clusters data points and constructs polytopes around the discovered clusters to explain them.
We benchmark our approach on a suite of synthetic and real world clustering problems, where our algorithm outperforms state of the art interpretable and non-interpretable clustering algorithms.
arXiv Detail & Related papers (2021-12-10T16:36:32Z) - Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC)
In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss, are designed for node representation learning.
For the OOS nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z) - Deep Descriptive Clustering [24.237000220172906]
This paper explores a novel setting for performing clustering on complex data while simultaneously generating explanations using interpretable tags.
We form good clusters by maximizing the mutual information between empirical distribution on the inputs and the induced clustering labels for clustering objectives.
Experimental results on public data demonstrate that our model outperforms competitive baselines in clustering performance.
arXiv Detail & Related papers (2021-05-24T21:40:16Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.