ExClus: Explainable Clustering on Low-dimensional Data Representations
- URL: http://arxiv.org/abs/2111.03168v1
- Date: Thu, 4 Nov 2021 21:24:01 GMT
- Title: ExClus: Explainable Clustering on Low-dimensional Data Representations
- Authors: Xander Vankwikelberge, Bo Kang, Edith Heiter, Jefrey Lijffijt
- Abstract summary: Dimensionality reduction and clustering techniques are frequently used to analyze complex data sets, but their results are often not easy to interpret.
We consider how to support users in interpreting apparent cluster structure on scatter plots where the axes are not directly interpretable.
We propose a new method to compute an interpretable clustering automatically, where the explanation is in the original high-dimensional space and the clustering is coherent in the low-dimensional projection.
- Score: 9.496898312608307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dimensionality reduction and clustering techniques are frequently used to
analyze complex data sets, but their results are often not easy to interpret.
We consider how to support users in interpreting apparent cluster structure on
scatter plots where the axes are not directly interpretable, such as when the
data is projected onto a two-dimensional space using a dimensionality-reduction
method. Specifically, we propose a new method to compute an interpretable
clustering automatically, where the explanation is in the original
high-dimensional space and the clustering is coherent in the low-dimensional
projection. It provides a tunable balance between the complexity and the amount
of information provided, through the use of information theory. We study the
computational complexity of this problem and introduce restrictions on the
search space of solutions to arrive at an efficient, tunable, greedy
optimization algorithm. This algorithm is furthermore implemented in an
interactive tool called ExClus. Experiments on several data sets highlight that
ExClus can provide informative and easy-to-understand patterns, and they expose
where the algorithm is efficient and where there is room for improvement
considering tunability and scalability.
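To make the construction concrete, here is a minimal sketch of the two-space idea in Python: cluster in the low-dimensional projection, then explain each cluster by its most deviating attributes in the original space. The agglomerative step, the fixed cluster count, and the z-score ranking are illustrative stand-ins, not the paper's information-theoretic criterion or its greedy search.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def explainable_clusters(X_high, X_2d, n_clusters=4, top_attrs=3):
    # Cluster where the user looks: the 2-D projection.
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X_2d)
    # Explain where the attributes live: the original high-dim space.
    mu = X_high.mean(axis=0)
    sd = X_high.std(axis=0) + 1e-12
    explanations = {}
    for c in np.unique(labels):
        # standardized deviation of the cluster mean from the global mean
        z = (X_high[labels == c].mean(axis=0) - mu) / sd
        top = np.argsort(-np.abs(z))[:top_attrs]
        explanations[c] = [(int(a), round(float(z[a]), 2)) for a in top]
    return labels, explanations
```

Raising `top_attrs` plays the role of the complexity-versus-information trade-off that ExClus tunes via information theory.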
Related papers
- Diffusion-based Semi-supervised Spectral Algorithm for Regression on Manifolds [2.0649432688817444]
We introduce a novel diffusion-based spectral algorithm to tackle regression analysis on high-dimensional data.
Our method uses the local estimation property of the heat kernel, offering an adaptive, data-driven approach to the challenges posed by high dimensionality.
Our algorithm performs in an entirely data-driven manner, operating directly within the intrinsic manifold structure of the data.
arXiv Detail & Related papers (2024-10-18T15:29:04Z)
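A minimal sketch of the heat-kernel local-estimation idea from the entry above, reduced to Nadaraya-Watson averaging with a Gaussian (heat) kernel; the paper's actual diffusion-based spectral estimator and its manifold adaptivity are not reproduced here.

```python
import numpy as np

def heat_kernel_regression(X_train, y_train, X_test, t=0.5):
    # pairwise squared distances between test and train points
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (4.0 * t))           # Gaussian heat kernel at time t
    return (K @ y_train) / K.sum(axis=1)  # locally weighted average
```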
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
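The coupling-based view in the entry above can be sketched with the POT library: match the dataset's distance geometry to a handful of low-dimensional prototypes via Gromov-Wasserstein and read cluster assignments off the coupling. The fixed random prototypes are a placeholder; the paper optimizes the reduced distribution jointly within a single problem.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def distributional_reduction(X, n_prototypes=5, emb_dim=2, seed=0):
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n_prototypes, emb_dim))  # fixed toy prototypes
    C1 = ot.dist(X, X)                            # input-space distances
    C2 = ot.dist(Z, Z)                            # prototype distances
    p = np.full(len(X), 1 / len(X))               # uniform weights
    q = np.full(n_prototypes, 1 / n_prototypes)
    # Gromov-Wasserstein coupling between the two metric spaces
    T = ot.gromov.gromov_wasserstein(C1, C2, p, q, 'square_loss')
    return T.argmax(axis=1)                       # coupling -> cluster labels
```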
- Datacube segmentation via Deep Spectral Clustering [76.48544221010424]
Extended Vision techniques are often challenging to interpret.
The huge dimensionality of datacube spectra makes their statistical interpretation a complex task.
In this paper, we explore the possibility of applying unsupervised clustering methods in encoded space.
A statistical dimensional reduction is performed by an ad hoc trained (Variational) AutoEncoder, while the clustering process is performed by a (learnable) iterative K-Means clustering algorithm.
arXiv Detail & Related papers (2024-01-31T09:31:28Z)
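A compact sketch of this pipeline with a plain (non-variational) autoencoder in PyTorch and scikit-learn's K-Means; layer sizes, the training schedule, and the non-learnable K-Means step are illustrative choices, not the paper's setup.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class AE(nn.Module):
    def __init__(self, n_bands, code=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_bands, 64), nn.ReLU(),
                                 nn.Linear(64, code))
        self.dec = nn.Sequential(nn.Linear(code, 64), nn.ReLU(),
                                 nn.Linear(64, n_bands))
    def forward(self, x):
        return self.dec(self.enc(x))

def segment(spectra, n_segments=5, epochs=200):
    X = torch.as_tensor(spectra, dtype=torch.float32)
    model = AE(X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):                # plain reconstruction loss
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), X)
        loss.backward()
        opt.step()
    codes = model.enc(X).detach().numpy()  # clustering in encoded space
    return KMeans(n_clusters=n_segments, n_init=10).fit_predict(codes)
```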
- Learning A Disentangling Representation For PU Learning [18.94726971543125]
We propose to learn a neural network-based data representation using a loss function that can be used to project the unlabeled data into two clusters.
We conduct experiments on simulated PU data that demonstrate the improved performance of our proposed method compared to the current state-of-the-art approaches.
arXiv Detail & Related papers (2023-10-05T18:33:32Z)
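A deliberately crude baseline for the two-cluster idea above: split the unlabeled data into two groups on raw features and call the group nearer the labeled positives "positive". The paper's contribution is learning a representation in which this separation holds; that learned network is omitted here.

```python
import numpy as np
from sklearn.cluster import KMeans

def pu_two_cluster_baseline(X_pos, X_unlabeled):
    # Two clusters over the unlabeled pool (raw features, no learning).
    km = KMeans(n_clusters=2, n_init=10).fit(X_unlabeled)
    # The cluster whose center is closer to the positives is called positive.
    pos_center = X_pos.mean(axis=0)
    d = np.linalg.norm(km.cluster_centers_ - pos_center, axis=1)
    return km.labels_ == int(d.argmin())  # boolean mask over unlabeled data
```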
- Adaptively-weighted Integral Space for Fast Multiview Clustering [54.177846260063966]
We propose an Adaptively-weighted Integral Space for Fast Multiview Clustering (AIMC) with nearly linear complexity.
Specifically, view generation models are designed to reconstruct the view observations from the latent integral space.
Experiments conducted on several real-world datasets confirm the superiority of the proposed AIMC method.
arXiv Detail & Related papers (2022-08-25T05:47:39Z)
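A toy alternating scheme mirroring the structure described above: a shared latent space reconstructs each view, and views are reweighted by their current reconstruction error. The updates, the weighting rule, and the complexity are simplifications, not AIMC's actual algorithm.

```python
import numpy as np

def aimc_sketch(views, dim=5, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    H = rng.normal(size=(n, dim))          # shared latent integral space
    w = np.ones(len(views)) / len(views)   # adaptive view weights
    for _ in range(n_iter):
        # per-view reconstruction X_v ~ H @ W_v (least squares)
        Ws = [np.linalg.lstsq(H, Xv, rcond=None)[0] for Xv in views]
        # latent update: closed-form weighted least squares over all views
        A = sum(wv * Wv @ Wv.T for wv, Wv in zip(w, Ws))
        B = sum(wv * Xv @ Wv.T for wv, Xv, Wv in zip(w, views, Ws))
        H = B @ np.linalg.pinv(A)
        # adaptively reweight: better-reconstructed views count more
        errs = np.array([np.linalg.norm(Xv - H @ Wv)
                         for Xv, Wv in zip(views, Ws)])
        w = 1.0 / (errs + 1e-12)
        w /= w.sum()
    return H  # cluster this integral space with any standard method
```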
- Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization [20.43471678277403]
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest for optimization with a Laplacian-composited objective.
arXiv Detail & Related papers (2022-07-25T14:10:24Z)
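The cluster-contractive term described above can be evaluated in isolation, as in this sketch: the sum of the smallest eigenvalues of the embedding's graph Laplacian, which shrinks as clusters separate. LaptSNE differentiates this term analytically inside the t-SNE objective; here it is only computed, and the kNN graph construction is an assumption.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

def cluster_contractive_penalty(Y, k_neighbors=10, k_clusters=5):
    # kNN graph on the low-dimensional embedding Y, symmetrized
    W = kneighbors_graph(Y, k_neighbors, mode='connectivity')
    W = 0.5 * (W + W.T)
    L = laplacian(W, normed=True)
    # smallest eigenvalues of the normalized graph Laplacian
    vals = eigsh(L, k=k_clusters, which='SM', return_eigenvectors=False)
    return vals.sum()  # small when the embedding has well-separated clusters
```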
- Skew-Symmetric Adjacency Matrices for Clustering Directed Graphs [5.301300942803395]
Cut-based directed graph (digraph) clustering often focuses on finding dense within-cluster or sparse between-cluster connections.
In flow-based clusterings, the edges between clusters tend to be oriented in one direction; such patterns have been found in migration data, food webs, and trade data.
arXiv Detail & Related papers (2022-03-02T20:07:04Z)
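A spectral sketch of the flow-based idea above: the skew-symmetric part A - A^T keeps only net edge direction, and clustering its leading singular vectors groups nodes with a shared flow orientation. This is a simplification; the paper analyzes such matrices and the resulting clusterings in much greater depth.

```python
import numpy as np
from sklearn.cluster import KMeans

def flow_clusters(A, k=3):
    S = A - A.T                              # skew-symmetric adjacency
    U, s, Vt = np.linalg.svd(S)              # directional structure of flows
    emb = np.hstack([U[:, :k], Vt[:k].T])    # left + right singular vectors
    return KMeans(n_clusters=k, n_init=10).fit_predict(emb)
```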
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
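A sketch of the p = 1 special case ($l_{2,1}$) via iteratively reweighted least squares on a self-reconstruction objective, with features scored by the row norms of W. The general $l_{2,p}$ algorithm and its convergence analysis are in the paper; this only illustrates the reconstruction-plus-row-sparsity idea.

```python
import numpy as np

def l21_feature_selection(X, lam=1.0, n_iter=30, n_keep=10):
    # minimize ||X - X W||_F^2 + lam * ||W||_{2,1} by IRLS
    d = X.shape[1]
    G = X.T @ X
    W = np.eye(d)
    for _ in range(n_iter):
        row_norms = np.linalg.norm(W, axis=1) + 1e-8
        D = np.diag(1.0 / (2.0 * row_norms))  # IRLS reweighting matrix
        W = np.linalg.solve(G + lam * D, G)   # closed-form update
    scores = np.linalg.norm(W, axis=1)        # row-sparse W scores features
    return np.argsort(-scores)[:n_keep]       # selected feature indices
```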
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
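For context, a classical non-learnable K-subspaces baseline for the problem the paper addresses: alternate between fitting a PCA basis per cluster and reassigning points by reconstruction residual. The paper replaces this per-dataset iteration with a learned parametric partition function that scales to millions of points.

```python
import numpy as np

def k_subspaces(X, k=3, dim=2, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(X))
    for _ in range(n_iter):
        residuals = np.empty((k, len(X)))
        for j in range(k):
            Xj = X[labels == j]
            if len(Xj) <= dim:                 # guard: degenerate cluster
                residuals[j] = np.inf
                continue
            mu = Xj.mean(axis=0)
            _, _, Vt = np.linalg.svd(Xj - mu, full_matrices=False)
            B = Vt[:dim]                       # basis of cluster's subspace
            R = (X - mu) - (X - mu) @ B.T @ B  # off-subspace component
            residuals[j] = np.linalg.norm(R, axis=1)
        labels = residuals.argmin(axis=0)      # reassign to nearest subspace
    return labels
```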
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, keeps the attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
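To pin down the objects being enumerated, a brute-force toy (exact equality, tiny matrices only): maximal row/column sets where every selected column is constant across the selected rows. RIn-Close_CVC3 produces the same complete, non-redundant enumeration efficiently and with far less memory; nothing of its actual algorithm is reproduced here.

```python
from itertools import combinations

def constant_column_biclusters(M, min_rows=2, min_cols=2):
    n_rows, n_cols = M.shape
    found = []
    for k in range(min_cols, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            groups = {}  # rows sharing the same values on these columns
            for r in range(n_rows):
                groups.setdefault(tuple(M[r, c] for c in cols), []).append(r)
            found += [(frozenset(rows), frozenset(cols))
                      for rows in groups.values() if len(rows) >= min_rows]
    # keep only maximal biclusters (not contained in any other)
    return [b for b in found
            if not any(b != o and b[0] <= o[0] and b[1] <= o[1]
                       for o in found)]
```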
- Simple and Scalable Sparse k-means Clustering via Feature Ranking [14.839931533868176]
We propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms.
Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings.
arXiv Detail & Related papers (2020-02-20T02:41:02Z)
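A sketch in the spirit of the feature-ranking framework above: alternate k-means on the currently selected features with re-ranking all features by their between-cluster variance. The ranking statistic and the schedule are guesses at the general recipe, not the paper's exact method.

```python
import numpy as np
from sklearn.cluster import KMeans

def sparse_kmeans(X, k=3, n_features=5, n_rounds=4):
    keep = np.arange(X.shape[1])            # start from all features
    for _ in range(n_rounds):
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(X[:, keep])
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        counts = np.bincount(labels, minlength=k)[:, None]
        # rank every original feature by its between-cluster variance
        score = (counts * (centers - X.mean(axis=0)) ** 2).sum(axis=0)
        keep = np.argsort(-score)[:n_features]
    # final assignment on the selected sparse feature set
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X[:, keep])
    return labels, keep
```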