Massively-Parallel Heat Map Sorting and Applications To Explainable
Clustering
- URL: http://arxiv.org/abs/2309.07486v1
- Date: Thu, 14 Sep 2023 07:53:52 GMT
- Title: Massively-Parallel Heat Map Sorting and Applications To Explainable
Clustering
- Authors: Sepideh Aghamolaei and Mohammad Ghodsi
- Abstract summary: We introduce the heat map sorting problem as reordering and merging the points and dimensions while preserving the clusters (labels)
A cluster is preserved if it remains connected, i.e., if it is not split into several clusters and no two clusters are merged.
We give an approximation algorithm for a NP-hard special case of the problem.
- Score: 1.544681800932596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a set of points labeled with $k$ labels, we introduce the heat map
sorting problem as reordering and merging the points and dimensions while
preserving the clusters (labels). A cluster is preserved if it remains
connected, i.e., if it is not split into several clusters and no two clusters
are merged.
We prove the problem is NP-hard and we give a fixed-parameter algorithm with
a constant number of rounds in the massively parallel computation model, where
each machine has a sublinear memory and the total memory of the machines is
linear. We give an approximation algorithm for a NP-hard special case of the
problem. We empirically compare our algorithm with k-means and density-based
clustering (DBSCAN) using a dimensionality reduction via locality-sensitive
hashing on several directed and undirected graphs of email and computer
networks.
Related papers
- An SDP-based Branch-and-Cut Algorithm for Biclustering [0.0]
We present a tailored branch-and-cut algorithm for biclustering problems.
We show that the proposed algorithm can solve instances 20 times larger than those handled by general-purpose solvers.
arXiv Detail & Related papers (2024-03-17T21:43:19Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - A Dynamical Systems Algorithm for Clustering in Hyperspectral Imagery [0.18374319565577152]
We present a new dynamical systems algorithm for clustering in hyperspectral images.
The main idea of the algorithm is that data points are pushed' in the direction of increasing density and groups of pixels that end up in the same dense regions belong to the same class.
We evaluate the algorithm on the Urban scene comparing performance against the k-means algorithm using pre-identified classes of materials as ground truth.
arXiv Detail & Related papers (2022-07-21T17:31:57Z) - Correlation Clustering in Constant Many Parallel Rounds [42.10280805559555]
Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining.
In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work.
Our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds.
arXiv Detail & Related papers (2021-06-15T21:45:45Z) - ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering
using Nearest-Neighbor Chain [6.824747267214373]
We propose the ParChain framework for designing parallel hierarchical agglomerative clustering (HAC) algorithms.
Compared to most previous parallel HAC algorithms, our new algorithms require only linear memory, and are scalable to large data sets.
Our algorithms are able to scale to data set sizes with tens of millions of points, which existing algorithms are not able to handle.
arXiv Detail & Related papers (2021-06-08T23:13:27Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, this technique fails both to ensure diversity, and to obtain a good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Clustering by Constructing Hyper-Planes [0.0]
We present a clustering algorithm by finding hyper-planes to distinguish data points.
It relies on the marginal space between the points to determine centers and numbers of clusters.
Because the algorithm is based on linear structures, it can approximate the distribution of datasets accurately and flexibly.
arXiv Detail & Related papers (2020-04-25T08:52:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.