Flattening Multiparameter Hierarchical Clustering Functors
- URL: http://arxiv.org/abs/2104.14734v1
- Date: Fri, 30 Apr 2021 03:10:20 GMT
- Title: Flattening Multiparameter Hierarchical Clustering Functors
- Authors: Dan Shiebler
- Abstract summary: We introduce a procedure for flattening multiparameter hierarchical clusterings.
We then introduce a Bayesian update algorithm for learning clustering parameters from data.
We demonstrate that the composition of this algorithm with our flattening procedure satisfies a consistency property.
- Score: 1.52292571922932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We bring together topological data analysis, applied category theory, and
machine learning to study multiparameter hierarchical clustering. We begin by
introducing a procedure for flattening multiparameter hierarchical clusterings.
We demonstrate that this procedure is a functor from a category of
multiparameter hierarchical partitions to a category of binary integer
programs. We also include empirical results demonstrating its effectiveness.
Next, we introduce a Bayesian update algorithm for learning clustering
parameters from data. We demonstrate that the composition of this algorithm
with our flattening procedure satisfies a consistency property.
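As a rough illustration of how flattening can be posed as a binary integer program, the sketch below picks one 0/1 decision per candidate cluster, subject to the constraint that chosen clusters do not overlap. This is not the paper's formulation: the function `flatten`, the inputs `clusters` and `weights`, and the sum-of-weights objective are all hypothetical, and the brute-force search stands in for a real integer-programming solver.

```python
from itertools import combinations

def flatten(clusters, weights):
    """Choose pairwise-disjoint clusters that maximize total weight.

    Assumption: each cluster is a frozenset of point ids and each weight is
    a hypothetical quality score. Every subset of clusters corresponds to one
    0/1 assignment of the decision variables; we enumerate them all.
    """
    n = len(clusters)
    best_choice, best_value = (), 0.0
    for mask in range(1 << n):
        chosen = [i for i in range(n) if (mask >> i) & 1]
        # Constraint: no two selected clusters may share a point.
        if any(clusters[i] & clusters[j] for i, j in combinations(chosen, 2)):
            continue
        value = sum(weights[i] for i in chosen)
        if value > best_value:
            best_choice, best_value = tuple(chosen), value
    return best_choice, best_value

# Toy hierarchy over four points: two pairs, their union, and two singletons.
clusters = [
    frozenset({0, 1}),
    frozenset({2, 3}),
    frozenset({0, 1, 2, 3}),
    frozenset({0}),
    frozenset({1}),
]
weights = [2.0, 2.0, 3.5, 0.5, 0.5]

choice, value = flatten(clusters, weights)
print(choice, value)
```

The enumeration is exponential in the number of candidate clusters, so it is only meant to make the structure of the problem visible on toy inputs.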
Related papers
- Interpretable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z)
- Generating Hierarchical Structures for Improved Time Series Classification Using Stochastic Splitting Functions [0.0]
This study introduces a novel hierarchical divisive clustering approach with stochastic splitting functions (SSFs) to enhance classification performance in multi-class datasets through hierarchical classification (HC).
The method has the unique capability of generating hierarchy without requiring explicit information, making it suitable for datasets lacking prior knowledge of hierarchy.
arXiv Detail & Related papers (2023-09-21T10:34:50Z)
- Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z)
- Natural Hierarchical Cluster Analysis by Nearest Neighbors with Near-Linear Time Complexity [0.0]
We propose a nearest neighbor based clustering algorithm that results in a naturally defined hierarchy of clusters.
In contrast to the agglomerative and divisive hierarchical clustering algorithms, our approach is not dependent on the iterative working of the algorithm.
arXiv Detail & Related papers (2022-03-15T16:03:42Z)
- Fast and explainable clustering based on sorting [0.0]
We introduce a fast and explainable clustering method called CLASSIX.
The algorithm is controlled by two scalar parameters, namely a distance parameter for the aggregation and another parameter controlling the minimal cluster size.
Our experiments demonstrate that CLASSIX competes with state-of-the-art clustering algorithms.
arXiv Detail & Related papers (2022-02-03T08:24:21Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Stable and consistent density-based clustering via multiparameter persistence [77.34726150561087]
We consider the degree-Rips construction from topological data analysis.
We analyze its stability to perturbations of the input data using the correspondence-interleaving distance.
We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)
- Hierarchical Correlation Clustering and Tree Preserving Embedding [3.821323038670061]
We propose a hierarchical correlation clustering method that extends the well-known correlation clustering.
In the following, we study unsupervised representation learning with such hierarchical correlation clustering.
arXiv Detail & Related papers (2020-02-18T17:44:25Z)
- Conjoined Dirichlet Process [63.89763375457853]
We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
arXiv Detail & Related papers (2020-02-08T19:41:23Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
On a set of 16 data tables generated by a quasi-Monte Carlo experiment, the methods are compared, for one of the aggregation criteria using L1 dissimilarity, against hierarchical clustering and partitioning around medoids (PAM), a variant of k-means.
arXiv Detail & Related papers (2020-01-06T23:33:31Z)