Hierarchical clustering with maximum density paths and mixture models
- URL: http://arxiv.org/abs/2503.15582v1
- Date: Wed, 19 Mar 2025 15:37:51 GMT
- Title: Hierarchical clustering with maximum density paths and mixture models
- Authors: Martin Ritzert, Polina Turishcheva, Laura Hansel, Paul Wollenhaupt, Marissa Weis, Alexander Ecker,
- Abstract summary: Hierarchical clustering is an effective and interpretable technique for analyzing structure in data.<n>It is particularly helpful in settings where the exact number of clusters is unknown, and provides a robust framework for exploring complex datasets.<n>Our method addresses this limitation by leveraging a two-stage approach, first employing a Gaussian or Student's t mixture model to overcluster the data, and then hierarchically merging clusters based on the induced density landscape.<n>This approach yields state-of-the-art clustering performance while also providing a meaningful hierarchy, making it a valuable tool for exploratory data analysis.
- Score: 39.42511559155036
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Hierarchical clustering is an effective and interpretable technique for analyzing structure in data, offering a nuanced understanding by revealing insights at multiple scales and resolutions. It is particularly helpful in settings where the exact number of clusters is unknown, and provides a robust framework for exploring complex datasets. Additionally, hierarchical clustering can uncover inner structures within clusters, capturing subtle relationships and nested patterns that may be obscured by traditional flat clustering methods. However, existing hierarchical clustering methods struggle with high-dimensional data, especially when there are no clear density gaps between modes. Our method addresses this limitation by leveraging a two-stage approach, first employing a Gaussian or Student's t mixture model to overcluster the data, and then hierarchically merging clusters based on the induced density landscape. This approach yields state-of-the-art clustering performance while also providing a meaningful hierarchy, making it a valuable tool for exploratory data analysis. Code is available at https://github.com/ecker-lab/tneb clustering.
Related papers
- Manifold Clustering with Schatten p-norm Maximization [16.90743611125625]
We develop a new clustering framework based on manifold clustering.
Specifically, the algorithm uses labels to guide the manifold structure and perform clustering on it.
In order to naturally maintain the class balance in the clustering process, we maximize the Schatten p-norm of labels.
arXiv Detail & Related papers (2025-04-29T03:23:06Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Hard Regularization to Prevent Deep Online Clustering Collapse without
Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z) - Contrastive Hierarchical Clustering [8.068701201341065]
CoHiClust is a Contrastive Hierarchical Clustering model based on deep neural networks.
By employing a self-supervised learning approach, CoHiClust distills the base network into a binary tree without access to any labeled data.
arXiv Detail & Related papers (2023-03-03T07:54:19Z) - Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z) - Enhancing cluster analysis via topological manifold learning [0.3823356975862006]
We show that inferring the topological structure of a dataset before clustering can considerably enhance cluster detection.
We combine manifold learning method UMAP for inferring the topological structure with density-based clustering method DBSCAN.
arXiv Detail & Related papers (2022-07-01T15:53:39Z) - DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep
Neural Networks [53.88811980967342]
This paper presents a Deep Clustering via Ensembles (DeepCluE) approach.
It bridges the gap between deep clustering and ensemble clustering by harnessing the power of multiple layers in deep neural networks.
Experimental results on six image datasets confirm the advantages of DeepCluE over the state-of-the-art deep clustering approaches.
arXiv Detail & Related papers (2022-06-01T09:51:38Z) - Very Compact Clusters with Structural Regularization via Similarity and
Connectivity [3.779514860341336]
We propose an end-to-end deep clustering algorithm, i.e., Very Compact Clusters (VCC) for the general datasets.
Our proposed approach achieves better clustering performance over most of the state-of-the-art clustering methods.
arXiv Detail & Related papers (2021-06-09T23:22:03Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.