Rethinking Divisive Hierarchical Clustering from a Distributional Perspective
- URL: http://arxiv.org/abs/2601.19718v1
- Date: Tue, 27 Jan 2026 15:41:56 GMT
- Title: Rethinking Divisive Hierarchical Clustering from a Distributional Perspective
- Authors: Kaifeng Zhang, Kai Ming Ting, Tianrun Liang, Qiuran Zhao
- Abstract summary: Divisive Hierarchical Clustering (DHC) methods produce a dendrogram that lacks three desired properties. We show that this shortcoming can be addressed by using a distributional kernel instead of the set-oriented criterion. Our proposed method successfully creates a dendrogram that is consistent with the biological regions in a Spatial Transcriptomics dataset.
- Score: 7.023830532843621
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We uncover that current objective-based Divisive Hierarchical Clustering (DHC) methods produce a dendrogram that lacks three desired properties: no unwarranted splitting, grouping of similar clusters into the same subset, and ground-truth correspondence. This shortcoming has its root cause in the use of a set-oriented bisecting assessment criterion. We show that it can be addressed by using a distributional kernel instead of the set-oriented criterion; the resultant clusters achieve a new distribution-oriented objective that maximizes the total similarity of all clusters (TSC). Our theoretical analysis shows that the resultant dendrogram guarantees a lower bound of TSC. The empirical evaluation shows the effectiveness of our proposed method on artificial and Spatial Transcriptomics (bioinformatics) datasets. Our proposed method successfully creates a dendrogram that is consistent with the biological regions in a Spatial Transcriptomics dataset, whereas other contenders fail.
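The abstract's core idea is to score candidate bisections by a distributional similarity of each cluster rather than a set-oriented criterion. The sketch below is a rough illustration of that idea, not the authors' algorithm: the paper uses a distributional kernel, while here a plain Gaussian kernel mean embedding and a simple 2-means bisection stand in, and all function names (`rbf_gram`, `self_similarity`, `divisive_cluster`) and parameters (`gamma`, `min_size`, `max_depth`) are hypothetical. A split is accepted only when both children are distributionally more compact than their parent, so well-separated groups get cut apart while a single homogeneous group is left alone, avoiding unwarranted splitting.

```python
import numpy as np

def rbf_gram(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel Gram matrix between two point sets.
    d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def self_similarity(X, gamma=1.0):
    # Mean of the Gram matrix = squared norm of the kernel mean embedding;
    # high when the cluster is distributionally compact.
    return rbf_gram(X, X, gamma).mean()

def two_means(X, iters=20, seed=0):
    # Plain 2-means bisection (stand-in for the paper's splitting step).
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), 2, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - c[None]) ** 2).sum(-1), axis=1)
        for k in range(2):
            if (lab == k).any():
                c[k] = X[lab == k].mean(0)
    return lab

def divisive_cluster(X, gamma=1.0, min_size=5, depth=0, max_depth=3):
    # Accept a split only when BOTH children are more self-similar than the
    # parent, i.e., the cut increases the total similarity of the clusters.
    if len(X) < 2 * min_size or depth >= max_depth:
        return [X]
    lab = two_means(X)
    left, right = X[lab == 0], X[lab == 1]
    if min(len(left), len(right)) < min_size:
        return [X]
    parent = self_similarity(X, gamma)
    if min(self_similarity(left, gamma), self_similarity(right, gamma)) <= parent:
        return [X]  # no unwarranted splitting
    return (divisive_cluster(left, gamma, min_size, depth + 1, max_depth)
            + divisive_cluster(right, gamma, min_size, depth + 1, max_depth))
```

On two well-separated Gaussian blobs, the cross-blob kernel values dilute the parent's self-similarity, so the first bisection is accepted, while cutting a single blob in half would be rejected once the gain vanishes; this mirrors the distribution-oriented TSC objective only in spirit, under the simplifying assumptions above.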
Related papers
- Causal Clustering for Conditional Average Treatment Effects Estimation and Subgroup Discovery [5.669361767058639]
Estimating heterogeneous treatment effects is critical in domains such as personalized medicine, resource allocation, and policy evaluation. We propose a novel framework that clusters individuals based on estimated treatment effects using a learned kernel derived from causal forests.
arXiv Detail & Related papers (2025-09-06T17:01:23Z)
- Hierarchical and Density-based Causal Clustering [6.082022112101251]
We propose plug-in estimators that are simple and readily implementable using off-the-shelf algorithms.
We go on to study their rate of convergence, and show that the additional cost of causal clustering is essentially the estimation error of the outcome regression functions.
arXiv Detail & Related papers (2024-11-02T14:01:04Z)
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
- Contrastive Fine-grained Class Clustering via Generative Adversarial Networks [9.667133604169829]
We introduce C3-GAN, a method that leverages the categorical inference power of InfoGAN by applying contrastive learning.
C3-GAN achieved state-of-the-art clustering performance on four fine-grained benchmark datasets.
arXiv Detail & Related papers (2021-12-30T08:57:11Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) learns classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Learning Embeddings for Image Clustering: An Empirical Study of Triplet Loss Approaches [10.42820615166362]
We evaluate two different image clustering objectives, k-means clustering and correlation clustering, in the context of Triplet Loss induced feature space embeddings.
We train a convolutional neural network to learn discriminative features by optimizing two popular versions of the Triplet Loss.
We propose a new, simple Triplet Loss formulation, which shows desirable properties with respect to formal clustering objectives and outperforms the existing methods.
arXiv Detail & Related papers (2020-07-06T23:38:14Z)
- LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and combines our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.