The Impact of Isolation Kernel on Agglomerative Hierarchical Clustering Algorithms
- URL: http://arxiv.org/abs/2010.05473v1
- Date: Mon, 12 Oct 2020 06:18:38 GMT
- Title: The Impact of Isolation Kernel on Agglomerative Hierarchical Clustering Algorithms
- Authors: Xin Han, Ye Zhu, Kai Ming Ting, Gang Li
- Abstract summary: Agglomerative hierarchical clustering (AHC) is one of the most popular clustering approaches.
Existing AHC methods, which are based on a distance measure, have difficulty identifying adjacent clusters with varied densities.
We show that the use of a data-dependent kernel (instead of a distance or an existing kernel) provides an effective means to address this issue.
- Score: 12.363083467305787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agglomerative hierarchical clustering (AHC) is one of the most
popular clustering approaches. Existing AHC methods, which are based on a
distance measure, have one key issue: they have difficulty identifying adjacent
clusters with varied densities, regardless of the cluster extraction method
applied to the
resultant dendrogram. In this paper, we identify the root cause of this issue
and show that the use of a data-dependent kernel (instead of a distance or an
existing kernel) provides an effective means to address it. We analyse the
condition under which existing AHC methods fail to extract clusters
effectively, and the reason why the data-dependent kernel is an effective
remedy. This leads to a new approach to kernelise existing hierarchical
clustering algorithms, including traditional AHC algorithms, HDBSCAN, GDL,
and PHA. In each of these algorithms, our empirical evaluation shows that a
recently introduced Isolation Kernel produces a higher quality or purer
dendrogram than distance, Gaussian Kernel and adaptive Gaussian Kernel.
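The kernelisation recipe described above can be illustrated with a short sketch: compute an Isolation Kernel similarity matrix, then hand 1 - similarity to an off-the-shelf linkage routine in place of a distance. This is a minimal sketch, not the authors' implementation; the Voronoi-style partitioning, the `psi`/`t` defaults, and the choice of SciPy's average linkage are assumptions made here for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import cdist, squareform

def isolation_kernel_similarity(X, psi=16, t=200, seed=0):
    """Voronoi-style Isolation Kernel similarity (psi and t are illustrative).

    Each of the t rounds draws psi sample points and assigns every point in X
    to its nearest sample (a Voronoi cell); the similarity of two points is
    the fraction of rounds in which they land in the same cell.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    sim = np.zeros((n, n))
    for _ in range(t):
        centers = X[rng.choice(n, size=psi, replace=False)]
        cell = cdist(X, centers).argmin(axis=1)   # Voronoi cell id of each point
        sim += cell[:, None] == cell[None, :]     # same-cell indicator, this round
    return sim / t

# Kernelised AHC: convert similarity to dissimilarity and feed it to a
# standard linkage routine where Euclidean distance would normally go.
X = np.random.default_rng(1).normal(size=(200, 2))
D = 1.0 - isolation_kernel_similarity(X)
np.fill_diagonal(D, 0.0)
Z = linkage(squareform(D, checks=False), method="average")
```

Because the sampled partitions produce larger cells in sparse regions and smaller cells in dense ones, the resulting similarity adapts to local density, which is the property the abstract credits for separating adjacent clusters of varied densities.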
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z)
- Kernel Correlation-Dissimilarity for Multiple Kernel k-Means Clustering [21.685153346752124]
Current methods enhance information diversity and reduce redundancy by exploiting interdependencies among multiple kernels based on correlations or dissimilarities.
We introduce a novel method that systematically integrates both kernel correlation and dissimilarity.
By emphasizing the coherence between kernel correlation and dissimilarity, our method offers a more objective and transparent strategy for extracting non-linear information.
arXiv Detail & Related papers (2024-03-06T04:24:43Z)
- Linear time Evidence Accumulation Clustering with KMeans [0.0]
This work describes a trick which mimics the behavior of average-linkage clustering.
We found a way to efficiently compute the density of a partitioning, reducing the cost from quadratic to linear complexity.
The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low.
arXiv Detail & Related papers (2023-11-15T14:12:59Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Local Sample-weighted Multiple Kernel Clustering with Consensus Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) is committed to achieving optimal information fusion from a set of base kernels.
This paper proposes a novel local sample-weighted multiple kernel clustering model.
Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel or graph-based clustering algorithms.
arXiv Detail & Related papers (2022-07-05T05:00:38Z)
- Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
- Kernel k-Means, By All Means: Algorithms and Strong Consistency [21.013169939337583]
Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linear data.
In this paper, we generalize results leveraging a general family of means to combat sub-optimal local solutions.
Our algorithm makes use of majorization-minimization (MM) to better solve this non-linear separation problem.
arXiv Detail & Related papers (2020-11-12T16:07:18Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Isolation Distributional Kernel: A New Tool for Point & Group Anomaly Detection [76.1522587605852]
Isolation Distributional Kernel (IDK) is a new way to measure the similarity between two distributions.
We demonstrate IDK's efficacy and efficiency as a new tool for kernel-based anomaly detection for both point and group anomalies (see the sketch after this list).
arXiv Detail & Related papers (2020-09-24T12:25:43Z)
- Fast Kernel k-means Clustering Using Incomplete Cholesky Factorization [11.631064399465089]
Kernel-based clustering algorithms can identify and capture the non-linear structure in datasets,
and can therefore achieve better performance than linear clustering.
However, computing and storing the entire kernel matrix requires so much memory that it is difficult for kernel-based clustering to handle large-scale datasets.
arXiv Detail & Related papers (2020-02-07T15:32:14Z)
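As flagged in the Isolation Distributional Kernel entry above, here is a minimal sketch of how such a kernel can compare two distributions. It is an unnormalised rendering under assumptions (Voronoi-style partitioning over the pooled data, illustrative `psi`/`t` values), not the paper's exact estimator: each distribution is summarised by the mean of its points' one-hot cell indicators, and the score is the averaged dot product of the two mean maps.

```python
import numpy as np
from scipy.spatial.distance import cdist

def idk_similarity(P, Q, psi=16, t=200, seed=0):
    """Unnormalised Isolation Distributional Kernel sketch (psi, t illustrative).

    Each round draws psi reference points from the pooled data; a distribution
    is represented by the mean of its points' one-hot Voronoi-cell indicators
    (a kernel mean map), and the score is the dot product of the two maps,
    averaged over t rounds.
    """
    rng = np.random.default_rng(seed)
    pool = np.vstack([P, Q])
    score = 0.0
    for _ in range(t):
        centers = pool[rng.choice(len(pool), size=psi, replace=False)]
        mp = np.bincount(cdist(P, centers).argmin(axis=1), minlength=psi) / len(P)
        mq = np.bincount(cdist(Q, centers).argmin(axis=1), minlength=psi) / len(Q)
        score += mp @ mq                      # similarity of the two mean maps
    return score / t

# Two overlapping Gaussian samples: the score grows as the samples align.
P = np.random.default_rng(2).normal(0.0, 1.0, size=(300, 2))
Q = np.random.default_rng(3).normal(0.5, 1.0, size=(300, 2))
print(idk_similarity(P, Q))
```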