The Impact of Isolation Kernel on Agglomerative Hierarchical Clustering Algorithms
- URL: http://arxiv.org/abs/2010.05473v1
- Date: Mon, 12 Oct 2020 06:18:38 GMT
- Title: The Impact of Isolation Kernel on Agglomerative Hierarchical Clustering Algorithms
- Authors: Xin Han, Ye Zhu, Kai Ming Ting, Gang Li
- Abstract summary: Agglomerative hierarchical clustering (AHC) is one of the most popular clustering approaches.
Existing AHC methods, which are based on a distance measure, have difficulty identifying adjacent clusters with varied densities.
We show that the use of a data-dependent kernel (instead of a distance or an existing kernel) provides an effective means to address this issue.
- Score: 12.363083467305787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agglomerative hierarchical clustering (AHC) is one of the most
popular clustering approaches. Existing AHC methods, which are based on a
distance measure, have one key issue: they have difficulty identifying adjacent
clusters with varied densities, regardless of the cluster extraction method
applied to the
resultant dendrogram. In this paper, we identify the root cause of this issue
and show that the use of a data-dependent kernel (instead of a distance or an
existing kernel) provides an effective means to address it. We analyse the
condition under which existing AHC methods fail to extract clusters
effectively, and the reason why the data-dependent kernel is an effective
remedy. This leads to a new approach to kernelise existing hierarchical
clustering algorithms, including traditional AHC algorithms, HDBSCAN, GDL,
and PHA. In each of these algorithms, our empirical evaluation shows that a
recently introduced Isolation Kernel produces a higher quality or purer
dendrogram than distance, Gaussian Kernel and adaptive Gaussian Kernel.
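The kernelisation recipe described above can be illustrated with a short sketch: compute an Isolation Kernel similarity matrix, then hand 1 - similarity to an off-the-shelf linkage routine in place of a distance. This is a minimal sketch, not the authors' implementation; the Voronoi-style partitioning, the `psi`/`t` defaults, and the choice of SciPy's average linkage are assumptions made here for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import cdist, squareform

def isolation_kernel_similarity(X, psi=16, t=200, seed=0):
    """Voronoi-style Isolation Kernel similarity (psi and t are illustrative).

    Each of the t rounds draws psi sample points and assigns every point in X
    to its nearest sample (a Voronoi cell); the similarity of two points is
    the fraction of rounds in which they land in the same cell.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    sim = np.zeros((n, n))
    for _ in range(t):
        centers = X[rng.choice(n, size=psi, replace=False)]
        cell = cdist(X, centers).argmin(axis=1)   # Voronoi cell id of each point
        sim += cell[:, None] == cell[None, :]     # same-cell indicator, this round
    return sim / t

# Kernelised AHC: convert similarity to dissimilarity and feed it to a
# standard linkage routine where Euclidean distance would normally go.
X = np.random.default_rng(1).normal(size=(200, 2))
D = 1.0 - isolation_kernel_similarity(X)
np.fill_diagonal(D, 0.0)
Z = linkage(squareform(D, checks=False), method="average")
```

Because the sampled partitions produce larger cells in sparse regions and smaller cells in dense ones, the resulting similarity adapts to local density, which is the property the abstract credits for separating adjacent clusters of varied densities.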
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z)
- Kernel Correlation-Dissimilarity for Multiple Kernel k-Means Clustering [21.685153346752124]
Current methods enhance information diversity and reduce redundancy by exploiting interdependencies among multiple kernels based on correlations or dissimilarities.
We introduce a novel method that systematically integrates both kernel correlation and dissimilarity.
By emphasizing the coherence between kernel correlation and dissimilarity, our method offers a more objective and transparent strategy for extracting non-linear information.
arXiv Detail & Related papers (2024-03-06T04:24:43Z)
- Linear time Evidence Accumulation Clustering with KMeans [0.0]
This work describes a trick which mimics the behavior of average-linkage clustering.
We found a way to efficiently compute the density of a partitioning, reducing the cost from quadratic to linear complexity.
The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low.
arXiv Detail & Related papers (2023-11-15T14:12:59Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Local Sample-weighted Multiple Kernel Clustering with Consensus Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) is committed to achieving optimal information fusion from a set of base kernels.
This paper proposes a novel local sample-weighted multiple kernel clustering model.
Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel or graph-based clustering algorithms.
arXiv Detail & Related papers (2022-07-05T05:00:38Z)
- Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
- Kernel k-Means, By All Means: Algorithms and Strong Consistency [21.013169939337583]
Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linear data.
In this paper, we generalize results leveraging a general family of means to combat sub-optimal local solutions.
Our algorithm makes use of majorization-minimization (MM) to better solve this non-linear separation problem.
arXiv Detail & Related papers (2020-11-12T16:07:18Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Isolation Distributional Kernel: A New Tool for Point & Group Anomaly Detection [76.1522587605852]
Isolation Distributional Kernel (IDK) is a new way to measure the similarity between two distributions.
We demonstrate IDK's efficacy and efficiency as a new tool for kernel-based anomaly detection for both point and group anomalies (see the sketch after this list).
arXiv Detail & Related papers (2020-09-24T12:25:43Z)
- Fast Kernel k-means Clustering Using Incomplete Cholesky Factorization [11.631064399465089]
Kernel-based clustering algorithms can identify and capture the non-linear structure in datasets,
and can therefore achieve better performance than linear clustering.
However, computing and storing the entire kernel matrix requires so much memory that it is difficult for kernel-based clustering to handle large-scale datasets.
arXiv Detail & Related papers (2020-02-07T15:32:14Z)
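As flagged in the Isolation Distributional Kernel entry above, here is a minimal sketch of how such a kernel can compare two distributions. It is an unnormalised rendering under assumptions (Voronoi-style partitioning over the pooled data, illustrative `psi`/`t` values), not the paper's exact estimator: each distribution is summarised by the mean of its points' one-hot cell indicators, and the score is the averaged dot product of the two mean maps.

```python
import numpy as np
from scipy.spatial.distance import cdist

def idk_similarity(P, Q, psi=16, t=200, seed=0):
    """Unnormalised Isolation Distributional Kernel sketch (psi, t illustrative).

    Each round draws psi reference points from the pooled data; a distribution
    is represented by the mean of its points' one-hot Voronoi-cell indicators
    (a kernel mean map), and the score is the dot product of the two maps,
    averaged over t rounds.
    """
    rng = np.random.default_rng(seed)
    pool = np.vstack([P, Q])
    score = 0.0
    for _ in range(t):
        centers = pool[rng.choice(len(pool), size=psi, replace=False)]
        mp = np.bincount(cdist(P, centers).argmin(axis=1), minlength=psi) / len(P)
        mq = np.bincount(cdist(Q, centers).argmin(axis=1), minlength=psi) / len(Q)
        score += mp @ mq                      # similarity of the two mean maps
    return score / t

# Two overlapping Gaussian samples: the score grows as the samples align.
P = np.random.default_rng(2).normal(0.0, 1.0, size=(300, 2))
Q = np.random.default_rng(3).normal(0.5, 1.0, size=(300, 2))
print(idk_similarity(P, Q))
```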