Revisiting Agglomerative Clustering
- URL: http://arxiv.org/abs/2005.07995v2
- Date: Fri, 26 Jun 2020 23:00:41 GMT
- Title: Revisiting Agglomerative Clustering
- Authors: Eric K. Tokuda, Cesar H. Comin and Luciano da F. Costa
- Abstract summary: A model of clusters was also adopted, involving a higher density nucleus surrounded by a transition, followed by outliers.
The obtained results include the verification that many methods detect two clusters in unimodal data.
The single-linkage method was found to be more resilient to false positives.
- Score: 4.291340656866855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An important issue in clustering concerns the avoidance of false positives
while searching for clusters. This work addressed this problem considering
agglomerative methods, namely single, average, median, complete, centroid and
Ward's approaches applied to unimodal and bimodal datasets obeying uniform,
gaussian, exponential and power-law distributions. A model of clusters was also
adopted, involving a higher density nucleus surrounded by a transition,
followed by outliers. This paved the way to defining an objective means for
identifying the clusters from dendrograms. The adopted model also allowed the
relevance of the clusters to be quantified in terms of the height of their
subtrees. The obtained results include the verification that many methods
detect two clusters in unimodal data. The single-linkage method was found to be
more resilient to false positives. Also, several methods detected clusters not
corresponding directly to the nucleus. The possibility of identifying the type
of distribution was also investigated.
Related papers
- Clustering Based on Density Propagation and Subcluster Merging [92.15924057172195]
We propose a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space.
Unlike traditional density-based clustering methods, which necessitate calculating the distance between any two nodes, our proposed technique determines density through a propagation process.
arXiv Detail & Related papers (2024-11-04T04:09:36Z) - A provable initialization and robust clustering method for general mixture models [6.806940901668607]
Clustering is a fundamental tool in statistical machine learning in the presence of heterogeneous data.
Most recent results focus on optimal mislabeling guarantees when data are distributed around centroids with sub-Gaussian errors.
arXiv Detail & Related papers (2024-01-10T22:56:44Z) - Robust and Automatic Data Clustering: Dirichlet Process meets
Median-of-Means [18.3248037914529]
We present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies.
Statistical guarantees on the upper bound of clustering error suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
arXiv Detail & Related papers (2023-11-26T19:01:15Z) - A Computational Theory and Semi-Supervised Algorithm for Clustering [0.0]
A semi-supervised clustering algorithm is presented.
The kernel of the clustering method is Mohammad's anomaly detection algorithm.
Results are presented on synthetic and realworld data sets.
arXiv Detail & Related papers (2023-06-12T09:15:58Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Self-Evolutionary Clustering [1.662966122370634]
Most existing deep clustering methods are based on simple distance comparison and highly dependent on the target distribution generated by a handcrafted nonlinear mapping.
A novel modular Self-Evolutionary Clustering (Self-EvoC) framework is constructed, which boosts the clustering performance by classification in a self-supervised manner.
The framework can efficiently discriminate sample outliers and generate better target distribution with the assistance of self-supervised.
arXiv Detail & Related papers (2022-02-21T19:38:18Z) - Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly
Types [60.45942774425782]
We introduce anomaly clustering, whose goal is to group data into coherent clusters of anomaly types.
This is different from anomaly detection, whose goal is to divide anomalies from normal data.
We present a simple yet effective clustering framework using a patch-based pretrained deep embeddings and off-the-shelf clustering methods.
arXiv Detail & Related papers (2021-12-21T23:11:33Z) - Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of the individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Conjoined Dirichlet Process [63.89763375457853]
We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
arXiv Detail & Related papers (2020-02-08T19:41:23Z) - Toward Generalized Clustering through an One-Dimensional Approach [0.8122270502556374]
An approach for detecting patches of separation between clusters is developed based on an agglomerative clustering, more specifically the single-linkage.
The potential of this method is illustrated with respect to the analyses of clusterless uniform and normal distributions of points, as well as a one-dimensional clustering model characterized by two intervals with high density of points separated by a less dense interstice.
arXiv Detail & Related papers (2020-01-01T16:52:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.