A Computational Theory and Semi-Supervised Algorithm for Clustering
- URL: http://arxiv.org/abs/2306.06974v1
- Date: Mon, 12 Jun 2023 09:15:58 GMT
- Title: A Computational Theory and Semi-Supervised Algorithm for Clustering
- Authors: Nassir Mohammad
- Abstract summary: A semi-supervised clustering algorithm is presented.
The kernel of the clustering method is Mohammad's anomaly detection algorithm.
Results are presented on synthetic and real-world data sets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A computational theory for clustering and a semi-supervised clustering
algorithm are presented. Clustering is defined as obtaining groupings
of data such that each group contains no anomalies with respect to a chosen
grouping principle and measure; all other examples are considered to be fringe
points, isolated anomalies, anomalous clusters or unknown clusters. More
precisely, after appropriate modelling under the assumption of uniform random
distribution, any example whose expectation of occurrence is <1 with respect to
a group is considered an anomaly; otherwise it is assigned a membership of that
group. Thus, clustering is conceived as the dual of anomaly detection. The
representation of data is taken to be the Euclidean distance of a point to a
cluster median. This choice reflects the median's robustness to outliers, its
approximate centrality within a group, and the general-purpose decision
boundaries it induces. The kernel of the clustering method is
Mohammad's anomaly detection algorithm, resulting in a parameter-free, fast,
and efficient clustering algorithm. Acknowledging that clustering is an
interactive and iterative process, the algorithm relies on a small fraction of
known relationships between examples. These relationships serve as seeds to
define the user's objectives and guide the clustering process. The algorithm
then expands the clusters accordingly, leaving the remaining examples for
exploration and subsequent iterations. Results are presented on synthetic and
real-world data sets, demonstrating the advantages over the most widely used
clustering methods.
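The abstract describes the assignment rule in prose only. The sketch below is a minimal illustration of the seeded, median-based expansion it suggests; `expected_count_test` is a simplified uniform-model stand-in, not Mohammad's anomaly detection algorithm, and the function names and the assumption of a few seed points per cluster are illustrative only.

```python
import numpy as np

def expected_count_test(dists, d):
    """Stand-in anomaly test (an assumption, not the paper's exact algorithm).

    Distances to the cluster median are modelled as uniform on [0, hi];
    a candidate distance d is anomalous if the expected number of points
    at or beyond d under that model is < 1.
    """
    n = len(dists)
    hi = max(dists.max(), d)
    if hi == 0:
        return False
    expected_beyond = n * (hi - d) / hi  # E[#points >= d] under the uniform model
    return expected_beyond < 1.0

def seeded_cluster_expansion(X, seed_labels):
    """Expand clusters from seed labels; unassigned points keep label -1.

    X           : (n, p) data matrix
    seed_labels : (n,) integer array, cluster id for seed points, -1 elsewhere
    """
    labels = seed_labels.copy()
    cluster_ids = [c for c in np.unique(seed_labels) if c != -1]
    changed = True
    while changed:
        changed = False
        for c in cluster_ids:
            members = X[labels == c]
            median = np.median(members, axis=0)                 # robust centre
            member_d = np.linalg.norm(members - median, axis=1)
            for i in np.where(labels == -1)[0]:
                d = float(np.linalg.norm(X[i] - median))
                if not expected_count_test(member_d, d):        # not an anomaly -> join
                    labels[i] = c
                    changed = True
    return labels
```

Points that remain labelled -1 correspond to the fringe points, isolated anomalies, or unknown clusters that the abstract leaves for exploration and subsequent iterations.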
Related papers
- UniForCE: The Unimodality Forest Method for Clustering and Estimation of
the Number of Clusters [2.4953699842881605]
We focus on the concept of unimodality and propose a flexible cluster definition called locally unimodal cluster.
A locally unimodal cluster extends for as long as unimodality is locally preserved across pairs of subclusters of the data.
We propose the UniForCE method for locally unimodal clustering.
arXiv Detail & Related papers (2023-12-18T16:19:02Z) - Superclustering by finding statistically significant separable groups of
optimal gaussian clusters [0.0]
The paper presents an algorithm for clustering a dataset by grouping Gaussian clusters that are optimal from the point of view of the BIC criterion.
An essential advantage of the algorithm is its ability to predict the correct supercluster for new data based on an already trained clusterer.
arXiv Detail & Related papers (2023-09-05T23:49:46Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance-based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two-step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
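The summary states the procedure only at a high level; a minimal sketch of such an alternating, gradient-driven loop for the common squared-distance cost might look as follows. The learning rate, iteration count, and averaged gradient step are illustrative assumptions, not details from the paper.

```python
import numpy as np

def gradient_clustering(X, k, lr=0.1, iters=100, seed=0):
    """Alternate nearest-centre assignment with a gradient step on the
    squared-distance cost (a sketch of the general idea only)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    assign = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Step 1: cluster assignment (nearest centre).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Step 2: cluster centre update by a gradient step instead of the
        # closed-form mean.
        for j in range(k):
            members = X[assign == j]
            if len(members) == 0:
                continue
            grad = 2.0 * (centers[j] - members).sum(axis=0)  # d/dc sum ||x - c||^2
            centers[j] -= lr * grad / len(members)
    return assign, centers
```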
arXiv Detail & Related papers (2022-02-01T19:31:15Z) - Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly
Types [60.45942774425782]
We introduce anomaly clustering, whose goal is to group data into coherent clusters of anomaly types.
This is different from anomaly detection, whose goal is to divide anomalies from normal data.
We present a simple yet effective clustering framework using patch-based pretrained deep embeddings and off-the-shelf clustering methods.
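As a rough illustration of the "pretrained patch embeddings plus off-the-shelf clustering" recipe, the sketch below assumes patch embeddings have already been extracted by some pretrained backbone and simply average-pools them before applying scikit-learn's KMeans; the pooling choice and the fixed number of anomaly types are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_anomaly_images(patch_embeddings, n_types):
    """Group images into clusters of anomaly types from patch embeddings.

    patch_embeddings : list of (num_patches_i, dim) arrays, one per image,
                       assumed to come from a pretrained backbone
                       (extraction is not shown here).
    n_types          : assumed number of anomaly types.
    """
    # Pool patch embeddings into one vector per image (simple average pooling;
    # the paper may weight patches differently).
    image_vecs = np.stack([p.mean(axis=0) for p in patch_embeddings])
    # Off-the-shelf clustering on the pooled image representations.
    return KMeans(n_clusters=n_types, n_init=10).fit_predict(image_vecs)
```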
arXiv Detail & Related papers (2021-12-21T23:11:33Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of the individual ones.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Revisiting Agglomerative Clustering [4.291340656866855]
A model of clusters was also adopted, involving a higher-density nucleus surrounded by a transition region, followed by outliers.
The obtained results include the verification that many methods detect two clusters in unimodal data.
The single-linkage method was found to be more resilient to false positives.
arXiv Detail & Related papers (2020-05-16T14:07:25Z) - Point-Set Kernel Clustering [11.093960688450602]
This paper introduces a new similarity measure called point-set kernel which computes the similarity between an object and a set of objects.
We show that the new clustering procedure is both effective and efficient, enabling it to deal with large-scale datasets.
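The point-set kernel itself is not defined in this summary; the sketch below only illustrates the object-to-set similarity interface, using a mean Gaussian kernel as a placeholder that is not the paper's kernel.

```python
import numpy as np

def point_set_similarity(x, S, gamma=1.0):
    """Similarity between a point x and a set S of points.

    Placeholder definition: the mean of a Gaussian kernel between x and
    each member of S; it only mimics the object-to-set interface.
    """
    d2 = np.sum((S - x) ** 2, axis=1)
    return float(np.mean(np.exp(-gamma * d2)))

# Example: which of two candidate points is more similar to the set?
S = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
print(point_set_similarity(np.array([0.05, 0.05]), S),
      point_set_similarity(np.array([2.0, 2.0]), S))
```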
arXiv Detail & Related papers (2020-02-14T00:00:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.