A Computational Theory and Semi-Supervised Algorithm for Clustering
- URL: http://arxiv.org/abs/2306.06974v1
- Date: Mon, 12 Jun 2023 09:15:58 GMT
- Title: A Computational Theory and Semi-Supervised Algorithm for Clustering
- Authors: Nassir Mohammad
- Abstract summary: A semi-supervised clustering algorithm is presented.
The kernel of the clustering method is Mohammad's anomaly detection algorithm.
Results are presented on synthetic and real-world data sets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A computational theory for clustering and a semi-supervised clustering
algorithm are presented. Clustering is defined as obtaining groupings
of data such that each group contains no anomalies with respect to a chosen
grouping principle and measure; all other examples are considered to be fringe
points, isolated anomalies, anomalous clusters or unknown clusters. More
precisely, after appropriate modelling under the assumption of uniform random
distribution, any example whose expectation of occurrence is <1 with respect to
a group is considered an anomaly; otherwise it is assigned a membership of that
group. Thus, clustering is conceived as the dual of anomaly detection. The
representation of data is taken to be the Euclidean distance of a point to a
cluster median. This choice reflects the median's robustness to outliers, its
approximate centrality within a group, and the general-purpose decision
boundaries it induces. The kernel of the clustering method is
Mohammad's anomaly detection algorithm, resulting in a parameter-free, fast,
and efficient clustering algorithm. Acknowledging that clustering is an
interactive and iterative process, the algorithm relies on a small fraction of
known relationships between examples. These relationships serve as seeds to
define the user's objectives and guide the clustering process. The algorithm
then expands the clusters accordingly, leaving the remaining examples for
exploration and subsequent iterations. Results are presented on synthetic and
real-world data sets, demonstrating the advantages over the most widely used
clustering methods.
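The abstract describes the assignment rule in prose only. The sketch below is a minimal illustration of the seeded, median-based expansion it suggests; `expected_count_test` is a simplified uniform-model stand-in, not Mohammad's anomaly detection algorithm, and the function names and the assumption of a few seed points per cluster are illustrative only.

```python
import numpy as np

def expected_count_test(dists, d):
    """Stand-in anomaly test (an assumption, not the paper's exact algorithm).

    Distances to the cluster median are modelled as uniform on [0, hi];
    a candidate distance d is anomalous if the expected number of points
    at or beyond d under that model is < 1.
    """
    n = len(dists)
    hi = max(dists.max(), d)
    if hi == 0:
        return False
    expected_beyond = n * (hi - d) / hi  # E[#points >= d] under the uniform model
    return expected_beyond < 1.0

def seeded_cluster_expansion(X, seed_labels):
    """Expand clusters from seed labels; unassigned points keep label -1.

    X           : (n, p) data matrix
    seed_labels : (n,) integer array, cluster id for seed points, -1 elsewhere
    """
    labels = seed_labels.copy()
    cluster_ids = [c for c in np.unique(seed_labels) if c != -1]
    changed = True
    while changed:
        changed = False
        for c in cluster_ids:
            members = X[labels == c]
            median = np.median(members, axis=0)                 # robust centre
            member_d = np.linalg.norm(members - median, axis=1)
            for i in np.where(labels == -1)[0]:
                d = float(np.linalg.norm(X[i] - median))
                if not expected_count_test(member_d, d):        # not an anomaly -> join
                    labels[i] = c
                    changed = True
    return labels
```

Points that remain labelled -1 correspond to the fringe points, isolated anomalies, or unknown clusters that the abstract leaves for exploration and subsequent iterations.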
Related papers
- UniForCE: The Unimodality Forest Method for Clustering and Estimation of
the Number of Clusters [2.4953699842881605]
We focus on the concept of unimodality and propose a flexible cluster definition called locally unimodal cluster.
A locally unimodal cluster extends for as long as unimodality is locally preserved across pairs of subclusters of the data.
We propose the UniForCE method for locally unimodal clustering.
arXiv Detail & Related papers (2023-12-18T16:19:02Z) - Superclustering by finding statistically significant separable groups of
optimal gaussian clusters [0.0]
The paper presents an algorithm for clustering a dataset by grouping Gaussian clusters that are optimal from the point of view of the BIC criterion.
An essential advantage of the algorithm is its ability to predict the correct supercluster for new data based on an already trained clusterer.
arXiv Detail & Related papers (2023-09-05T23:49:46Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance-based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two-step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
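The summary states the procedure only at a high level; a minimal sketch of such an alternating, gradient-driven loop for the common squared-distance cost might look as follows. The learning rate, iteration count, and averaged gradient step are illustrative assumptions, not details from the paper.

```python
import numpy as np

def gradient_clustering(X, k, lr=0.1, iters=100, seed=0):
    """Alternate nearest-centre assignment with a gradient step on the
    squared-distance cost (a sketch of the general idea only)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    assign = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Step 1: cluster assignment (nearest centre).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Step 2: cluster centre update by a gradient step instead of the
        # closed-form mean.
        for j in range(k):
            members = X[assign == j]
            if len(members) == 0:
                continue
            grad = 2.0 * (centers[j] - members).sum(axis=0)  # d/dc sum ||x - c||^2
            centers[j] -= lr * grad / len(members)
    return assign, centers
```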
arXiv Detail & Related papers (2022-02-01T19:31:15Z) - Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly
Types [60.45942774425782]
We introduce anomaly clustering, whose goal is to group data into coherent clusters of anomaly types.
This is different from anomaly detection, whose goal is to divide anomalies from normal data.
We present a simple yet effective clustering framework using patch-based pretrained deep embeddings and off-the-shelf clustering methods.
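As a rough illustration of the "pretrained patch embeddings plus off-the-shelf clustering" recipe, the sketch below assumes patch embeddings have already been extracted by some pretrained backbone and simply average-pools them before applying scikit-learn's KMeans; the pooling choice and the fixed number of anomaly types are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_anomaly_images(patch_embeddings, n_types):
    """Group images into clusters of anomaly types from patch embeddings.

    patch_embeddings : list of (num_patches_i, dim) arrays, one per image,
                       assumed to come from a pretrained backbone
                       (extraction is not shown here).
    n_types          : assumed number of anomaly types.
    """
    # Pool patch embeddings into one vector per image (simple average pooling;
    # the paper may weight patches differently).
    image_vecs = np.stack([p.mean(axis=0) for p in patch_embeddings])
    # Off-the-shelf clustering on the pooled image representations.
    return KMeans(n_clusters=n_types, n_init=10).fit_predict(image_vecs)
```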
arXiv Detail & Related papers (2021-12-21T23:11:33Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of the individual ones.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Revisiting Agglomerative Clustering [4.291340656866855]
A model of clusters was also adopted, involving a higher-density nucleus surrounded by a transition region, followed by outliers.
The obtained results include the verification that many methods detect two clusters in unimodal data.
The single-linkage method was found to be more resilient to false positives.
arXiv Detail & Related papers (2020-05-16T14:07:25Z) - Point-Set Kernel Clustering [11.093960688450602]
This paper introduces a new similarity measure called point-set kernel which computes the similarity between an object and a set of objects.
We show that the new clustering procedure is both effective and efficient, enabling it to deal with large-scale datasets.
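The point-set kernel itself is not defined in this summary; the sketch below only illustrates the object-to-set similarity interface, using a mean Gaussian kernel as a placeholder that is not the paper's kernel.

```python
import numpy as np

def point_set_similarity(x, S, gamma=1.0):
    """Similarity between a point x and a set S of points.

    Placeholder definition: the mean of a Gaussian kernel between x and
    each member of S; it only mimics the object-to-set interface.
    """
    d2 = np.sum((S - x) ** 2, axis=1)
    return float(np.mean(np.exp(-gamma * d2)))

# Example: which of two candidate points is more similar to the set?
S = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
print(point_set_similarity(np.array([0.05, 0.05]), S),
      point_set_similarity(np.array([2.0, 2.0]), S))
```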
arXiv Detail & Related papers (2020-02-14T00:00:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.