Point-Set Kernel Clustering
- URL: http://arxiv.org/abs/2002.05815v2
- Date: Thu, 6 Jan 2022 05:41:08 GMT
- Title: Point-Set Kernel Clustering
- Authors: Kai Ming Ting, Jonathan R. Wells and Ye Zhu
- Abstract summary: This paper introduces a new similarity measure called point-set kernel which computes the similarity between an object and a set of objects.
We show that the new clustering procedure is both effective and efficient that enables it to deal with large scale datasets.
- Score: 11.093960688450602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Measuring similarity between two objects is the core operation in existing
clustering algorithms in grouping similar objects into clusters. This paper
introduces a new similarity measure called point-set kernel which computes the
similarity between an object and a set of objects. The proposed clustering
procedure utilizes this new measure to characterize every cluster grown from a
seed object. We show that the new clustering procedure is both effective and
efficient that enables it to deal with large scale datasets. In contrast,
existing clustering algorithms are either efficient or effective. In comparison
with the state-of-the-art density-peak clustering and scalable kernel k-means
clustering, we show that the proposed algorithm is more effective and runs
orders of magnitude faster when applying to datasets of millions of data
points, on a commonly used computing machine.
Related papers
- GBCT: An Efficient and Adaptive Granular-Ball Clustering Algorithm for Complex Data [49.56145012222276]
We propose a new clustering algorithm called granular-ball clustering (GBCT) via granular-ball computing.
GBCT forms clusters according to the relationship between granular-balls, instead of the traditional point relationship.
As granular-balls can fit various complex data, GBCT performs much better in non-spherical data sets than other traditional clustering methods.
arXiv Detail & Related papers (2024-10-17T07:32:05Z) - A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Convex Clustering through MM: An Efficient Algorithm to Perform
Hierarchical Clustering [1.0589208420411012]
We propose convex clustering through majorization-minimization ( CCMM) -- an iterative algorithm that uses cluster fusions and a highly efficient updating scheme.
With a current desktop computer, CCMM efficiently solves convex clustering problems featuring over one million objects in seven-dimensional space.
arXiv Detail & Related papers (2022-11-03T15:07:51Z) - A sampling-based approach for efficient clustering in large datasets [0.8952229340927184]
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters.
Our contribution is substantially more efficient than k-means as it does not require an all to all comparison of data points and clusters.
arXiv Detail & Related papers (2021-12-29T19:15:20Z) - Clustering Plotted Data by Image Segmentation [12.443102864446223]
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data.
In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data.
Our approach, Visual Clustering, has several advantages over traditional clustering algorithms.
arXiv Detail & Related papers (2021-10-06T06:19:30Z) - Very Compact Clusters with Structural Regularization via Similarity and
Connectivity [3.779514860341336]
We propose an end-to-end deep clustering algorithm, i.e., Very Compact Clusters (VCC) for the general datasets.
Our proposed approach achieves better clustering performance over most of the state-of-the-art clustering methods.
arXiv Detail & Related papers (2021-06-09T23:22:03Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Probabilistic Partitive Partitioning (PPP) [0.0]
Clustering algorithms, in general, face two common problems.
They converge to different settings with different initial conditions.
The number of clusters has to be arbitrarily decided beforehand.
arXiv Detail & Related papers (2020-03-09T19:18:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.