Rethinking k-means from manifold learning perspective
- URL: http://arxiv.org/abs/2305.07213v1
- Date: Fri, 12 May 2023 03:01:41 GMT
- Title: Rethinking k-means from manifold learning perspective
- Authors: Quanxue Gao, Qianqian Wang, Han Lu, Wei Xia, Xinbo Gao
- Abstract summary: We present a new clustering algorithm that directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
- Score: 122.38667613245151
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although numerous clustering algorithms have been developed, many
existing methods still rely on the k-means technique to detect clusters of data
points. However, the performance of k-means depends heavily on the estimation
of cluster centers, for which an optimal solution is very difficult to achieve.
Another major drawback is its sensitivity to noise and outliers. In this paper,
we rethink k-means from a manifold learning perspective and present a new
clustering algorithm that directly detects clusters of data without mean
estimation. Specifically, we construct a distance matrix between data points
using a Butterworth filter, so that the distance between any two data points in
the same cluster equals a small constant, while the distance between pairs from
different clusters is increased. To fully exploit the complementary information
embedded in different views, we impose tensor Schatten p-norm regularization on
the 3rd-order tensor formed by stacking the indicator matrices of the different
views. Finally, an efficient alternating algorithm is derived to optimize our
model, and the constructed sequence is proved to converge to a stationary KKT
point. Extensive experimental results demonstrate the superiority of the
proposed method.
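To make the abstract's two ingredients concrete, here is a minimal numpy sketch. The Butterworth transfer function with cutoff `d0` and order `n`, the final `1 - response` distance form, and the unscaled t-SVD definition of the tensor Schatten p-norm are assumptions on our part; the paper's exact parameterization may differ.

```python
import numpy as np

def butterworth_distance(X, d0=1.0, n=2):
    """Pairwise distances passed through a Butterworth-style filter.

    Raw distances below the cutoff d0 are flattened to a small,
    near-constant value; distances above it are pushed toward 1,
    widening the gap between within- and between-cluster pairs.
    """
    sq = np.sum(X ** 2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    # Butterworth magnitude response: ~1 for d << d0, ~0 for d >> d0.
    response = 1.0 / np.sqrt(1.0 + (D / d0) ** (2 * n))
    return 1.0 - response  # small constant within clusters, larger across

def tensor_schatten_p_norm(T, p=0.5):
    """Tensor Schatten p-norm of a 3rd-order tensor T (n1 x n2 x n3).

    Computed via the t-SVD construction: FFT along the third mode, then
    sum the p-th powers of the singular values of each frontal slice.
    (Some papers add a 1/n3 scaling; conventions vary.)
    """
    T_hat = np.fft.fft(T, axis=2)
    total = sum(np.sum(np.linalg.svd(T_hat[:, :, k], compute_uv=False) ** p)
                for k in range(T.shape[2]))
    return total ** (1.0 / p)
```

In the multi-view setting, `T` would stack the cluster-indicator matrices of the different views as frontal slices, so the regularizer pushes the views toward a shared low-rank cluster structure.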
Related papers
- K-Means Clustering With Incomplete Data with the Use of Mahalanobis Distances [0.0]
We develop a unified K-means algorithm that incorporates Mahalanobis distances instead of the traditional Euclidean distance.
We demonstrate that our algorithm consistently outperforms standalone imputation followed by K-means.
These results hold across both the IRIS dataset and randomly generated data with elliptical clusters.
arXiv Detail & Related papers (2024-10-31T00:05:09Z)
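As a concrete illustration of the Mahalanobis assignment step described above, here is a minimal sketch; the single shared covariance `cov` and the complete-data setting are simplifying assumptions (the paper's unified algorithm also handles incomplete entries).

```python
import numpy as np

def mahalanobis_assign(X, centers, cov):
    """Assign each point to the nearest center under the Mahalanobis
    distance d(x, c)^2 = (x - c)^T cov^{-1} (x - c)."""
    cov_inv = np.linalg.inv(cov)
    diffs = X[:, None, :] - centers[None, :, :]              # (n, k, d)
    d2 = np.einsum('nkd,de,nke->nk', diffs, cov_inv, diffs)  # squared distances
    return np.argmin(d2, axis=1)
```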
- Inference with K-means [0.0]
k-means is an iterative clustering algorithm that randomly assigns k centroids, then assigns data points to the nearest centroid and updates the centroids as the means of the assigned points.
This research investigates predicting the last component of data points drawn from a clustered-data distribution using the online balanced k-means approach.
arXiv Detail & Related papers (2024-10-04T06:51:58Z)
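The blurb above describes the classic Lloyd iteration; a minimal sketch of that baseline loop follows (the paper's online balanced variant adds constraints not shown here).

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's k-means: random initial centroids, then alternate
    nearest-centroid assignment and mean updates until convergence."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```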
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Linear time Evidence Accumulation Clustering with KMeans [0.0]
This work describes a trick that mimics the behavior of average-linkage clustering.
We found a way to compute the density of a partitioning efficiently, reducing the cost from quadratic to linear complexity.
The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low.
arXiv Detail & Related papers (2023-11-15T14:12:59Z)
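One plausible reading of that linear-time trick, assuming "density" means the fraction of point pairs co-clustered by the partition (the paper's exact definition may differ):

```python
import numpy as np

def partition_density(labels):
    """Fraction of point pairs that fall in the same cluster.

    The naive route builds an n x n co-association matrix (quadratic);
    counting cluster sizes yields the same value in linear time.
    """
    n = len(labels)
    _, sizes = np.unique(labels, return_counts=True)
    same_pairs = np.sum(sizes * (sizes - 1)) // 2
    return same_pairs / (n * (n - 1) / 2)
```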
- An enhanced method of initial cluster center selection for K-means algorithm [0.0]
We propose a novel approach to improve initial cluster selection for the K-means algorithm.
The Convex Hull algorithm facilitates computing the first two centroids, and the remaining ones are selected according to their distance from the previously selected centers.
We obtained only 7.33%, 7.90%, and 0% clustering error on the Iris, Letter, and Ruspini data, respectively.
arXiv Detail & Related papers (2022-10-18T00:58:50Z)
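A minimal sketch of the initialization scheme just described; exactly how the hull yields the first two centroids is our assumption (here: the farthest-apart pair of hull vertices), and the paper's selection rule may differ.

```python
import numpy as np
from scipy.spatial import ConvexHull

def init_centers(X, k):
    """Take the two convex-hull vertices farthest apart as the first two
    centers, then greedily add the point farthest from all chosen centers."""
    hull = X[ConvexHull(X).vertices]
    pairwise = np.linalg.norm(hull[:, None] - hull[None, :], axis=-1)
    i, j = np.unravel_index(np.argmax(pairwise), pairwise.shape)
    centers = [hull[i], hull[j]]
    while len(centers) < k:
        d = np.linalg.norm(X[:, None] - np.asarray(centers)[None], axis=-1)
        centers.append(X[d.min(axis=1).argmax()])  # farthest remaining point
    return np.asarray(centers)
```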
- K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters [0.12313056815753944]
This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters.
Accuracy and speed are two main advantages of the proposed method.
arXiv Detail & Related papers (2021-10-09T23:02:57Z)
- Kernel distance measures for time series, random fields and other structured data [71.61147615789537]
kdiff is a novel kernel-based measure for estimating distances between instances of structured data.
It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution.
Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
arXiv Detail & Related papers (2021-09-29T22:54:17Z)
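A sketch of the quantile idea only: a lower quantile of the cross-distance distribution discounted by the self-distance quantiles. This particular combination is our illustration; the actual kdiff definition (and its kernel form) differs in detail.

```python
import numpy as np

def kdiff(A, B, q=0.1):
    """Quantile-based dissimilarity between two instances A and B
    (rows are samples of each structured object)."""
    def qdist(U, V):
        # q-th quantile of the pairwise distance distribution.
        d = np.linalg.norm(U[:, None] - V[None, :], axis=-1)
        return np.quantile(d, q)
    # Cross-distance quantile, offset by the two self-distance quantiles.
    return qdist(A, B) - 0.5 * (qdist(A, A) + qdist(B, B))
```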
- Robust Trimmed k-means [70.88503833248159]
We propose Robust Trimmed k-means (RTKM), which simultaneously identifies outliers and clusters data points.
We show that RTKM performs competitively with other methods on single-membership data with outliers and multi-membership data without outliers.
arXiv Detail & Related papers (2021-08-16T15:49:40Z)
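A minimal sketch of the classical trimming idea behind such methods; RTKM itself handles multi-membership data and identifies outliers jointly, which this sketch does not.

```python
import numpy as np

def trimmed_kmeans(X, k, alpha=0.1, iters=50, seed=0):
    """k-means that, at each update, ignores the alpha fraction of points
    farthest from their nearest center, so outliers cannot drag centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels, nearest = d.argmin(axis=1), d.min(axis=1)
        keep = nearest <= np.quantile(nearest, 1.0 - alpha)  # trim outliers
        centers = np.array([X[keep & (labels == j)].mean(axis=0)
                            if np.any(keep & (labels == j)) else centers[j]
                            for j in range(k)])
    return labels, ~keep  # cluster labels and outlier flags
```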
- Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes (DPP) for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, uniform random selection of initial centers fails both to ensure diversity and to obtain a good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z)
- Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.