Approximate spectral clustering with eigenvector selection and
self-tuned $k$
- URL: http://arxiv.org/abs/2302.11297v1
- Date: Wed, 22 Feb 2023 11:32:24 GMT
- Title: Approximate spectral clustering with eigenvector selection and
self-tuned $k$
- Authors: Mashaan Alshammari, Masahiro Takatsuka
- Abstract summary: The recently emerged spectral clustering surpasses conventional clustering methods by detecting clusters of any shape without the convexity assumption.
In practice, manually setting $k$ can be subjective or time-consuming.
The proposed algorithm introduces two relevance metrics for estimating $k$ in two vital steps of ASC.
The experiments demonstrate the efficiency of the proposed algorithm and its ability to compete with similar methods in which $k$ was set manually.
- Score: 1.827510863075184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently emerged spectral clustering surpasses conventional
clustering methods by detecting clusters of any shape without the convexity
assumption. Unfortunately, with a computational complexity of $O(n^3)$, it is
infeasible for many real applications, where $n$ could be large. This has
stimulated researchers to propose approximate spectral clustering (ASC).
However, most ASC methods assume that the number of clusters $k$ is known. In
practice, manually setting $k$ can be subjective or time-consuming. The
proposed algorithm introduces two relevance metrics for estimating $k$ in two
vital steps of ASC: one for selecting the eigenvectors spanning the embedding
space, and the other for discovering the number of clusters in that space. The
algorithm uses a growing neural gas (GNG) approximation; GNG is superior in
preserving the topology of the input data. The experiments demonstrate the
efficiency of the proposed algorithm and its ability to compete with similar
methods in which $k$ was set manually.
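The paper's two relevance metrics are specific to its ASC/GNG pipeline, but the general idea of reading $k$ off the spectrum of the graph Laplacian can be sketched in a few lines. Below is a minimal eigengap heuristic in NumPy; the RBF affinity, the function names, and the `k_max` cutoff are illustrative assumptions, not the authors' method.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    # Pairwise squared distances -> Gaussian (RBF) affinity matrix.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A

def estimate_k_eigengap(A, k_max=10):
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    d = A.sum(1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals = np.linalg.eigvalsh(L)[:k_max]   # smallest k_max eigenvalues
    gaps = np.diff(vals)                   # gaps between consecutive eigenvalues
    return int(np.argmax(gaps)) + 1       # largest gap after the k-th eigenvalue

# Two well-separated Gaussian blobs: the eigengap should suggest k = 2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
k = estimate_k_eigengap(rbf_affinity(X, sigma=1.0))
```

The heuristic exploits the fact that a graph with $k$ well-separated clusters has $k$ near-zero Laplacian eigenvalues followed by a jump; the paper's relevance metrics refine this idea for the quantized (GNG) graph.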
Related papers
- Are Easy Data Easy (for K-Means) [0.0]
This paper investigates the ability of various variants of the $k$-means algorithm to correctly recover well-separated clusters.
A new algorithm is proposed: a variation of $k$-means++ that repeatedly subsamples the data when choosing a seed.
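As a reference point for this entry, standard $k$-means++ seeding (which the proposed variant modifies via repeated subsampling) can be sketched as follows; this is a generic NumPy illustration, not the paper's algorithm.

```python
import numpy as np

def kmeans_pp_seeds(X, k, rng):
    # k-means++: the first seed is uniform at random; each subsequent seed
    # is drawn with probability proportional to its squared distance from
    # the nearest seed chosen so far.
    idx = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1).min(1)
        idx.append(int(rng.choice(len(X), p=d2 / d2.sum())))
    return np.array(idx)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.2, (20, 2)) for c in (0.0, 4.0, 8.0)])
seeds = kmeans_pp_seeds(X, 3, rng)
```

Because already-chosen points have zero sampling weight, the seeds are distinct (for distinct inputs) and tend to spread across well-separated clusters.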
arXiv Detail & Related papers (2023-08-02T09:40:19Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Refining a $k$-nearest neighbor graph for a computationally efficient
spectral clustering [1.5469452301122175]
Approximate spectral clustering (ASC) uses sampling or quantization to select data representatives.
We propose a refined version of the $k$-nearest neighbor graph, in which we keep all data points and aggressively reduce the number of edges.
Compared to ASC methods, the proposed method delivered consistent performance despite the significant reduction in edges.
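A common baseline with the same goal (keep every data point, prune edges) is the mutual $k$-NN rule sketched below; it is an illustration of the general idea, not the paper's refinement.

```python
import numpy as np

def mutual_knn_graph(X, k=5):
    # Directed k-NN: each point selects its k nearest neighbors.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)          # exclude self-edges
    nn = np.argsort(d, axis=1)[:, :k]
    A = np.zeros(d.shape, dtype=bool)
    A[np.repeat(np.arange(len(X)), k), nn.ravel()] = True
    # Mutual rule: keep edge (i, j) only if i and j selected each other,
    # which retains all points while aggressively pruning edges.
    return A & A.T

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
G = mutual_knn_graph(X, k=5)
```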
arXiv Detail & Related papers (2023-02-22T11:31:32Z) - Differentially-Private Hierarchical Clustering with Provable
Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $\Omega(|V|^2/\epsilon)$-additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z) - Scalable Differentially Private Clustering via Hierarchically Separated
Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state of the art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making a few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - Clustering with Penalty for Joint Occurrence of Objects: Computational
Aspects [0.0]
The method of Holý, Sokol and Černý clusters objects based on their incidence in a large number of given sets.
The idea is to minimize the occurrence of multiple objects from the same cluster in the same set.
In the current paper, we study computational aspects of the method.
arXiv Detail & Related papers (2021-02-02T10:39:27Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\|_\infty \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163]
We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner.
We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost.
We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves.
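For intuition, in the special case $k = 2$ the explainable clustering reduces to a single axis-aligned threshold cut chosen to agree with a reference $k$-means partition. The exhaustive scan below is a toy sketch under that two-cluster assumption, not the paper's general $k$-leaf tree construction.

```python
import numpy as np

def two_means(X, iters=50):
    # Lloyd's algorithm with two centers, farthest-point initialization.
    C = np.stack([X[0], X[((X - X[0]) ** 2).sum(1).argmax()]])
    for _ in range(iters):
        lab = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        C = np.stack([X[lab == 0].mean(0), X[lab == 1].mean(0)])
    return lab

def best_axis_cut(X, lab):
    # Scan every (feature, threshold) pair; keep the cut whose two sides
    # agree with the k-means labels on the most points.
    best = (0, 0.0, -1)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            side = X[:, f] <= t
            agree = max((side == (lab == 0)).sum(), (side == (lab == 1)).sum())
            if agree > best[2]:
                best = (f, t, int(agree))
    return best  # (feature index, threshold, points explained)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
f, t, agree = best_axis_cut(X, two_means(X))
```

On well-separated data a single threshold explains the whole partition; the paper's contribution is bounding how much cost such tree-induced clusterings lose for general $k$.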
arXiv Detail & Related papers (2020-02-28T04:21:53Z) - Efficient Algorithms for Generating Provably Near-Optimal Cluster
Descriptors for Explainability [36.11663695534294]
We study the problem of making clusters more interpretable by extending a recent approach of constructing succinct representations for clusters.
We develop approximation algorithms with provable performance guarantees for the problem.
We also show applications to explain clusters from datasets, including clusters of genomic sequences that represent different threat levels.
arXiv Detail & Related papers (2020-02-06T19:49:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.