Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling
- URL: http://arxiv.org/abs/2007.11416v1
- Date: Tue, 21 Jul 2020 17:49:03 GMT
- Title: Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling
- Authors: Djallel Bouneffouf
- Abstract summary: This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and a heuristic on when to use it.
The heuristic depends on the eigenspectrum shape of the dataset, and the method yields competitive low-rank approximations on test datasets compared to other state-of-the-art methods.
- Score: 19.675277307158435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spectral clustering has shown a superior performance in analyzing the cluster
structure. However, its computational complexity limits its application in
analyzing large-scale data. To address this problem, many low-rank matrix
approximation algorithms have been proposed, including the Nystrom method, an
approach with proven approximation error bounds. There are several algorithms
that provide recipes to construct Nystrom approximations with variable
accuracies and computing times. This paper proposes a scalable Nystrom-based
clustering algorithm with a new sampling procedure, Centroid Minimum Sum of
Squared Similarities (CMS3), and a heuristic on when to use it. Our heuristic
depends on the eigenspectrum shape of the dataset, and yields competitive
low-rank approximations on test datasets compared to other state-of-the-art
methods.
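The abstract does not give implementation details, but the general Nystrom extension it builds on is standard: sample m landmark points, form the n-by-m affinity block C and the m-by-m landmark block W, and approximate the full affinity matrix as K ≈ C W⁺ Cᵀ. A minimal NumPy sketch follows; it uses uniform landmark sampling as a stand-in for the paper's CMS3 procedure (whose details are not given here), and the RBF kernel width `gamma` and toy data are illustrative assumptions:

```python
import numpy as np

def nystrom_affinity(X, m, gamma=1.0, seed=0):
    """Rank-m Nystrom approximation of the RBF affinity matrix.

    Landmarks are sampled uniformly here; the paper's CMS3 procedure
    would replace this selection step (its details are not in the abstract).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)           # landmark indices
    # Squared distances from every point to each landmark.
    d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1)
    C = np.exp(-gamma * d2)                              # n x m affinities
    W = C[idx]                                           # m x m landmark block
    return C @ np.linalg.pinv(W) @ C.T                   # K ~= C W^+ C^T

# Toy data: two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(5.0, 0.3, (20, 2))])
K = nystrom_affinity(X, m=8)
evals = np.linalg.eigvalsh(K)   # eigenspectrum of the approximation
print(K.shape)
```

The point of the approximation is that only the n-by-m block C and the small m-by-m pseudoinverse are ever formed, so the cost scales with m rather than n; the eigenspectrum of K is what the paper's heuristic inspects when deciding whether this route pays off.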
Related papers
- Linear time Evidence Accumulation Clustering with KMeans [0.0]
This work describes a trick that mimics the behavior of average-linkage clustering.
We found a way to compute the density of a partitioning efficiently, reducing the cost from quadratic to linear complexity.
The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low.
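NMI (normalized mutual information), the comparison metric cited above, can be computed from a contingency table with NumPy alone. This is a generic sketch using arithmetic-mean normalization, not code from any of the listed papers:

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Normalized mutual information between two flat label arrays
    (arithmetic-mean normalization, common in clustering benchmarks)."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    n = a.size
    # Contingency table of co-occurring cluster labels.
    ua, ia = np.unique(a, return_inverse=True)
    ub, ib = np.unique(b, return_inverse=True)
    cont = np.zeros((ua.size, ub.size))
    np.add.at(cont, (ia, ib), 1)
    p = cont / n                              # joint distribution
    pa = p.sum(1, keepdims=True)              # marginal of a
    pb = p.sum(0, keepdims=True)              # marginal of b
    nz = p > 0
    mi = (p[nz] * np.log(p[nz] / (pa @ pb)[nz])).sum()
    ha = -(pa[pa > 0] * np.log(pa[pa > 0])).sum()
    hb = -(pb[pb > 0] * np.log(pb[pb > 0])).sum()
    return mi / ((ha + hb) / 2) if ha + hb > 0 else 1.0

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # identical up to relabeling -> 1.0
```

NMI is invariant to label permutation (identical partitions score 1.0 even under relabeling), which is why it is the usual yardstick for comparing a clustering against a ground-truth partition.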
arXiv Detail & Related papers (2023-11-15T14:12:59Z)
- Heteroskedastic Tensor Clustering [20.979358557219953]
We propose a two-stage method, named $\mathsf{High\text{-}order\ HeteroClustering}$ ($\mathsf{HHC}$).
It starts by performing tensor subspace estimation via a novel spectral algorithm called $\mathsf{Thresholded\ Deflated\text{-}HeteroPCA}$, followed by approximate $k$-means to obtain cluster nodes.
Our algorithm provably achieves exact clustering as long as the SNR exceeds the computational limit.
arXiv Detail & Related papers (2023-11-04T02:50:40Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding explicit computation of the covariance matrices involved.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Fast conformational clustering of extensive molecular dynamics simulation data [19.444636864515726]
We present an unsupervised data processing workflow that is specifically designed to obtain a fast conformational clustering of long trajectories.
We combine two dimensionality reduction algorithms (cc_analysis and encodermap) with a density-based spatial clustering algorithm (HDBSCAN).
With the help of four test systems we illustrate the capability and performance of this clustering workflow.
arXiv Detail & Related papers (2023-01-11T14:36:43Z)
- Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model \textit{does not} exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
- Divide-and-conquer based Large-Scale Spectral Clustering [8.545202841051582]
We propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness.
The proposed method achieves lower computational complexity than most existing large-scale spectral clustering methods.
arXiv Detail & Related papers (2021-04-30T15:09:45Z)
- Scalable Spectral Clustering with Nystrom Approximation: Practical and Theoretical Aspects [1.6752182911522515]
This work presents a principled spectral clustering algorithm that exploits spectral properties of the similarity matrix associated with sampled points to regulate accuracy-efficiency trade-offs.
The overarching goal of this work is to provide an improved baseline for future research directions to accelerate spectral clustering.
arXiv Detail & Related papers (2020-06-25T15:10:56Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
- Local Graph Clustering with Network Lasso [90.66817876491052]
We study the statistical and computational properties of a network Lasso method for local graph clustering.
The clusters delivered by nLasso can be characterized elegantly via network flows between cluster boundary and seed nodes.
arXiv Detail & Related papers (2020-04-25T17:52:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.