Scalable Spectral Clustering with Nystrom Approximation: Practical and
Theoretical Aspects
- URL: http://arxiv.org/abs/2006.14470v2
- Date: Sun, 31 Jan 2021 02:10:30 GMT
- Authors: Farhad Pourkamali-Anaraki
- Abstract summary: This work presents a principled spectral clustering algorithm that exploits spectral properties of the similarity matrix associated with sampled points to regulate accuracy-efficiency trade-offs.
The overarching goal of this work is to provide an improved baseline for future research directions to accelerate spectral clustering.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spectral clustering techniques are valuable tools in signal processing and
machine learning for partitioning complex data sets. The effectiveness of
spectral clustering stems from constructing a non-linear embedding based on
creating a similarity graph and computing the spectral decomposition of the
Laplacian matrix. However, spectral clustering methods fail to scale to large
data sets because of high computational cost and memory usage. A popular
approach for addressing these problems utilizes the Nystrom method, an
efficient sampling-based algorithm for computing low-rank approximations to
large positive semi-definite matrices. This paper demonstrates how the
previously popular approach of Nystrom-based spectral clustering has severe
limitations. Existing time-efficient methods ignore critical information by
prematurely reducing the rank of the similarity matrix associated with sampled
points. Also, current understanding is limited regarding how utilizing the
Nystrom approximation will affect the quality of spectral embedding
approximations. To address the limitations, this work presents a principled
spectral clustering algorithm that exploits spectral properties of the
similarity matrix associated with sampled points to regulate
accuracy-efficiency trade-offs. We provide theoretical results to reduce the
current gap and present numerical experiments with real and synthetic data.
Empirical results demonstrate the efficacy and efficiency of the proposed
method compared to existing spectral clustering techniques based on the Nystrom
method and other efficient methods. The overarching goal of this work is to
provide an improved baseline for future research directions to accelerate
spectral clustering.
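To make the pipeline in the abstract concrete, the following is a minimal sketch of Nyström-based spectral clustering: sample landmark points, form the similarity blocks C and W, approximate the top eigenvectors of the full kernel via K ≈ C W⁺ Cᵀ, and run k-means on the row-normalized embedding. This is an illustrative baseline, not the paper's rank-regulated algorithm; the RBF kernel, the function name, and all parameters are assumptions for the sketch.

```python
import numpy as np

def nystrom_spectral_clustering(X, n_clusters, n_samples, gamma=1.0, seed=0):
    """Illustrative Nystrom-based spectral clustering sketch.

    Note: a plain Nystrom baseline, NOT the paper's algorithm (which
    regulates the rank using the spectrum of the landmark block W).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=n_samples, replace=False)  # landmark sample

    # C: RBF similarities between all points and the sampled landmarks.
    sq_dists = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1)
    C = np.exp(-gamma * sq_dists)
    W = C[idx, :]  # landmark-landmark similarity block

    # Nystrom approximation K ~ C W^+ C^T: approximate the top eigenvectors
    # of K as U ~ C V Lambda^{-1}, using the eigendecomposition W = V L V^T.
    vals, vecs = np.linalg.eigh(W)               # ascending eigenvalues
    vals = np.clip(vals, 1e-12, None)            # guard tiny/negative values
    U = C @ vecs[:, -n_clusters:] / vals[-n_clusters:]
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # row-normalized embedding

    # Tiny Lloyd k-means on the embedding, with farthest-point init so the
    # sketch stays self-contained and deterministic.
    centers = [U[0]]
    for _ in range(n_clusters - 1):
        d2 = ((U[:, None, :] - np.asarray(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(U[np.argmax(d2)])
    centers = np.asarray(centers)
    for _ in range(50):
        labels = ((U[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = U[labels == k].mean(0)
    return labels
```

The "premature rank reduction" the abstract criticizes corresponds to truncating the eigendecomposition of W more aggressively than `n_clusters` components would warrant; the paper's contribution is choosing that rank from the spectrum of W rather than fixing it a priori.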
Related papers
- Toward Efficient and Incremental Spectral Clustering via Parametric
Spectral Clustering [2.44755919161855]
Spectral clustering is a popular method for effectively clustering nonlinearly separable data.
This paper introduces a novel approach called parametric spectral clustering (PSC).
PSC addresses the challenges associated with big data and real-time scenarios.
arXiv Detail & Related papers (2023-11-14T01:26:20Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Optimal Algorithms for the Inhomogeneous Spiked Wigner Model [89.1371983413931]
We derive an approximate message-passing algorithm (AMP) for the inhomogeneous problem.
We identify in particular the existence of a statistical-to-computational gap where known algorithms require a signal-to-noise ratio bigger than the information-theoretic threshold to perform better than random.
arXiv Detail & Related papers (2023-02-13T19:57:17Z) - Spectral Decomposition Representation for Reinforcement Learning [100.0424588013549]
We propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy.
A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings.
An experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
arXiv Detail & Related papers (2022-08-19T19:01:30Z) - Divide-and-conquer based Large-Scale Spectral Clustering [8.545202841051582]
We propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness.
The proposed method achieves lower computational complexity than most existing large-scale spectral clustering.
arXiv Detail & Related papers (2021-04-30T15:09:45Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
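For readers unfamiliar with the regularizer in this entry, the $l_{2,p}$ norm of a matrix is the $p$-norm of the vector of row-wise $l_2$ norms; for $p \le 1$ it promotes row sparsity, which is what makes it useful for feature selection. A minimal sketch of the definition (the function name is illustrative, not from the paper):

```python
import numpy as np

def l2p_norm(W, p):
    """l_{2,p} norm: (sum_i ||w_i||_2^p)^(1/p), where w_i are the rows of W.

    For p = 2 this is the Frobenius norm; for p <= 1 it is a (quasi-)norm
    that drives entire rows of W to zero, i.e. discards whole features.
    """
    row_norms = np.linalg.norm(W, axis=1)  # l2 norm of each row
    return (row_norms ** p).sum() ** (1.0 / p)
```

For example, `l2p_norm(np.array([[3., 4.], [0., 0.]]), p=1)` is 5.0, since the row norms are 5 and 0.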
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Spectral Methods for Data Science: A Statistical Perspective [37.2486912080998]
Spectral methods have emerged as a simple yet surprisingly effective approach for extracting information from massive, noisy and incomplete data.
This book aims to present a systematic, comprehensive, yet accessible introduction to spectral methods from a modern statistical perspective.
arXiv Detail & Related papers (2020-12-15T18:40:56Z) - Multi-View Spectral Clustering with High-Order Optimal Neighborhood
Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and high-order base.
arXiv Detail & Related papers (2020-08-31T12:28:40Z) - A boosted outlier detection method based on the spectrum of the
Laplacian matrix of a graph [0.0]
This paper explores a new outlier detection algorithm based on the spectrum of the Laplacian matrix of a graph.
The sparsity of the Laplacian matrix significantly decreases the computational burden.
arXiv Detail & Related papers (2020-08-07T08:37:56Z) - Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling [19.675277307158435]
This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and guidance on when to use it.
Our sampling procedure depends on the eigenspectrum shape of the dataset and yields competitive low-rank approximations compared to other state-of-the-art methods.
arXiv Detail & Related papers (2020-07-21T17:49:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.