Scalable Spectral Clustering with Nystrom Approximation: Practical and
Theoretical Aspects
- URL: http://arxiv.org/abs/2006.14470v2
- Date: Sun, 31 Jan 2021 02:10:30 GMT
- Authors: Farhad Pourkamali-Anaraki
- Abstract summary: This work presents a principled spectral clustering algorithm that exploits spectral properties of the similarity matrix associated with sampled points to regulate accuracy-efficiency trade-offs.
The overarching goal of this work is to provide an improved baseline for future research directions to accelerate spectral clustering.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spectral clustering techniques are valuable tools in signal processing and
machine learning for partitioning complex data sets. The effectiveness of
spectral clustering stems from constructing a non-linear embedding based on
creating a similarity graph and computing the spectral decomposition of the
Laplacian matrix. However, spectral clustering methods fail to scale to large
data sets because of high computational cost and memory usage. A popular
approach for addressing these problems utilizes the Nystrom method, an
efficient sampling-based algorithm for computing low-rank approximations to
large positive semi-definite matrices. This paper demonstrates how the
previously popular approach of Nystrom-based spectral clustering has severe
limitations. Existing time-efficient methods ignore critical information by
prematurely reducing the rank of the similarity matrix associated with sampled
points. Also, current understanding is limited regarding how utilizing the
Nystrom approximation will affect the quality of spectral embedding
approximations. To address the limitations, this work presents a principled
spectral clustering algorithm that exploits spectral properties of the
similarity matrix associated with sampled points to regulate
accuracy-efficiency trade-offs. We provide theoretical results to reduce the
current gap and present numerical experiments with real and synthetic data.
Empirical results demonstrate the efficacy and efficiency of the proposed
method compared to existing spectral clustering techniques based on the Nystrom
method and other efficient methods. The overarching goal of this work is to
provide an improved baseline for future research directions to accelerate
spectral clustering.
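To make the pipeline in the abstract concrete, the following is a minimal sketch of Nyström-based spectral clustering: sample landmark points, form the similarity blocks C and W, approximate the top eigenvectors of the full kernel via K ≈ C W⁺ Cᵀ, and run k-means on the row-normalized embedding. This is an illustrative baseline, not the paper's rank-regulated algorithm; the RBF kernel, the function name, and all parameters are assumptions for the sketch.

```python
import numpy as np

def nystrom_spectral_clustering(X, n_clusters, n_samples, gamma=1.0, seed=0):
    """Illustrative Nystrom-based spectral clustering sketch.

    Note: a plain Nystrom baseline, NOT the paper's algorithm (which
    regulates the rank using the spectrum of the landmark block W).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=n_samples, replace=False)  # landmark sample

    # C: RBF similarities between all points and the sampled landmarks.
    sq_dists = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1)
    C = np.exp(-gamma * sq_dists)
    W = C[idx, :]  # landmark-landmark similarity block

    # Nystrom approximation K ~ C W^+ C^T: approximate the top eigenvectors
    # of K as U ~ C V Lambda^{-1}, using the eigendecomposition W = V L V^T.
    vals, vecs = np.linalg.eigh(W)               # ascending eigenvalues
    vals = np.clip(vals, 1e-12, None)            # guard tiny/negative values
    U = C @ vecs[:, -n_clusters:] / vals[-n_clusters:]
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # row-normalized embedding

    # Tiny Lloyd k-means on the embedding, with farthest-point init so the
    # sketch stays self-contained and deterministic.
    centers = [U[0]]
    for _ in range(n_clusters - 1):
        d2 = ((U[:, None, :] - np.asarray(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(U[np.argmax(d2)])
    centers = np.asarray(centers)
    for _ in range(50):
        labels = ((U[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = U[labels == k].mean(0)
    return labels
```

The "premature rank reduction" the abstract criticizes corresponds to truncating the eigendecomposition of W more aggressively than `n_clusters` components would warrant; the paper's contribution is choosing that rank from the spectrum of W rather than fixing it a priori.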
Related papers
- Toward Efficient and Incremental Spectral Clustering via Parametric
Spectral Clustering [2.44755919161855]
Spectral clustering is a popular method for effectively clustering nonlinearly separable data.
This paper introduces a novel approach called parametric spectral clustering (PSC).
PSC addresses the challenges associated with big data and real-time scenarios.
arXiv Detail & Related papers (2023-11-14T01:26:20Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Optimal Algorithms for the Inhomogeneous Spiked Wigner Model [89.1371983413931]
We derive an approximate message-passing algorithm (AMP) for the inhomogeneous problem.
We identify in particular the existence of a statistical-to-computational gap where known algorithms require a signal-to-noise ratio bigger than the information-theoretic threshold to perform better than random.
arXiv Detail & Related papers (2023-02-13T19:57:17Z) - Spectral Decomposition Representation for Reinforcement Learning [100.0424588013549]
We propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy.
A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings.
An experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
arXiv Detail & Related papers (2022-08-19T19:01:30Z) - Divide-and-conquer based Large-Scale Spectral Clustering [8.545202841051582]
We propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness.
The proposed method achieves lower computational complexity than most existing large-scale spectral clustering.
arXiv Detail & Related papers (2021-04-30T15:09:45Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
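For readers unfamiliar with the regularizer in this entry, the $l_{2,p}$ norm of a matrix is the $p$-norm of the vector of row-wise $l_2$ norms; for $p \le 1$ it promotes row sparsity, which is what makes it useful for feature selection. A minimal sketch of the definition (the function name is illustrative, not from the paper):

```python
import numpy as np

def l2p_norm(W, p):
    """l_{2,p} norm: (sum_i ||w_i||_2^p)^(1/p), where w_i are the rows of W.

    For p = 2 this is the Frobenius norm; for p <= 1 it is a (quasi-)norm
    that drives entire rows of W to zero, i.e. discards whole features.
    """
    row_norms = np.linalg.norm(W, axis=1)  # l2 norm of each row
    return (row_norms ** p).sum() ** (1.0 / p)
```

For example, `l2p_norm(np.array([[3., 4.], [0., 0.]]), p=1)` is 5.0, since the row norms are 5 and 0.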
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Spectral Methods for Data Science: A Statistical Perspective [37.2486912080998]
Spectral methods have emerged as a simple yet surprisingly effective approach for extracting information from massive, noisy and incomplete data.
This book aims to present a systematic, comprehensive, yet accessible introduction to spectral methods from a modern statistical perspective.
arXiv Detail & Related papers (2020-12-15T18:40:56Z) - Multi-View Spectral Clustering with High-Order Optimal Neighborhood
Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and high-order base.
arXiv Detail & Related papers (2020-08-31T12:28:40Z) - A boosted outlier detection method based on the spectrum of the
Laplacian matrix of a graph [0.0]
This paper explores a new outlier detection algorithm based on the spectrum of the Laplacian matrix of a graph.
The sparsity of the Laplacian matrix significantly decreases the computational burden.
arXiv Detail & Related papers (2020-08-07T08:37:56Z) - Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling [19.675277307158435]
This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and guidance on when to use it.
Our sampling procedure depends on the eigenspectrum shape of the dataset and yields competitive low-rank approximations compared to other state-of-the-art methods.
arXiv Detail & Related papers (2020-07-21T17:49:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.