Spectral Clustering on Large Datasets: When Does it Work? Theory from
Continuous Clustering and Density Cheeger-Buser
- URL: http://arxiv.org/abs/2305.06541v1
- Date: Thu, 11 May 2023 03:08:49 GMT
- Title: Spectral Clustering on Large Datasets: When Does it Work? Theory from
Continuous Clustering and Density Cheeger-Buser
- Authors: Timothy Chu, Gary Miller, Noel Walkington
- Abstract summary: In modern machine learning, many data sets are modeled as a large number of points drawn from a probability density function.
Our work suggests that Shi-Malik spectral clustering works well on data drawn from mixtures of Laplace distributions.
Our key tool is a new Cheeger-Buser inequality for all probability densities, including discontinuous ones.
- Score: 0.17188280334580194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spectral clustering is one of the most popular clustering algorithms that has
stood the test of time. It is simple to describe, can be implemented using
standard linear algebra, and often finds better clusters than traditional
clustering algorithms like $k$-means and $k$-centers. The foundational
algorithm for two-way spectral clustering, by Shi and Malik, creates a
geometric graph from data and finds a spectral cut of the graph.
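As a concrete reading of this pipeline, here is a minimal sketch in Python, assuming a Gaussian-kernel geometric graph and a sign threshold on the second generalized eigenvector; both of these choices are common defaults and are illustrative here rather than prescribed by the abstract.

import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def shi_malik_two_way(X, sigma=1.0):
    """Two-way normalized-cut spectral clustering in the Shi-Malik style.

    Builds a geometric graph with Gaussian edge weights, solves the
    generalized eigenproblem L v = lambda D v, and splits the points
    by the sign of the second eigenvector (the Fiedler vector).
    """
    # Gaussian-kernel affinities from pairwise squared distances.
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # unnormalized graph Laplacian

    # eigh solves the symmetric generalized problem; eigenvalues ascend.
    _, vecs = eigh(L, D)
    fiedler = vecs[:, 1]

    # Sign threshold (a median or sweep cut is also common in practice).
    return (fiedler > 0).astype(int)

# Toy usage: two well-separated blobs should split cleanly.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)),
               rng.normal(2.0, 0.5, (50, 2))])
print(shi_malik_two_way(X))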
In modern machine learning, many data sets are modeled as a large number of
points drawn from a probability density function. Little is known about when
spectral clustering works in this setting -- and when it doesn't. Past
researchers justified spectral clustering by appealing to the graph Cheeger
inequality (which states that the spectral cut of a graph approximates the
``Normalized Cut''), but this justification is known to break down on large
data sets.
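For reference, the graph Cheeger inequality appealed to above has the standard two-sided form (stated here for conductance and the normalized Laplacian; the paper's exact normalization may differ):

\[
  \frac{\lambda_2}{2} \;\le\; h(G) \;\le\; \sqrt{2\lambda_2},
  \qquad
  h(G) \;=\; \min_{\emptyset \neq S \subsetneq V}
  \frac{w(S, V \setminus S)}{\min\{\operatorname{vol}(S),\ \operatorname{vol}(V \setminus S)\}},
\]

where $\lambda_2$ is the second-smallest eigenvalue of the normalized Laplacian, $w(S, V \setminus S)$ is the total edge weight crossing the cut, and $h(G)$ is the conductance that Normalized Cut approximates.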
We provide theoretically-informed intuition about spectral clustering on
large data sets drawn from probability densities, by proving when a continuous
form of spectral clustering considered by past researchers (the unweighted
spectral cut of a probability density) finds good clusters of the underlying
density itself. Our work suggests that Shi-Malik spectral clustering works well
on data drawn from mixtures of Laplace distributions, and works poorly on data
drawn from certain other densities, such as a density we call the `square-root
trough'.
Our core theorem proves that weighted spectral cuts have low weighted
isoperimetry for all probability densities. Our key tool is a new Cheeger-Buser
inequality for all probability densities, including discontinuous ones.
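Schematically, a Cheeger-Buser pair for a density $\rho$ couples the principal nontrivial eigenvalue $\lambda_2(\rho)$ to the isoperimetric constant $h(\rho)$ from both sides. The template below follows the classical graph and manifold versions and is illustrative only, not the paper's precise statement or constants:

\[
  c\, h(\rho)^2 \;\le\; \lambda_2(\rho) \;\le\; C \left( h(\rho) + h(\rho)^2 \right).
\]

The lower (Cheeger) bound says a small eigenvalue certifies a sparse cut of the density; the upper (Buser) bound says a sparse cut forces a small eigenvalue. Together they let the spectral cut track the density's isoperimetry.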
Related papers
- Datacube segmentation via Deep Spectral Clustering [76.48544221010424]
Extended Vision techniques often pose interpretation challenges.
The huge dimensionality of datacube spectra makes their statistical interpretation a complex task.
In this paper, we explore the possibility of applying unsupervised clustering methods in encoded space.
Statistical dimensionality reduction is performed by an ad hoc trained (Variational) AutoEncoder, while the clustering is performed by a (learnable) iterative K-Means algorithm.
arXiv Detail & Related papers (2024-01-31T09:31:28Z)
- Robustness for Spectral Clustering of General Graphs under Local Differential Privacy [2.4447259440166302]
We argue that social networks do not originate from the stochastic block model (SBM).
Our primary focus is the edge flipping method, a common technique for providing local differential privacy.
While clustering outcomes are stable for dense and well-clustered graphs produced from the SBM, we show that, in general, spectral clustering may yield highly erratic results on certain other dense and well-clustered graphs.
arXiv Detail & Related papers (2023-09-13T10:23:29Z)
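The edge flipping method in the entry above is typically implemented as randomized response on the adjacency structure: each potential edge bit is flipped with a probability calibrated to the privacy budget. A minimal sketch under that standard formulation (the flip probability p = 1/(e^eps + 1) is the usual randomized-response choice, assumed here rather than taken from the paper):

import numpy as np

def flip_edges(A, eps, rng=None):
    """Randomized response on an adjacency matrix for eps-local DP.

    Each potential edge bit is flipped independently with probability
    p = 1 / (exp(eps) + 1); smaller eps means more flips, more privacy.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = 1.0 / (np.exp(eps) + 1.0)
    flips = rng.random(A.shape) < p
    noisy = np.where(flips, 1 - A, A)
    # Re-symmetrize by keeping the upper triangle and mirroring it,
    # so the output is again undirected with no self-loops.
    upper = np.triu(noisy, k=1)
    return upper + upper.T

# Toy usage: flip a small triangle graph at eps = 1.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])
print(flip_edges(A, eps=1.0, rng=np.random.default_rng(0)))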
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified in a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed that enhances cohesion within clusters and separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We give implementable differentially private clustering algorithms that provide utility when the data is "easy".
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
- Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
- Improving Spectral Clustering Using Spectrum-Preserving Node Reduction [1.52292571922932]
We use spectrum-preserving node reduction to accelerate eigen-decomposition and generate concise representations of data sets.
The experimental results show dramatically improved clustering performance when compared with state-of-the-art methods.
arXiv Detail & Related papers (2021-10-24T01:43:12Z)
- Higher-Order Spectral Clustering for Geometric Graphs [0.17188280334580194]
We present an effective generalization, which we call higher-order spectral clustering.
It resembles classical spectral clustering in concept, but partitions using the eigenvector associated with a higher-order eigenvalue.
We show that our method is effective in numerical experiments even for graphs of modest size.
arXiv Detail & Related papers (2020-09-23T19:51:55Z)
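Read literally, the higher-order variant above swaps the Fiedler vector for a later eigenvector. A minimal sketch of that idea (the index k and the sign threshold are illustrative choices, not the authors' exact procedure):

import numpy as np
from scipy.linalg import eigh

def higher_order_cut(W, k=2):
    """Two-way cut from the k-th smallest generalized eigenvector.

    W: symmetric nonnegative affinity matrix of the geometric graph.
    k=1 gives the classical Fiedler cut; k>=2 uses a higher-order
    eigenvector, as in the summary above.
    """
    D = np.diag(W.sum(axis=1))
    L = D - W
    _, vecs = eigh(L, D)  # generalized eigenproblem, ascending order
    return (vecs[:, k] > 0).astype(int)

# Toy usage on a fully connected 4-node graph with two heavy pairs.
W = np.array([[0, 5, 1, 1],
              [5, 0, 1, 1],
              [1, 1, 0, 5],
              [1, 1, 5, 0]], dtype=float)
print(higher_order_cut(W, k=1))  # classical Fiedler cut
print(higher_order_cut(W, k=2))  # higher-order variant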
- Spectral Clustering with Smooth Tiny Clusters [14.483043753721256]
We propose a novel clustering algorithm, which considers the smoothness of data for the first time.
Our key idea is to cluster tiny clusters, whose centers constitute smooth graphs.
Although in this paper we focus solely on multi-scale situations, the idea of data smoothness can certainly be extended to any clustering algorithm.
arXiv Detail & Related papers (2020-09-10T05:21:20Z)
- Average Sensitivity of Spectral Clustering [31.283432482502278]
We study the stability of spectral clustering against edge perturbations in the input graph.
Our results suggest that spectral clustering is stable against edge perturbations when there is a cluster structure in the input graph.
arXiv Detail & Related papers (2020-06-07T09:14:44Z)
- Local Graph Clustering with Network Lasso [90.66817876491052]
We study the statistical and computational properties of a network Lasso method for local graph clustering.
The clusters delivered by nLasso can be characterized elegantly via network flows between cluster boundary and seed nodes.
arXiv Detail & Related papers (2020-04-25T17:52:05Z)