Related papers: A Sublinear-Time Spectral Clustering Oracle with Improved Preprocessing Time

URL: http://arxiv.org/abs/2310.17878v2
Date: Fri, 29 Dec 2023 08:32:43 GMT
Title: A Sublinear-Time Spectral Clustering Oracle with Improved Preprocessing Time
Authors: Ranran Shen, Pan Peng
Abstract summary: We address the problem of designing a sublinear-time spectral clustering oracle for graphs that exhibit strong clusterability. Our algorithm relaxes assumptions, albeit at the cost of a slightly higher misclassification ratio. Our clustering oracle is robust against a few random edge deletions.
Score: 6.961946145048321
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We address the problem of designing a sublinear-time spectral clustering oracle for graphs that exhibit strong clusterability. Such graphs contain $k$ latent clusters, each characterized by a large inner conductance (at least $\varphi$) and a small outer conductance (at most $\varepsilon$). Our aim is to preprocess the graph to enable clustering membership queries, with the key requirement that both preprocessing and query answering should be performed in sublinear time, and the resulting partition should be consistent with a $k$-partition that is close to the ground-truth clustering. Previous oracles have relied on either a $\textrm{poly}(k)\log n$ gap between inner and outer conductances or exponential (in $k/\varepsilon$) preprocessing time. Our algorithm relaxes these assumptions, albeit at the cost of a slightly higher misclassification ratio. We also show that our clustering oracle is robust against a few random edge deletions. To validate our theoretical bounds, we conducted experiments on synthetic networks.
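To make the clusterability condition in the abstract concrete, the following is a minimal sketch of how the outer conductance of a candidate cluster could be computed. It is not the paper's oracle. The helper name `outer_conductance`, the adjacency-list representation, and the choice to normalize by the cluster's volume (sum of degrees) are assumptions made for illustration; the paper may normalize differently, for example by a degree bound.

```python
def outer_conductance(adj, cluster):
    """Fraction of edge endpoints inside `cluster` whose edge leaves it.

    `adj` maps each vertex to a list of neighbours (undirected graph).
    Normalizing by the cluster volume is an assumption of this sketch.
    """
    cluster = set(cluster)
    # Volume: sum of degrees of the vertices in the cluster.
    volume = sum(len(adj[v]) for v in cluster)
    if volume == 0:
        raise ValueError("cluster has no incident edges")
    # Cut size: edges with exactly one endpoint inside the cluster.
    cut = sum(1 for v in cluster for u in adj[v] if u not in cluster)
    return cut / volume


# Toy graph: two triangles joined by a single bridge edge (2, 3).
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
phi_out = outer_conductance(adj, {0, 1, 2})
```

In this toy graph each triangle has one outgoing edge and volume 7, so its outer conductance is small, which is the kind of cluster the abstract's $\varepsilon$ bound describes. Inner conductance requires a minimum over subsets of the induced subgraph and is deliberately left out of the sketch.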

Related papers

Near-Optimal Clustering in Mixture of Markov Chains [74.3828414695655]
We study the problem of clustering $T$ trajectories of length $H$, each generated by one of $K$ unknown ergodic Markov chains over a finite state space of size $S$. We derive an instance-dependent, high-probability lower bound on the clustering error rate, governed by the weighted KL divergence between the transition kernels of the chains. We then present a novel two-stage clustering algorithm.
arXiv Detail & Related papers (2025-06-02T05:10:40Z)
Generalization Performance of Ensemble Clustering: From Theory to Algorithm [57.176040163699554]
This paper focuses on generalization error, excess risk and consistency in ensemble clustering. By assigning varying weights to finite clusterings, we minimize the error between the empirical average clusterings and their expectation. We instantiate our theory to develop a new ensemble clustering algorithm.
arXiv Detail & Related papers (2025-06-01T09:34:52Z)
IsoSEL: Isometric Structural Entropy Learning for Deep Graph Clustering in Hyperbolic Space [57.036143666293334]
Graph clustering is a longstanding topic in machine learning. In this paper, we study a challenging yet practical problem: deep graph clustering without K considering the imbalance in reality. We present a novel IsoSEL framework for deep graph clustering, where we design a hyperbolic neural network to learn a partitioning tree in the Lorentz model of hyperbolic space.
arXiv Detail & Related papers (2025-04-14T08:21:41Z)
Fully Dynamic Adversarially Robust Correlation Clustering in Polylogarithmic Update Time [19.25942907402098]
We study the dynamic correlation clustering problem with $\textit{adaptive}$ edge label flips. In correlation clustering, we are given an $n$-vertex complete graph whose edges are labeled either $(+)$ or $(-)$. We consider the dynamic setting with adversarial robustness, in which the $\textit{adaptive}$ adversary could flip the label of an edge based on the current output of the algorithm.
arXiv Detail & Related papers (2024-11-15T06:26:37Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
Approximating a RUM from Distributions on k-Slates [88.32814292632675]
We find a polynomial-time algorithm that finds the RUM that best approximates the given distribution on average. Our theoretical result can also be made practical: we obtain a heuristic that is effective and scales to real-world datasets.
arXiv Detail & Related papers (2023-05-22T17:43:34Z)
Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback. It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning. Recent work has established lower bounds against the class of low-degree methods. We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
Towards a Query-Optimal and Time-Efficient Algorithm for Clustering with a Faulty Oracle [7.449644976563424]
We propose an elegant theoretical model for studying clustering with a faulty oracle. It was left as an open question whether one can obtain a query-optimal, time-efficient algorithm for the general case of $k$ clusters. We provide a time-efficient algorithm with nearly-optimal query complexity (up to a factor of $O(\log^2 n)$) for all constant $k$ and any $\delta$ in the regime when information-theoretic recovery is possible.
arXiv Detail & Related papers (2021-06-18T22:20:12Z)
On Margin-Based Cluster Recovery with Oracle Queries [22.672233769934845]
We study an active cluster recovery problem with a set of $n$ points and an oracle answering queries like "are these two points in the same cluster?" We give an algorithm that recovers arbitrary convex clusters exactly, in polynomial time, and with a number of queries that is lower than the best existing algorithm by $\Theta(m^m)$ factors. For general pseudometric spaces, where clusters might not be convex or might not have any notion of shape, we give an algorithm that achieves the $O(\log n)$ query bound, and is provably near-optimal.
arXiv Detail & Related papers (2021-06-09T08:48:23Z)
Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem. We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
Exact Recovery of Mangled Clusters with Same-Cluster Queries [20.03712152278538]
We study the cluster recovery problem in the semi-supervised active clustering framework. We design an algorithm that, given $n$ points to be partitioned into $k$ clusters, uses $O(k^3 \ln k \ln n)$ oracle queries and $\tilde{O}(kn + k^3)$ time to recover the clustering with zero misclassification error.
arXiv Detail & Related papers (2020-06-08T15:27:58Z)
Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA. We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
Query-Efficient Correlation Clustering [13.085439249887713]
Correlation clustering is arguably the most natural formulation of clustering. A main drawback of correlation clustering is that it requires as input the $\Theta(n^2)$ pairwise similarities. We devise a correlation clustering algorithm that attains a solution whose expected number of disagreements is at most $3\cdot \mathrm{OPT} + O(\frac{n^3}{Q})$.
arXiv Detail & Related papers (2020-02-26T15:18:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.