Related papers: Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model

Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model

URL: http://arxiv.org/abs/2306.12968v1
Date: Sun, 18 Jun 2023 08:46:06 GMT
Title: Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model
Authors: Kaito Ariu, Alexandre Proutiere, Se-Young Yun
Abstract summary: We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
Score: 79.46465138631592
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the problem of recovering hidden communities in the Labeled Stochastic Block Model (LSBM) with a finite number of clusters, where cluster sizes grow linearly with the total number $n$ of items. In the LSBM, a label is (independently) observed for each pair of items. Our objective is to devise an efficient algorithm that recovers clusters using the observed labels. To this end, we revisit instance-specific lower bounds on the expected number of misclassified items satisfied by any clustering algorithm. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability. IAC consists of a one-time spectral clustering algorithm followed by an iterative likelihood-based cluster assignment improvement. This approach is based on the instance-specific lower bound and does not require any model parameters, including the number of clusters. By performing the spectral clustering only once, IAC maintains an overall computational complexity of $\mathcal{O}(n \text{polylog}(n))$. We illustrate the effectiveness of our approach through numerical experiments.

Related papers

Near-Optimal Clustering in Mixture of Markov Chains [74.3828414695655]
We study the problem of clustering $T$ trajectories of length $H$, each generated by one of $K$ unknown ergodic Markov chains over a finite state space of size $S$.<n>We derive an instance-dependent, high-probability lower bound on the clustering error rate, governed by the weighted KL divergence between the transition kernels of the chains.<n>We then present a novel two-stage clustering algorithm.
arXiv Detail & Related papers (2025-06-02T05:10:40Z)
Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
Gap-Free Clustering: Sensitivity and Robustness of SDP [6.996002801232415]
We study graph clustering in the Block Model (SBM) in the presence of both large clusters and small, unrecoverable clusters. Previous convex relaxation approaches achieving exact recovery do not allow any small clusters of size $o(sqrtn)$, or require a size gap between the smallest recovered cluster and the largest non-recovered cluster. We provide an algorithm based on semidefinite programming (SDP) which removes these requirements and provably recovers large clusters regardless of the remaining cluster sizes.
arXiv Detail & Related papers (2023-08-29T21:27:21Z)
A Computational Theory and Semi-Supervised Algorithm for Clustering [0.0]
A semi-supervised clustering algorithm is presented. The kernel of the clustering method is Mohammad's anomaly detection algorithm. Results are presented on synthetic and realworld data sets.
arXiv Detail & Related papers (2023-06-12T09:15:58Z)
Convex Clustering through MM: An Efficient Algorithm to Perform Hierarchical Clustering [1.0589208420411012]
We propose convex clustering through majorization-minimization ( CCMM) -- an iterative algorithm that uses cluster fusions and a highly efficient updating scheme. With a current desktop computer, CCMM efficiently solves convex clustering problems featuring over one million objects in seven-dimensional space.
arXiv Detail & Related papers (2022-11-03T15:07:51Z)
Recovering Unbalanced Communities in the Stochastic Block Model With Application to Clustering with a Faulty Oracle [9.578056676899203]
oracle block model (SBM) is a fundamental model for studying graph clustering or community detection in networks. We provide a simple SVD-based algorithm for recovering the communities in the SBM with communities of varying sizes.
arXiv Detail & Related papers (2022-02-17T08:51:19Z)
Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback. It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower boundally, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
Personalized Federated Learning via Convex Clustering [72.15857783681658]
We propose a family of algorithms for personalized federated learning with locally convex user costs. The proposed framework is based on a generalization of convex clustering in which the differences between different users' models are penalized.
arXiv Detail & Related papers (2022-02-01T19:25:31Z)
Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning. Recent work has established lower bounds against the class of low-degree methods. We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC) In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss, are designed for node representation learning. For the OOS nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
On Margin-Based Cluster Recovery with Oracle Queries [22.672233769934845]
We study an active cluster recovery problem where, given a set of $n$ points oracle and an answering queries like "are these two points in the same cluster?" We give an algorithm that recovers arbitrary convex clusters in exactly time, and with a number of queries that is lower than the best existing algorithm by $Theta(mm)$ factors. For general pseudometric spaces, where clusters might not be convex or might not have any notion of shape, we give an algorithm that achieves the $O(log n)$ query bound, and is provably near optimal.
arXiv Detail & Related papers (2021-06-09T08:48:23Z)
You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation. We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one. By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
Exact Recovery of Mangled Clusters with Same-Cluster Queries [20.03712152278538]
We study the cluster recovery problem in the semi-supervised active clustering framework. We design an algorithm that, given $n$ points to be partitioned into $k$ clusters, uses $O(k3 ln k ln n)$ oracle queries and $tildeO(kn + k3)$ time to recover the clustering with zero misclassification error.
arXiv Detail & Related papers (2020-06-08T15:27:58Z)
Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA. We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
Bi-objective Optimization of Biclustering with Binary Data [0.0]
Clustering consists of partitioning data objects into subsets called clusters according to some similarity criteria. This paper addresses a quasi-clustering that allows overlapping of clusters, and which we link to biclustering. Biclustering simultaneously groups the objects and features so that a specific group of objects has a special group of features.
arXiv Detail & Related papers (2020-02-09T21:49:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.