Lattice-Based Methods Surpass Sum-of-Squares in Clustering
- URL: http://arxiv.org/abs/2112.03898v1
- Date: Tue, 7 Dec 2021 18:50:17 GMT
- Title: Lattice-Based Methods Surpass Sum-of-Squares in Clustering
- Authors: Ilias Zadik, Min Jae Song, Alexander S. Wein, Joan Bruna
- Abstract summary: Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model \textit{does not exhibit} a statistical-to-computational gap.
- Score: 98.46302040220395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering is a fundamental primitive in unsupervised learning which gives
rise to a rich class of computationally-challenging inference tasks. In this
work, we focus on the canonical task of clustering $d$-dimensional Gaussian
mixtures with unknown (and possibly degenerate) covariance. Recent works (Ghosh
et al.\ '20; Mao, Wein '21; Davis, Diaz, Wang '21) have established lower
bounds against the class of low-degree polynomial methods and the
sum-of-squares (SoS) hierarchy for recovering certain hidden structures planted
in Gaussian clustering instances. Prior work on many similar inference tasks
portends that such lower bounds strongly suggest the presence of an inherent
statistical-to-computational gap for clustering, that is, a parameter regime
where the clustering task is \textit{statistically} possible but no
\textit{polynomial-time} algorithm succeeds.
One special case of the clustering task we consider is equivalent to the
problem of finding a planted hypercube vector in an otherwise random subspace.
We show that, perhaps surprisingly, this particular clustering model
\textit{does not exhibit} a statistical-to-computational gap, even though the
aforementioned low-degree and SoS lower bounds continue to apply in this case.
To achieve this, we give a polynomial-time algorithm based on the
Lenstra--Lenstra--Lovász lattice basis reduction method which achieves the
statistically-optimal sample complexity of $d+1$ samples. This result extends
the class of problems whose conjectured statistical-to-computational gaps can
be "closed" by "brittle" polynomial-time algorithms, highlighting the crucial
but subtle role of noise in the onset of statistical-to-computational gaps.
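
To make the lattice approach concrete, the following is a minimal sketch of the scaled-and-rounded LLL trick in the noiseless special case: recovering a planted $\pm 1$ vector given a basis of the subspace that contains it. It assumes the fpylll bindings and illustrative toy parameters (`n`, `d`, and the rounding scale `K`); it is a sketch of the general technique, not the paper's exact algorithm or its $d+1$-sample analysis.

```python
import numpy as np
from fpylll import IntegerMatrix, LLL   # assumed dependency: pip install fpylll

rng = np.random.default_rng(0)
n, d, K = 24, 12, 10**8   # ambient dimension, subspace dimension, rounding scale

# Plant a hypercube vector v in an otherwise random d-dimensional subspace of R^n.
v = rng.choice([-1.0, 1.0], size=n)
S = np.column_stack([v] + [rng.standard_normal(n) for _ in range(d - 1)])

# Orthonormal basis A of the orthogonal complement, so that A.T @ v == 0.
Q, _ = np.linalg.qr(S, mode="complete")
A = Q[:, d:]   # n x (n - d)

# Lattice basis (rows): b_i = (e_i, round(K * A[i, :])).  For an integer vector
# x, sum_i x_i b_i = (x, ~ K * A.T x), so only integer vectors x that are nearly
# orthogonal to the complement (essentially the multiples of v) stay short.
m = n - d
B = IntegerMatrix(n, n + m)
for i in range(n):
    B[i, i] = 1
    for j in range(m):
        B[i, n + j] = int(round(K * A[i, j]))

LLL.reduction(B)   # in-place LLL reduction; the first row is a short vector
x = np.array([B[0, j] for j in range(n)], dtype=float)
print("recovered planted hypercube vector (up to sign):",
      np.array_equal(x, v) or np.array_equal(x, -v))
```

The rounding scale `K` is exactly where the "brittleness" mentioned in the abstract enters: the reduction hinges on essentially exact arithmetic, and independent noise of any fixed magnitude on the samples destroys the short planted vector.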
Related papers
- Heteroskedastic Tensor Clustering [20.979358557219953]
We propose a two-stage method named $\mathsf{High\text{-}order\ HeteroClustering}$ ($\mathsf{HHC}$).
It starts by performing tensor subspace estimation via a novel spectral algorithm called $\mathsf{Thresholded\ Deflated\text{-}HeteroPCA}$, followed by approximate $k$-means to obtain cluster memberships.
Our algorithm provably achieves exact clustering as long as the SNR exceeds the computational limit; a toy sketch of the two-stage pipeline is given below.
arXiv Detail & Related papers (2023-11-04T02:50:40Z)
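
The following is a hedged toy sketch of the two-stage recipe from this entry, with a plain SVD standing in for the paper's Thresholded Deflated-HeteroPCA and scikit-learn's k-means standing in for the approximate k-means step; all dimensions and noise levels are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans   # stand-in for the approximate k-means step

rng = np.random.default_rng(1)
n, p, q, k = 120, 30, 30, 3   # n samples, each a (p x q) slice; k clusters

# Toy data: each slice is one of k random "centers" plus heteroskedastic noise.
labels = rng.integers(k, size=n)
centers = rng.standard_normal((k, p, q))
scale = rng.uniform(0.1, 0.5, size=n)   # per-sample noise level
T = centers[labels] + scale[:, None, None] * rng.standard_normal((n, p, q))

# Stage 1 (subspace estimation): plain SVD of the mode-1 unfolding, standing in
# for the paper's spectral algorithm.
U, s, _ = np.linalg.svd(T.reshape(n, p * q), full_matrices=False)
embedding = U[:, :k] * s[:k]

# Stage 2: (approximate) k-means on the low-dimensional embedding.
pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)

# Clustering accuracy up to a permutation of the labels.
C = np.zeros((k, k), dtype=int)
for a, b in zip(labels, pred):
    C[a, b] += 1
r, c = linear_sum_assignment(-C)
print("accuracy:", C[r, c].sum() / n)
```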
- Gap-Free Clustering: Sensitivity and Robustness of SDP [6.996002801232415]
We study graph clustering in the Stochastic Block Model (SBM) in the presence of both large clusters and small, unrecoverable clusters.
Previous convex relaxation approaches achieving exact recovery do not allow any small clusters of size $o(\sqrt{n})$, or require a size gap between the smallest recovered cluster and the largest non-recovered cluster.
We provide an algorithm based on semidefinite programming (SDP) which removes these requirements and provably recovers large clusters regardless of the remaining cluster sizes; an SDP relaxation of this flavor is sketched below.
arXiv Detail & Related papers (2023-08-29T21:27:21Z)
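
The sketch below solves one standard SDP relaxation for SBM clustering in CVXPY; it is a generic formulation of this family (with a total-mass constraint that assumes known, equal cluster sizes), not the paper's exact program or its gap-free guarantees.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 3                 # nodes and planted clusters (equal sizes here)
p_in, p_out = 0.8, 0.1       # SBM edge probabilities

# Sample a small SBM adjacency matrix.
z = np.repeat(np.arange(k), n // k)
truth = (z[:, None] == z[None, :]).astype(float)   # planted cluster matrix
P = np.where(truth == 1, p_in, p_out)
A = np.triu((rng.random((n, n)) < P).astype(float), 1)
A = A + A.T

# SDP relaxation over the cluster matrix X (X*[i,j] = 1 iff i, j co-clustered):
# maximize <A, X> subject to PSD, unit diagonal, nonnegativity, and total mass.
X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0, cp.diag(X) == 1, X >= 0,
               cp.sum(X) == k * (n // k) ** 2]    # assumes known equal sizes
cp.Problem(cp.Maximize(cp.trace(A @ X)), constraints).solve()

# In the exact-recovery regime the optimum is (close to) the planted matrix.
pred = (X.value > 0.5).astype(float)
print("entrywise agreement with planted clusters:", (pred == truth).mean())
```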
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems; a generic sketch of composite minimization is given below.
arXiv Detail & Related papers (2022-11-03T12:40:24Z)
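
As a point of reference for what "composite minimization" means algorithmically, here is a generic stochastic proximal-gradient sketch for an objective f(x) + psi(x), with a least-squares f accessed through single-sample gradients and an l1 regularizer psi handled exactly through its proximal map; this is a textbook method under made-up parameters, not the paper's complementary setting or its optimal algorithms.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t * ||.||_1 (exact handling of the regularizer)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(3)
n, p, lam, steps = 500, 50, 0.1, 5000
A = rng.standard_normal((n, p))
x_true = np.zeros(p); x_true[:5] = 1.0
b = A @ x_true + 0.1 * rng.standard_normal(n)

# Stochastic proximal gradient on f(x) + psi(x), with f the least-squares loss
# and psi = lam * ||x||_1.
x = np.zeros(p)
eta = 0.01   # small constant step; keeps single-sample updates stable
for _ in range(steps):
    i = rng.integers(n)                      # one random sample per step
    grad = (A[i] @ x - b[i]) * A[i]          # stochastic gradient of f at x
    x = soft_threshold(x - eta * grad, eta * lam)

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```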
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making a few similarity queries, the problem becomes easier to solve; the standard soft objective is recalled below.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
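
For reference, the "fuzzy or soft objective" generalizing $k$-means is, in its standard textbook form (not necessarily the exact variant studied in the paper): each point $x_i$ carries a membership weight $u_{ij}$ for every cluster center $c_j$.

```latex
% Standard fuzzy (soft) c-means objective; the exponent m > 1 controls the
% softness of the memberships u_{ij}, and m -> 1 recovers hard k-means.
\min_{\substack{c_1,\dots,c_k \\ U = (u_{ij})}}
  \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{m} \,\lVert x_i - c_j \rVert^2
\quad \text{subject to} \quad
  \sum_{j=1}^{k} u_{ij} = 1 \ \text{for each } i, \qquad u_{ij} \ge 0 .
```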
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
- Simple and Scalable Sparse k-means Clustering via Feature Ranking [14.839931533868176]
We propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms.
Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings.
arXiv Detail & Related papers (2020-02-20T02:41:02Z)