Computational lower bounds in latent models: clustering, sparse-clustering, biclustering
- URL: http://arxiv.org/abs/2506.13647v1
- Date: Mon, 16 Jun 2025 16:08:30 GMT
- Title: Computational lower bounds in latent models: clustering, sparse-clustering, biclustering
- Authors: Bertrand Even, Christophe Giraud, Nicolas Verzelen
- Abstract summary: In many high-dimensional problems, like sparse-PCA, planted clique, or clustering, the best known algorithms with polynomial time complexity fail to reach the statistical performance provably achievable by algorithms free of computational constraints. This observation has given rise to the conjecture of the existence, for some problems, of gaps between the best possible statistical performance achievable without computational constraints, and the best performance achievable with poly-time algorithms. A powerful approach to assess the best performance achievable in poly-time is to investigate the best performance achievable by polynomials of low degree.
- Score: 32.472822302123234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many high-dimensional problems, like sparse-PCA, planted clique, or clustering, the best known algorithms with polynomial time complexity fail to reach the statistical performance provably achievable by algorithms free of computational constraints. This observation has given rise to the conjecture of the existence, for some problems, of gaps -- so-called statistical-computational gaps -- between the best possible statistical performance achievable without computational constraints, and the best performance achievable with poly-time algorithms. A powerful approach to assess the best performance achievable in poly-time is to investigate the best performance achievable by polynomials of low degree. We build on the seminal paper of Schramm and Wein (2022) and propose a new scheme to derive lower bounds on the performance of low-degree polynomials in some latent space models. By better leveraging the latent structures, we obtain new and sharper results, with simplified proofs. We then instantiate our scheme to provide computational lower bounds for the problems of clustering, sparse clustering, and biclustering. We also prove matching upper bounds and some additional statistical results, in order to provide a comprehensive description of the statistical-computational gaps occurring in these three problems.
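As a rough illustration of the low-degree approach mentioned in the abstract, the sketch below estimates the squared norm of the projection of the likelihood ratio onto polynomials of degree at most D, in an additive Gaussian model Y = X + Z with Z standard Gaussian. For that model the norm has the closed form E over independent signal draws X, X' of the sum over d from 0 to D of <X, X'>^d / d!. A norm that stays bounded as the dimension grows is usually read as evidence that degree-D polynomials cannot tell the planted model from pure noise. This is the simpler detection quantity, not the estimation scheme the paper builds on top of Schramm and Wein (2022). The Gaussian-noise assumption, the sparse two-cluster prior, and the names `low_degree_norm_sq` and `sparse_two_cluster_prior` are illustrative assumptions, not taken from the paper.

```python
import math

import numpy as np


def low_degree_norm_sq(sample_signal, D, n_pairs=2000, rng=None):
    """Monte Carlo estimate of ||L^{<=D}||^2 for the model Y = X + Z, Z ~ N(0, I).

    Assumes additive standard Gaussian noise, for which
    ||L^{<=D}||^2 = E_{X,X'}[ sum_{d=0}^{D} <X, X'>^d / d! ],
    with X and X' independent draws from the signal prior.
    """
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(n_pairs):
        x, x_prime = sample_signal(rng), sample_signal(rng)
        t = float(np.vdot(x, x_prime))  # inner product of the flattened signals
        total += sum(t**d / math.factorial(d) for d in range(D + 1))
    return total / n_pairs


def sparse_two_cluster_prior(n, p, k, lam):
    """Hypothetical prior: X = z theta^T with Rademacher labels z in {-1, 1}^n
    and a k-sparse center theta whose nonzero entries are +/- lam."""
    def sample(rng):
        z = rng.choice([-1.0, 1.0], size=n)           # hidden cluster labels
        theta = np.zeros(p)
        support = rng.choice(p, size=k, replace=False)  # random support of size k
        theta[support] = lam * rng.choice([-1.0, 1.0], size=k)
        return np.outer(z, theta)                     # n x p signal matrix
    return sample


if __name__ == "__main__":
    prior = sparse_two_cluster_prior(n=50, p=200, k=10, lam=0.3)
    for D in (2, 4, 6):
        print(D, low_degree_norm_sq(prior, D, rng=0))
```

Comparing the estimate across growing n, p and D is one way to probe where the norm stays bounded. The values are Monte Carlo estimates, and they get noisy for large D because high-order moments of <X, X'> are heavy-tailed.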
Related papers
- Computational Complexity of Statistics: New Insights from Low-Degree Polynomials [57.377943721487966]
This is a survey on the use of low-degree polynomials to predict and explain the apparent statistical-computational tradeoffs in a variety of average-case computational problems.
(arXiv, 2025-06-12T14:35:26Z)
- Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
(arXiv, 2022-11-03T12:40:24Z)
- Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
(arXiv, 2021-12-07T18:50:17Z)
- An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering [0.5801044612920815]
We present a new branch-and-bound algorithm for semi-supervised MSSC.
Background knowledge is incorporated as pairwise must-link and cannot-link constraints.
For the first time, the proposed global optimization algorithm efficiently solves real-world instances with up to 800 data points.
(arXiv, 2021-11-30T17:08:53Z)
- Sparse PCA: Algorithms, Adversarial Perturbations and Certificates [9.348107805982604]
We study efficient algorithms for Sparse PCA in standard statistical models.
Our goal is to achieve optimal recovery guarantees while being resilient to small perturbations.
(arXiv, 2020-11-12T18:58:51Z)
- Statistical Query Algorithms and Low-Degree Tests Are Almost Equivalent [29.684442397855197]
We study two of the most popular restricted computational models, the statistical query framework and low-degree polynomials, in the context of high-dimensional hypothesis testing.
Under mild conditions on the testing problem, the two classes of algorithms are essentially equivalent in power.
As corollaries, we obtain new statistical query lower bounds for sparse PCA, tensor PCA and several variants of the planted clique problem.
(arXiv, 2020-09-13T22:55:18Z)
- Computational Barriers to Estimation from Low-Degree Polynomials [81.67886161671379]
We study the power of low-degree polynomials for the task of detecting the presence of hidden structures.
For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any degree-$D$ polynomial.
As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems.
(arXiv, 2020-08-05T17:52:10Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields bounds on statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
(arXiv, 2020-05-22T22:30:52Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
(arXiv, 2020-05-21T17:51:30Z)
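For the sparse two-cluster setting in the last entry, a simple polynomial-time baseline keeps the coordinates with large empirical variance and then splits the points by the sign of the top singular vector restricted to those coordinates. The sketch assumes the model X_i = z_i θ + ε_i with z_i in {-1, 1}, standard Gaussian noise, a k-sparse center θ, and a known sparsity k. It is a generic diagonal-thresholding baseline, not the algorithm of either paper, and the names `sparse_spectral_clustering` and `misclustering_rate` are hypothetical.

```python
import numpy as np


def sparse_spectral_clustering(Y, k):
    """Split the rows of Y (n x p) into two groups labelled -1 / +1.

    Baseline sketch: keep the k columns with the largest empirical second moment
    (diagonal thresholding), then cluster by the sign of the leading left
    singular vector of the reduced n x k matrix.
    """
    second_moment = np.mean(Y**2, axis=0)      # about 1 + theta_j^2 under the model
    selected = np.argsort(second_moment)[-k:]  # indices of the k largest values
    u, _, _ = np.linalg.svd(Y[:, selected], full_matrices=False)
    labels = np.where(u[:, 0] >= 0, 1, -1)
    return labels, np.sort(selected)


def misclustering_rate(labels, z):
    """Fraction of misclassified points, minimized over the global sign flip."""
    err = np.mean(labels != z)
    return min(err, 1.0 - err)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, k, lam = 200, 1000, 10, 1.5
    z = rng.choice([-1, 1], size=n)        # hidden labels
    theta = np.zeros(p)
    theta[:k] = lam                        # k-sparse center
    Y = np.outer(z, theta) + rng.standard_normal((n, p))
    labels, selected = sparse_spectral_clustering(Y, k)
    print("selected:", selected)
    print("misclustering rate:", misclustering_rate(labels, z))
```

Diagonal thresholding relies on individual entries of θ standing out against the noise. When the signal is spread thinly across many coordinates, this step can fail, which is the kind of regime the computational lower bounds for sparse clustering are concerned with.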
This list is automatically generated from the titles and abstracts of the papers in this site.