Related papers: Average-Case Communication Complexity of Statistical Problems

Average-Case Communication Complexity of Statistical Problems

URL: http://arxiv.org/abs/2107.01335v1
Date: Sat, 3 Jul 2021 03:31:37 GMT
Title: Average-Case Communication Complexity of Statistical Problems
Authors: Cyrus Rashtchian, David P. Woodruff, Peng Ye, Hanlin Zhu
Abstract summary: Communication complexity is the main tool for proving lower bounds in streaming, sketching, and query-based models. We provide a general reduction method that preserves the input distribution for problems involving a random graph or matrix with planted structure. We derive two-party and multi-party communication lower bounds for detecting or finding planted cliques.
Score: 48.03761288397421
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study statistical problems, such as planted clique, its variants, and sparse principal component analysis in the context of average-case communication complexity. Our motivation is to understand the statistical-computational trade-offs in streaming, sketching, and query-based models. Communication complexity is the main tool for proving lower bounds in these models, yet many prior results do not hold in an average-case setting. We provide a general reduction method that preserves the input distribution for problems involving a random graph or matrix with planted structure. Then, we derive two-party and multi-party communication lower bounds for detecting or finding planted cliques, bipartite cliques, and related problems. As a consequence, we obtain new bounds on the query complexity in the edge-probe, vector-matrix-vector, matrix-vector, linear sketching, and $\mathbb{F}_2$-sketching models. Many of these results are nearly tight, and we use our techniques to provide simple proofs of some known lower bounds for the edge-probe model.

Related papers

Computational lower bounds in latent models: clustering, sparse-clustering, biclustering [32.472822302123234]
In many high-dimensional problems, like sparse-PCA, planted clustering, or clustering, the best known algorithms with complexity time fail to reach the statistical performance provably achievable by algorithms free of computational constraints.<n>This observation has given rise to the conjecture of the existence, for some problems, of gaps between the best possible statistical performance achievable without computational constraints, and the best performance achievable with poly-time algorithms.<n>A powerful approach to assess the best performance achievable in poly-time is to investigate the best performance achievable by cliques with low-degrees.
arXiv Detail & Related papers (2025-06-16T16:08:30Z)
Computational Complexity of Statistics: New Insights from Low-Degree Polynomials [57.377943721487966]
This is a survey on the use of low-degrees to predict and explain the apparent statistical-computational tradeoffs in a variety of average-case computational problems.
arXiv Detail & Related papers (2025-06-12T14:35:26Z)
Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure. We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
Entrywise Inference for Missing Panel Data: A Simple and Instance-Optimal Approach [27.301741710016223]
We consider inferential questions associated with the missing data version of panel data induced by staggered adoption. We develop and analyze a data-driven procedure for constructing entrywise confidence intervals with pre-specified coverage. We prove non-asymptotic and high-probability bounds on its error in estimating each missing entry.
arXiv Detail & Related papers (2024-01-24T18:58:18Z)
Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression. It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise. This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
Gaussian Graphical Model Selection for Huge Data via Minipatch Learning [1.2891210250935146]
We propose the Minipatch Graph (MPGraph) estimator to solve the problem of graphical model selection. MPGraph is a generalization of thresholded graph estimators fit to tiny, random subsets of both the observations and the nodes. We prove that our algorithm achieves finite-sample graph selection consistency.
arXiv Detail & Related papers (2021-10-22T21:06:48Z)
T-LoHo: A Bayesian Regularization Model for Structured Sparsity and Smoothness on Graphs [0.0]
In graph-structured data, structured sparsity and smoothness tend to cluster together. We propose a new prior for high dimensional parameters with graphical relations. We use it to detect structured sparsity and smoothness simultaneously.
arXiv Detail & Related papers (2021-07-06T10:10:03Z)
Analysis of Truncated Orthogonal Iteration for Sparse Eigenvector Problems [78.95866278697777]
We propose two variants of the Truncated Orthogonal Iteration to compute multiple leading eigenvectors with sparsity constraints simultaneously. We then apply our algorithms to solve the sparse principle component analysis problem for a wide range of test datasets.
arXiv Detail & Related papers (2021-03-24T23:11:32Z)
Computational Barriers to Estimation from Low-Degree Polynomials [81.67886161671379]
We study the power of low-degrees for the task of detecting the presence of hidden structures. For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any degree. As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems.
arXiv Detail & Related papers (2020-08-05T17:52:10Z)
Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
Reducibility and Statistical-Computational Gaps from Secret Leakage [19.25775832101647]
We show that a slight generalization of the planted clique conjecture -- secret leakage planted clique -- gives rise to a variety of new average-case reduction techniques. Using variants of the planted clique conjecture for specific forms of secret leakage planted clique, we deduce tight statistical-computational tradeoffs.
arXiv Detail & Related papers (2020-05-16T20:56:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.