Average-Case Communication Complexity of Statistical Problems
- URL: http://arxiv.org/abs/2107.01335v1
- Date: Sat, 3 Jul 2021 03:31:37 GMT
- Title: Average-Case Communication Complexity of Statistical Problems
- Authors: Cyrus Rashtchian, David P. Woodruff, Peng Ye, Hanlin Zhu
- Abstract summary: Communication complexity is the main tool for proving lower bounds in streaming, sketching, and query-based models.
We provide a general reduction method that preserves the input distribution for problems involving a random graph or matrix with planted structure.
We derive two-party and multi-party communication lower bounds for detecting or finding planted cliques.
- Score: 48.03761288397421
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study statistical problems, such as planted clique, its variants, and
sparse principal component analysis in the context of average-case
communication complexity. Our motivation is to understand the
statistical-computational trade-offs in streaming, sketching, and query-based
models. Communication complexity is the main tool for proving lower bounds in
these models, yet many prior results do not hold in an average-case setting. We
provide a general reduction method that preserves the input distribution for
problems involving a random graph or matrix with planted structure. Then, we
derive two-party and multi-party communication lower bounds for detecting or
finding planted cliques, bipartite cliques, and related problems. As a
consequence, we obtain new bounds on the query complexity in the edge-probe,
vector-matrix-vector, matrix-vector, linear sketching, and
$\mathbb{F}_2$-sketching models. Many of these results are nearly tight, and we
use our techniques to provide simple proofs of some known lower bounds for the
edge-probe model.
Related papers
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Entrywise Inference for Missing Panel Data: A Simple and Instance-Optimal Approach [27.301741710016223]
We consider inferential questions associated with the missing data version of panel data induced by staggered adoption.
We develop and analyze a data-driven procedure for constructing entrywise confidence intervals with pre-specified coverage.
We prove non-asymptotic and high-probability bounds on its error in estimating each missing entry.
arXiv Detail & Related papers (2024-01-24T18:58:18Z) - Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z) - Gaussian Graphical Model Selection for Huge Data via Minipatch Learning [1.2891210250935146]
We propose the Minipatch Graph (MPGraph) estimator to solve the problem of graphical model selection.
MPGraph is a generalization of thresholded graph estimators fit to tiny, random subsets of both the observations and the nodes.
We prove that our algorithm achieves finite-sample graph selection consistency.
arXiv Detail & Related papers (2021-10-22T21:06:48Z) - T-LoHo: A Bayesian Regularization Model for Structured Sparsity and
Smoothness on Graphs [0.0]
In graph-structured data, structured sparsity and smoothness tend to cluster together.
We propose a new prior for high dimensional parameters with graphical relations.
We use it to detect structured sparsity and smoothness simultaneously.
arXiv Detail & Related papers (2021-07-06T10:10:03Z) - Analysis of Truncated Orthogonal Iteration for Sparse Eigenvector
Problems [78.95866278697777]
We propose two variants of the Truncated Orthogonal Iteration to compute multiple leading eigenvectors with sparsity constraints simultaneously.
We then apply our algorithms to solve the sparse principle component analysis problem for a wide range of test datasets.
arXiv Detail & Related papers (2021-03-24T23:11:32Z) - Computational Barriers to Estimation from Low-Degree Polynomials [81.67886161671379]
We study the power of low-degrees for the task of detecting the presence of hidden structures.
For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any degree.
As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems.
arXiv Detail & Related papers (2020-08-05T17:52:10Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Reducibility and Statistical-Computational Gaps from Secret Leakage [19.25775832101647]
We show that a slight generalization of the planted clique conjecture -- secret leakage planted clique -- gives rise to a variety of new average-case reduction techniques.
Using variants of the planted clique conjecture for specific forms of secret leakage planted clique, we deduce tight statistical-computational tradeoffs.
arXiv Detail & Related papers (2020-05-16T20:56:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.