Statistical Query Lower Bounds for Tensor PCA
- URL: http://arxiv.org/abs/2008.04101v2
- Date: Sat, 13 Feb 2021 19:00:15 GMT
- Title: Statistical Query Lower Bounds for Tensor PCA
- Authors: Rishabh Dudeja and Daniel Hsu
- Abstract summary: In the Tensor PCA problem introduced by Richard and Montanari, one is given a dataset consisting of $n$ samples $\mathbf{T}_{1:n}$ of i.i.d. Gaussian tensors of order $k$ with the promise that $\mathbb{E}\mathbf{T}_1$ is a rank-1 tensor.
The goal is to estimate $\mathbb{E}\mathbf{T}_1$, but no polynomial time estimator is known in the conjectured hard phase.
We provide a sharp analysis of the optimal sample complexity in the Statistical Query (SQ) model.
- Score: 10.701091387118023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the Tensor PCA problem introduced by Richard and Montanari (2014), one is
given a dataset consisting of $n$ samples $\mathbf{T}_{1:n}$ of i.i.d. Gaussian
tensors of order $k$ with the promise that $\mathbb{E}\mathbf{T}_1$ is a rank-1
tensor and $\|\mathbb{E} \mathbf{T}_1\| = 1$. The goal is to estimate
$\mathbb{E} \mathbf{T}_1$. This problem exhibits a large conjectured hard phase
when $k>2$: When $d \lesssim n \ll d^{\frac{k}{2}}$ it is information
theoretically possible to estimate $\mathbb{E} \mathbf{T}_1$, but no polynomial
time estimator is known. We provide a sharp analysis of the optimal sample
complexity in the Statistical Query (SQ) model and show that SQ algorithms with
polynomial query complexity not only fail to solve Tensor PCA in the
conjectured hard phase, but also have a strictly sub-optimal sample complexity
compared to some polynomial time estimators such as the Richard-Montanari
spectral estimator. Our analysis reveals that the optimal sample complexity in
the SQ model depends on whether $\mathbb{E} \mathbf{T}_1$ is symmetric or not.
For symmetric, even order tensors, we also isolate a sample size regime in
which it is possible to test if $\mathbb{E} \mathbf{T}_1 = \mathbf{0}$ or
$\mathbb{E}\mathbf{T}_1 \neq \mathbf{0}$ with polynomially many queries but not
estimate $\mathbb{E}\mathbf{T}_1$. Our proofs rely on the Fourier analytic
approach of Feldman, Perkins and Vempala (2018) to prove sharp SQ lower bounds.
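As a concrete illustration of the model described in the abstract (not code from the paper), the sketch below simulates $n$ i.i.d. order-$k$ Gaussian tensors whose mean is a planted rank-1 tensor $v^{\otimes k}$ with $\|v\| = 1$, and recovers $v$ with a simple unfolding-based spectral estimate in the spirit of spectral estimators such as the Richard-Montanari one; the values of `d`, `k`, `n`, and the random seed are arbitrary assumptions for the demo.

```python
# Illustrative sketch only (assumed parameters, not code from the paper):
# simulate the Tensor PCA model and recover the planted vector with a
# simple unfolding-based spectral estimate.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 3, 2000          # dimension, tensor order, sample size (assumed)

# Planted rank-1 mean E[T_1] = v (x) v (x) ... (x) v (k factors), ||E[T_1]|| = 1.
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
mean_tensor = v
for _ in range(k - 1):
    mean_tensor = np.multiply.outer(mean_tensor, v)

# Samples: T_i = E[T_1] + standard Gaussian noise tensor.
samples = mean_tensor + rng.standard_normal((n,) + (d,) * k)

# Spectral step: unfold the empirical mean into a d x d^{k-1} matrix and take
# its top left singular vector as the estimate of v (up to sign).
unfolded = samples.mean(axis=0).reshape(d, -1)
v_hat = np.linalg.svd(unfolded, full_matrices=False)[0][:, 0]

print("overlap |<v_hat, v>| =", abs(v_hat @ v))
```

The sketch only illustrates the data model and the unfolding idea; it says nothing about the SQ analysis, which is the subject of the paper.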
Related papers
- Near-Optimal and Tractable Estimation under Shift-Invariance [0.21756081703275998]
The class of all such signals is extremely rich: it contains all exponential oscillations over $\mathbb{C}^n$ with total degree $s$.
We show that the statistical complexity of this class, as measured by the squared minimax radius of the $(1-\delta)$-confidence $\ell_2$-ball, is nearly the same as for the class of $s$-sparse signals, namely $O\left(s\log(en) + \log(\delta^{-1})\right) \cdot \log(en/s)$.
arXiv Detail & Related papers (2024-11-05T18:11:23Z) - Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with an arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d$.
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Statistical Query Lower Bounds for Learning Truncated Gaussians [43.452452030671694]
We show that the complexity of any SQ algorithm for this problem is $d^{\mathrm{poly}(1/\epsilon)}$, even when the class $\mathcal{C}$ is simple so that $\mathrm{poly}(d/\epsilon)$ samples information-theoretically suffice.
arXiv Detail & Related papers (2024-03-04T18:30:33Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Testing Closeness of Multivariate Distributions via Ramsey Theory [40.926523210945064]
We investigate the statistical task of closeness (or equivalence) testing for multidimensional distributions.
Specifically, given sample access to two unknown distributions $\mathbf{p}, \mathbf{q}$ on $\mathbb{R}^d$, we want to distinguish between the case that $\mathbf{p} = \mathbf{q}$ versus $\|\mathbf{p} - \mathbf{q}\|_{A_k} > \epsilon$.
Our main result is the first closeness tester for this problem with sub-learning sample complexity in any fixed dimension.
arXiv Detail & Related papers (2023-11-22T04:34:09Z) - Average-Case Complexity of Tensor Decomposition for Low-Degree Polynomials [93.59919600451487]
"Statistical-computational gaps" occur in many statistical inference tasks.
We consider a model for random order-3 tensor decomposition where one component is slightly larger in norm than the rest.
We show that low-degree polynomials in the tensor entries can accurately estimate the largest component when $r \ll n^{3/2}$ but fail to do so when $r \gg n^{3/2}$.
arXiv Detail & Related papers (2022-11-10T00:40:37Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability (a minimal gradient-descent sketch of this setup is given after this list).
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Multimeasurement Generative Models [7.502947376736449]
We map the problem of sampling from an unknown distribution with density $p_X$ in $\mathbb{R}^d$ to the problem of learning and sampling $p_{\mathbf{Y}}$ in $\mathbb{R}^{Md}$ obtained by convolving $p_X$ with a fixed factorial kernel.
arXiv Detail & Related papers (2021-12-18T02:11:36Z) - Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z)
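For the single-neuron entry above (Learning a Single Neuron with Adversarial Label Noise via Gradient Descent), the following is a minimal sketch of plain gradient descent on the squared loss $F(\mathbf{w}) = \mathbb{E}[(\sigma(\mathbf{w} \cdot \mathbf{x}) - y)^2]$ with a ReLU activation; the dimensions, step size, noise level, and moment-based initialization are assumptions made for illustration, and the sketch does not reproduce the paper's adversarial-noise analysis.

```python
# Minimal sketch (assumed setup, not the paper's algorithm): gradient descent
# on the empirical squared loss of a single ReLU neuron x -> sigma(w . x).
import numpy as np

rng = np.random.default_rng(1)
d, n = 10, 5000                       # assumed dimension and sample size

def relu(z):
    return np.maximum(z, 0.0)

# Ground-truth neuron with a small amount of label noise.
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = relu(X @ w_star) + 0.1 * rng.standard_normal(n)

# Moment-based warm start: E[2 * y * x] = w_star for standard Gaussian x.
w = (2.0 / n) * X.T @ y

# Plain gradient descent on F_n(w) = (1/n) * sum_i (relu(w . x_i) - y_i)^2.
lr = 0.1
for _ in range(500):
    margins = X @ w
    residual = relu(margins) - y
    grad = (2.0 / n) * X.T @ (residual * (margins > 0))   # ReLU subgradient
    w -= lr * grad

print("distance to w_star:", np.linalg.norm(w - w_star))
```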
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.