Average-Case Complexity of Tensor Decomposition for Low-Degree
Polynomials
- URL: http://arxiv.org/abs/2211.05274v2
- Date: Mon, 27 Mar 2023 01:04:10 GMT
- Title: Average-Case Complexity of Tensor Decomposition for Low-Degree
Polynomials
- Authors: Alexander S. Wein
- Abstract summary: "Statistical-computational gaps" occur in many statistical inference tasks.
We consider a model for random order-3 tensor decomposition where one component is slightly larger in norm than the rest.
We show that low-degree polynomial functions of the tensor entries can accurately estimate the largest component when $r \ll n^{3/2}$ but fail to do so when $r \gg n^{3/2}$.
- Score: 93.59919600451487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Suppose we are given an $n$-dimensional order-3 symmetric tensor $T \in
(\mathbb{R}^n)^{\otimes 3}$ that is the sum of $r$ random rank-1 terms. The
problem of recovering the rank-1 components is possible in principle when $r
\lesssim n^2$ but polynomial-time algorithms are only known in the regime $r
\ll n^{3/2}$. Similar "statistical-computational gaps" occur in many
high-dimensional inference tasks, and in recent years there has been a flurry
of work on explaining the apparent computational hardness in these problems by
proving lower bounds against restricted (yet powerful) models of computation
such as statistical queries (SQ), sum-of-squares (SoS), and low-degree
polynomials (LDP). However, no such prior work exists for tensor decomposition,
largely because its hardness does not appear to be explained by a "planted
versus null" testing problem.
We consider a model for random order-3 tensor decomposition where one
component is slightly larger in norm than the rest (to break symmetry), and the
components are drawn uniformly from the hypercube. We resolve the computational
complexity in the LDP model: $O(\log n)$-degree polynomial functions of the
tensor entries can accurately estimate the largest component when $r \ll
n^{3/2}$ but fail to do so when $r \gg n^{3/2}$. This provides rigorous
evidence suggesting that the best known algorithms for tensor decomposition
cannot be improved, at least by known approaches. A natural extension of the
result holds for tensors of any fixed order $k \ge 3$, in which case the LDP
threshold is $r \sim n^{k/2}$.
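To make the planted model concrete, the following NumPy sketch (a reader aid, not part of the paper) builds a random order-3 symmetric tensor from unit-norm hypercube components with one component slightly boosted in norm, then attempts recovery by tensor power iteration over random restarts. The parameter values and the power-iteration heuristic are illustrative assumptions only; the paper's result concerns $O(\log n)$-degree polynomial estimators, not this particular procedure.

```python
import numpy as np

# Planted model from the abstract (assumptions: unit-norm hypercube components,
# one component boosted in norm; all parameter values are illustrative only).
rng = np.random.default_rng(0)
n, r, boost, restarts = 30, 10, 1.2, 20

X = rng.choice([-1.0, 1.0], size=(r, n)) / np.sqrt(n)  # components drawn from the hypercube
X[0] *= boost                                          # component 1 is slightly larger in norm

# T = sum over i of the triple outer product x_i (x) x_i (x) x_i: an n x n x n symmetric tensor.
T = np.einsum("ri,rj,rk->ijk", X, X, X)

def power_iteration(v, iters=50):
    """Tensor power iteration: v <- T(., v, v), renormalized each step."""
    for _ in range(iters):
        v = np.einsum("ijk,j,k->i", T, v, v)
        v /= np.linalg.norm(v)
    return v

# Heuristic recovery: random restarts, keep the fixed point with the largest T(v, v, v).
# This is a generic spectral-style heuristic, not the low-degree polynomial estimator
# analyzed in the paper; it only serves to make the setup tangible.
best = max(
    (power_iteration(v / np.linalg.norm(v)) for v in rng.standard_normal((restarts, n))),
    key=lambda v: np.einsum("ijk,i,j,k->", T, v, v, v),
)
target = X[0] / np.linalg.norm(X[0])
print("correlation with planted component:", abs(best @ target))
```

With $r$ far below $n^{3/2}$, as here, the printed correlation should usually be close to 1 (the heuristic itself comes with no guarantee); the paper's lower bound says that once $r \gg n^{3/2}$, no $O(\log n)$-degree polynomial of the tensor entries can accurately estimate the boosted component.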
Related papers
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d$
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Identification of Mixtures of Discrete Product Distributions in
Near-Optimal Sample and Time Complexity [6.812247730094931]
We show, for any $n \geq 2k-1$, how to achieve sample complexity and run-time complexity $(1/\zeta)^{O(k)}$.
We also extend the known lower bound of $e^{\Omega(k)}$ to match our upper bound across a broad range of $\zeta$.
arXiv Detail & Related papers (2023-09-25T09:50:15Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor
Decompositions [51.19236668224547]
We study low rank approximation of tensors, focusing on the tensor train and Tucker decompositions.
For tensor train decomposition, we give a bicriteria $(1 + \epsilon)$-approximation algorithm with a small bicriteria rank and $O(q \cdot \mathrm{nnz}(A))$ running time.
In addition, we extend our algorithm to tensor networks with arbitrary graphs.
arXiv Detail & Related papers (2022-07-15T11:55:09Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - The Complexity of Sparse Tensor PCA [1.90365714903665]
For any $1 \leq t \leq k$, our algorithms recover the sparse vector for signal-to-noise ratio $\lambda \geq \tilde{\mathcal{O}}(\sqrt{t} \cdot (k/t)^{p/2})$ in time.
Even in the restricted case of sparse PCA, known algorithms only recover the sparse vectors for $\lambda \geq \tilde{\mathcal{O}}(k \cdot r)$ while our algorithms require $\lambda \geq$
arXiv Detail & Related papers (2021-06-11T10:57:00Z) - On $O( \max \{n_1, n_2 \}\log ( \max \{ n_1, n_2 \} n_3) )$ Sample
Entries for $n_1 \times n_2 \times n_3$ Tensor Completion via Unitary
Transformation [20.854908850239035]
This paper investigates the incoherence conditions of the $n_3$ low-rank $n_1 \times n_2$ matrix slices of the tensor.
We show that such low-rank tensors can be recovered exactly with high probability.
arXiv Detail & Related papers (2020-12-16T08:03:48Z) - Exact nuclear norm, completion and decomposition for random overcomplete
tensors via degree-4 SOS [0.7233897166339269]
We show that simple semidefinite programs inspired by degree $4$ SOS can exactly solve the tensor nuclear norm, tensor decomposition, and tensor completion problems on tensors with random asymmetric components.
We show that w.h.p. these semidefinite programs can exactly find the nuclear norm and components of an $(n \times n \times n)$-tensor $\mathcal{T}$ with $m \leq n^{3/2}/\mathrm{polylog}(n)$ random asymmetric components.
arXiv Detail & Related papers (2020-11-18T17:27:36Z) - Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
We show that gradient descent on over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.
arXiv Detail & Related papers (2020-10-22T00:32:12Z) - Statistical Query Lower Bounds for Tensor PCA [10.701091387118023]
In the Tensor PCA problem introduced by Richard and Montanari, one is given a dataset consisting of $n$ samples $\mathbf{T}_{1:n}$ of i.i.d. Gaussian tensors of order $k$ with the promise that $\mathbb{E}\mathbf{T}_{1:n}$ is a rank-1 tensor.
The goal is to estimate $\mathbb{E}\mathbf{T}_1$, but no polynomial-time estimator is known to achieve the information-theoretically optimal sample complexity.
We provide a sharp analysis of the optimal sample complexity in the Statistical Query model.
arXiv Detail & Related papers (2020-08-10T13:14:34Z)