The Complexity of Sparse Tensor PCA
- URL: http://arxiv.org/abs/2106.06308v1
- Date: Fri, 11 Jun 2021 10:57:00 GMT
- Title: The Complexity of Sparse Tensor PCA
- Authors: Davin Choo, Tommaso d'Orsi
- Abstract summary: For any $1 leq leq k$, our algorithms recover the sparse vector for signal-to-noise ratio $lambda geq tildemathcalO (sqrtt cdot (k/t)p/2)$ in time.
Even in the restricted case of PCA, known algorithms only recover the sparse vectors for $lambda geq tildemathcalO(k cdot r) while our algorithms require $lambda ge
- Score: 1.90365714903665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of sparse tensor principal component analysis: given a
tensor $\pmb Y = \pmb W + \lambda x^{\otimes p}$ with $\pmb W \in
\otimes^p\mathbb{R}^n$ having i.i.d. Gaussian entries, the goal is to recover
the $k$-sparse unit vector $x \in \mathbb{R}^n$. The model captures both sparse
PCA (in its Wigner form) and tensor PCA.
For the highly sparse regime of $k \leq \sqrt{n}$, we present a family of
algorithms that smoothly interpolates between a simple polynomial-time
algorithm and the exponential-time exhaustive search algorithm. For any $1 \leq
t \leq k$, our algorithms recovers the sparse vector for signal-to-noise ratio
$\lambda \geq \tilde{\mathcal{O}} (\sqrt{t} \cdot (k/t)^{p/2})$ in time
$\tilde{\mathcal{O}}(n^{p+t})$, capturing the state-of-the-art guarantees for
the matrix settings (in both the polynomial-time and sub-exponential time
regimes).
Our results naturally extend to the case of $r$ distinct $k$-sparse signals
with disjoint supports, with guarantees that are independent of the number of
spikes. Even in the restricted case of sparse PCA, known algorithms only
recover the sparse vectors for $\lambda \geq \tilde{\mathcal{O}}(k \cdot r)$
while our algorithms require $\lambda \geq \tilde{\mathcal{O}}(k)$.
Finally, by analyzing the low-degree likelihood ratio, we complement these
algorithmic results with rigorous evidence illustrating the trade-offs between
signal-to-noise ratio and running time. This lower bound captures the known
lower bounds for both sparse PCA and tensor PCA. In this general model, we
observe a more intricate three-way trade-off between the number of samples $n$,
the sparsity $k$, and the tensor power $p$.
Related papers
- Overcomplete Tensor Decomposition via Koszul-Young Flattenings [63.01248796170617]
We give a new algorithm for decomposing an $n_times n times n_3$ tensor as the sum of a minimal number of rank-1 terms.
We show that an even more general class of degree-$d$s cannot surpass rank $Cn$ for a constant $C = C(d)$.
arXiv Detail & Related papers (2024-11-21T17:41:09Z) - A Sub-Quadratic Time Algorithm for Robust Sparse Mean Estimation [6.853165736531941]
We study the algorithmic problem of sparse mean estimation in the presence of adversarial outliers.
Our main contribution is an algorithm for robust sparse mean estimation which runs in emphsubquadratic time using $mathrmpoly(k,log d,1/epsilon)$ samples.
arXiv Detail & Related papers (2024-03-07T18:23:51Z) - Oja's Algorithm for Streaming Sparse PCA [7.059472280274011]
Oja's algorithm for Streaming Principal Component Analysis (PCA) for $n$ data-points in a $d$ dimensional space achieves the same sin-squared error $O(r_mathsfeff/n)$ as the offline algorithm in $O(d)$ space and $O(nd)$ time.
We show that a simple single-pass procedure that thresholds the output of Oja's algorithm can achieve the minimax error bound under some regularity conditions in $O(d)$ space and $O(nd)$ time.
arXiv Detail & Related papers (2024-02-11T16:36:48Z) - Solving Dense Linear Systems Faster Than via Preconditioning [1.8854491183340518]
We show that our algorithm has an $tilde O(n2)$ when $k=O(n0.729)$.
In particular, our algorithm has an $tilde O(n2)$ when $k=O(n0.729)$.
Our main algorithm can be viewed as a randomized block coordinate descent method.
arXiv Detail & Related papers (2023-12-14T12:53:34Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Sparse PCA Beyond Covariance Thresholding [2.311583680973075]
We show that for every $t ll k$ there exists an algorithm running in time $ncdot dO(t)$ that solves this problem as long as [ gtrsim fracksqrtntsqrtln(2 + td/k2)$.
We provide time algorithms for the sparse planted vector problem that have better guarantees than the state of the art in some regimes.
arXiv Detail & Related papers (2023-02-20T18:45:24Z) - Average-Case Complexity of Tensor Decomposition for Low-Degree
Polynomials [93.59919600451487]
"Statistical-computational gaps" occur in many statistical inference tasks.
We consider a model for random order-3 decomposition where one component is slightly larger in norm than the rest.
We show that tensor entries can accurately estimate the largest component when $ll n3/2$ but fail to do so when $rgg n3/2$.
arXiv Detail & Related papers (2022-11-10T00:40:37Z) - Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor
Decompositions [51.19236668224547]
We study low rank approximation of tensors, focusing on the tensor train and Tucker decompositions.
For tensor train decomposition, we give a bicriteria $(1 + eps)$-approximation algorithm with a small bicriteria rank and $O(q cdot nnz(A))$ running time.
In addition, we extend our algorithm to tensor networks with arbitrary graphs.
arXiv Detail & Related papers (2022-07-15T11:55:09Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations.
The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - The Average-Case Time Complexity of Certifying the Restricted Isometry
Property [66.65353643599899]
In compressed sensing, the restricted isometry property (RIP) on $M times N$ sensing matrices guarantees efficient reconstruction of sparse vectors.
We investigate the exact average-case time complexity of certifying the RIP property for $Mtimes N$ matrices with i.i.d. $mathcalN(0,1/M)$ entries.
arXiv Detail & Related papers (2020-05-22T16:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.