Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor
Decompositions
- URL: http://arxiv.org/abs/2207.07417v3
- Date: Sun, 26 Nov 2023 06:10:27 GMT
- Title: Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor
Decompositions
- Authors: Arvind V. Mahankali, David P. Woodruff, Ziyu Zhang
- Abstract summary: We study low rank approximation of tensors, focusing on the tensor train and Tucker decompositions.
For tensor train decomposition, we give a bicriteria $(1 + \epsilon)$-approximation algorithm with a small bicriteria rank and $O(q \cdot \mathrm{nnz}(A))$ running time.
In addition, we extend our algorithm to tensor networks with arbitrary graphs.
- Score: 51.19236668224547
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study low rank approximation of tensors, focusing on the tensor train and
Tucker decompositions, as well as approximations with tree tensor networks and
more general tensor networks. For tensor train decomposition, we give a
bicriteria $(1 + \epsilon)$-approximation algorithm with a small bicriteria
rank and $O(q \cdot \mathrm{nnz}(A))$ running time, up to lower order terms,
which improves
over the additive error algorithm of \cite{huber2017randomized}. We also show
how to convert the algorithm of \cite{huber2017randomized} into a relative
error algorithm, but their algorithm necessarily has a running time of
$O(qr^2 \cdot \mathrm{nnz}(A)) + n \cdot \mathrm{poly}(qk/\epsilon)$ when
converted to a $(1 + \epsilon)$-approximation algorithm with bicriteria rank
$r$. To the best of our
knowledge, our work is the first to achieve polynomial time relative error
approximation for tensor train decomposition. Our key technique is a method for
obtaining subspace embeddings with a number of rows polynomial in $q$ for a
matrix which is the flattening of a tensor train of $q$ tensors. We extend our
algorithm to tree tensor networks. In addition, we extend our algorithm to
tensor networks with arbitrary graphs (which we refer to as general tensor
networks), by using a result of
\cite{ms08_simulating_quantum_tensor_contraction} and showing that a general
tensor network of rank $k$ can be contracted to a binary tree network of rank
$k^{O(\deg(G)\,\mathrm{tw}(G))}$, allowing us to reduce to the case of tree tensor
networks. Finally, we give new fixed-parameter tractable algorithms for the
tensor train, Tucker, and CP decompositions, which are simpler than those of
\cite{swz19_tensor_low_rank} since they do not make use of polynomial system
solvers. Our technique of Gaussian subspace embeddings with exactly $k$ rows
(and thus exponentially small success probability) may be of independent
interest.
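The key tool described above is randomized sketching: compress a flattening of the tensor with a subspace embedding that has few rows, then solve the much smaller sketched problem. As a rough illustration only, the NumPy snippet below applies a generic Gaussian sketch-and-solve (randomized SVD) to a single flattening of an order-3 tensor; it is not the paper's bicriteria algorithm or its tensor-train subspace embeddings, and the helper names (`flatten`, `sketched_rank_k`) and the `oversample` parameter are illustrative assumptions.

```python
# Minimal, generic sketch-and-solve illustration (assumed setup, not the
# paper's algorithm): apply a Gaussian sketch to a flattening of a tensor,
# then recover a rank-k approximation from the much smaller sketched matrix.
import numpy as np

def flatten(tensor, mode):
    """Mode-`mode` matricization: rows indexed by `mode`, columns by the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def sketched_rank_k(A, k, oversample=5, rng=None):
    """Rank-k approximation of A via a Gaussian sketch (randomized SVD)."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = A.shape
    S = rng.standard_normal((m, k + oversample))   # Gaussian sketching matrix
    Y = A @ S                                      # n x (k + oversample) sketch
    Q, _ = np.linalg.qr(Y)                         # orthonormal basis for range(Y)
    B = Q.T @ A                                    # small (k + oversample) x m problem
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub[:, :k]
    return U, s[:k], Vt[:k]                        # A ~= U @ diag(s) @ Vt

rng = np.random.default_rng(0)
T = rng.standard_normal((30, 30, 30))              # dense order-3 tensor
A = flatten(T, 0)                                  # 30 x 900 flattening
U, s, Vt = sketched_rank_k(A, k=5, rng=rng)
A_hat = U @ np.diag(s) @ Vt
print("relative error:", np.linalg.norm(A - A_hat) / np.linalg.norm(A))
```

The paper's contribution, by contrast, is to design subspace embeddings whose number of rows stays polynomial in $q$ when the matrix being sketched is the flattening of a tensor train of $q$ tensors, which is what enables the near-linear $O(q \cdot \mathrm{nnz}(A))$ running time.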
Related papers
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Solving Tensor Low Cycle Rank Approximation [15.090593955414137]
We formulate a particular tensor low rank approximation problem, which we call tensor cycle rank.
For the classical tensor rank, Tucker rank, and train rank, this problem has been well studied in [Song, Woodruff, Zhong SODA 2019].
In this paper, we generalize the previous "rotation and sketch" technique of [Song, Woodruff, Zhong SODA 2019] and show an input sparsity time algorithm for cycle rank.
arXiv Detail & Related papers (2023-04-13T15:00:50Z) - Average-Case Complexity of Tensor Decomposition for Low-Degree
Polynomials [93.59919600451487]
"Statistical-computational gaps" occur in many statistical inference tasks.
We consider a model for random order-3 tensor decomposition where one component is slightly larger in norm than the rest.
We show that low-degree polynomials in the tensor entries can accurately estimate the largest component when $r \ll n^{3/2}$, but fail to do so when $r \gg n^{3/2}$.
arXiv Detail & Related papers (2022-11-10T00:40:37Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations $\sigma$.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) \leq C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Fast algorithm for overcomplete order-3 tensor decomposition [7.638467133689933]
We develop the first fast spectral algorithm to decompose a random third-order tensor over $\mathbb{R}^d$ of rank up to $O(d^{3/2}/\mathrm{polylog}(d))$.
Our algorithm can recover all components in time $O(d^{6.05})$ under the current matrix multiplication time.
arXiv Detail & Related papers (2022-02-14T00:31:34Z) - The Complexity of Sparse Tensor PCA [1.90365714903665]
For any $1 \leq t \leq k$, our algorithms recover the sparse vector for signal-to-noise ratio $\lambda \geq \tilde{\mathcal{O}}(\sqrt{t} \cdot (k/t)^{p/2})$ in time.
Even in the restricted case of sparse PCA, known algorithms only recover the sparse vectors for $\lambda \geq \tilde{\mathcal{O}}(k \cdot r)$, while our algorithms require $\lambda \geq$
arXiv Detail & Related papers (2021-06-11T10:57:00Z) - Exact nuclear norm, completion and decomposition for random overcomplete
tensors via degree-4 SOS [0.7233897166339269]
We show that simple semidefinite programs inspired by degree $4$ SOS can exactly solve the tensor nuclear norm, tensor decomposition, and tensor completion problems on tensors with random asymmetric components.
We show that w.h.p. these semidefinite programs can exactly find the nuclear norm and components of an $(n \times n \times n)$-tensor $\mathcal{T}$ with $m \leq n^{3/2}/\mathrm{polylog}(n)$ random asymmetric components.
arXiv Detail & Related papers (2020-11-18T17:27:36Z) - Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
We show that gradient descent on an over-parameterized objective can go beyond the lazy training regime and utilize certain low-rank structure in the data.
arXiv Detail & Related papers (2020-10-22T00:32:12Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parameterized two-layer neural network trained by gradient descent can provably achieve small loss with respect to the ground truth, using fewer samples than the Neural Tangent Kernel analysis requires.
arXiv Detail & Related papers (2020-07-09T07:09:28Z)