Subquadratic Kronecker Regression with Applications to Tensor
Decomposition
- URL: http://arxiv.org/abs/2209.04876v2
- Date: Fri, 12 May 2023 05:19:44 GMT
- Title: Subquadratic Kronecker Regression with Applications to Tensor
Decomposition
- Authors: Matthew Fahrbach, Thomas Fu, Mehrdad Ghadiri
- Abstract summary: We present the first subquadratic-time algorithm for solving Kronecker regression to a $(1+\varepsilon)$-approximation that avoids the exponential term $O(\varepsilon^{-N})$ in the running time.
Our techniques combine leverage score sampling and iterative methods.
We demonstrate the speed and accuracy of this Kronecker regression algorithm on synthetic data and real-world image tensors.
- Score: 4.391912559062643
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kronecker regression is a highly-structured least squares problem
$\min_{\mathbf{x}} \lVert \mathbf{K}\mathbf{x} - \mathbf{b} \rVert_{2}^2$,
where the design matrix $\mathbf{K} = \mathbf{A}^{(1)} \otimes \cdots \otimes
\mathbf{A}^{(N)}$ is a Kronecker product of factor matrices. This regression
problem arises in each step of the widely-used alternating least squares (ALS)
algorithm for computing the Tucker decomposition of a tensor. We present the
first subquadratic-time algorithm for solving Kronecker regression to a
$(1+\varepsilon)$-approximation that avoids the exponential term
$O(\varepsilon^{-N})$ in the running time. Our techniques combine leverage
score sampling and iterative methods. By extending our approach to block-design
matrices where one block is a Kronecker product, we also achieve
subquadratic-time algorithms for (1) Kronecker ridge regression and (2)
updating the factor matrices of a Tucker decomposition in ALS, which is not a
pure Kronecker regression problem, thereby improving the running time of all
steps of Tucker ALS. We demonstrate the speed and accuracy of this Kronecker
regression algorithm on synthetic data and real-world image tensors.
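To make the structure concrete, below is a minimal sketch (not the paper's algorithm, which additionally combines sampling with iterative methods) of solving a two-factor Kronecker least-squares problem by leverage-score sampling. It uses the standard fact that the leverage score of row $(i_1, i_2)$ of $\mathbf{A}^{(1)} \otimes \mathbf{A}^{(2)}$ is the product of the factors' leverage scores, so rows can be sampled without ever materializing the Kronecker product; the function names, sample count, and seed are illustrative assumptions.

```python
import numpy as np

def leverage_scores(A):
    # Leverage scores = squared row norms of the left singular factor of A.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U**2, axis=1)

def kron_regression_sampled(A1, A2, b, num_samples, seed=0):
    """Approximately solve min_x ||(A1 kron A2) x - b||_2 by row sampling."""
    rng = np.random.default_rng(seed)
    n1, d1 = A1.shape
    n2, d2 = A2.shape
    s1, s2 = leverage_scores(A1), leverage_scores(A2)
    p1, p2 = s1 / s1.sum(), s2 / s2.sum()
    # Sample rows of A1 kron A2 from the product distribution p1 x p2.
    i1 = rng.choice(n1, size=num_samples, p=p1)
    i2 = rng.choice(n2, size=num_samples, p=p2)
    w = 1.0 / np.sqrt(num_samples * p1[i1] * p2[i2])  # importance weights
    # Row (i1, i2) of the Kronecker product is kron(A1[i1], A2[i2]).
    rows = (A1[i1][:, :, None] * A2[i2][:, None, :]).reshape(num_samples, d1 * d2)
    SK = w[:, None] * rows
    Sb = w * b[i1 * n2 + i2]  # matching entries of b (row-major ordering)
    x, *_ = np.linalg.lstsq(SK, Sb, rcond=None)
    return x

# Toy usage: a 3000-row Kronecker system solved from 500 sampled rows.
A1, A2 = np.random.randn(50, 4), np.random.randn(60, 5)
b = np.random.randn(50 * 60)
x_hat = kron_regression_sampled(A1, A2, b, num_samples=500)
```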
Related papers
- Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method [10.742859956268655]
This paper aims to recover the intrinsic model parameters given the leverage score gradient.
We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$.
arXiv Detail & Related papers (2024-08-21T01:39:42Z) - Regression for matrix-valued data via Kronecker products factorization [0.5156484100374059]
We propose an estimation algorithm, termed KRO-PRO-FAC, for estimating the parameters $\beta_{1k} \subset \mathbb{R}^{p \times q_1}$ and $\beta_{2k} \subset \mathbb{R}^{p \times q}$.
Numerical studies on simulated and real data indicate that our procedure is competitive, in terms of both estimation error and predictive accuracy, compared to other existing methods.
arXiv Detail & Related papers (2024-04-30T02:44:41Z) - How to Inverting the Leverage Score Distribution? [16.744561210470632]
Although leverage scores are widely used as a tool, in this paper we study a novel problem, namely inverting the leverage score.
We use iterative shrinking and the induction hypothesis to ensure global convergence rates for the Newton method.
This important study on inverting statistical leverage opens up numerous new applications in interpretation, data recovery, and security.
arXiv Detail & Related papers (2024-04-21T21:36:42Z) - Average-Case Complexity of Tensor Decomposition for Low-Degree
Polynomials [93.59919600451487]
"Statistical-computational gaps" occur in many statistical inference tasks.
We consider a model for random order-3 tensor decomposition with $r$ components, where one component is slightly larger in norm than the rest.
We show that low-degree polynomials of the tensor entries can accurately estimate the largest component when $r \ll n^{3/2}$ but fail to do so when $r \gg n^{3/2}$.
arXiv Detail & Related papers (2022-11-10T00:40:37Z) - Private Isotonic Regression [54.32252900997422]
We consider the problem of isotonic regression over a partially ordered set (poset) $\mathcal{X}$ and for any Lipschitz loss function.
We obtain a pure-DP algorithm that has an expected excess empirical risk of roughly $\mathrm{width}(\mathcal{X}) \cdot \log|\mathcal{X}| / n$, where $\mathrm{width}(\mathcal{X})$ is the width of the poset.
We show that the bounds above are essentially the best that can be obtained.
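As background on the underlying task, here is a minimal non-private sketch of isotonic regression in the simplest special case: a totally ordered set with squared loss, solved by pool-adjacent-violators. The poset, Lipschitz-loss, and differential-privacy aspects of the paper are not modeled here.

```python
def isotonic_fit(y):
    """Return the nondecreasing sequence closest to y in squared error (PAVA)."""
    blocks = []  # each block stores [sum of values, count]
    for v in y:
        blocks.append([float(v), 1])
        # Merge adjacent blocks while their means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

print(isotonic_fit([3.0, 1.0, 2.0, 4.0]))  # -> [2.0, 2.0, 2.0, 4.0]
```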
arXiv Detail & Related papers (2022-10-27T05:08:07Z) - Dynamic Tensor Product Regression [18.904243654649118]
We present a dynamic tree data structure where any update to a single matrix can be propagated quickly.
We also show that our data structure can be used to solve dynamic versions of Tensor Product Regression as well as Tensor Product Spline regression.
arXiv Detail & Related papers (2022-10-08T08:06:00Z) - Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor
Decompositions [51.19236668224547]
We study low rank approximation of tensors, focusing on the tensor train and Tucker decompositions.
For tensor train decomposition, we give a bicriteria $(1+\varepsilon)$-approximation algorithm with a small bicriteria rank and $O(q \cdot \mathrm{nnz}(A))$ running time.
In addition, we extend our algorithm to tensor networks with arbitrary graphs.
arXiv Detail & Related papers (2022-07-15T11:55:09Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
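A toy sketch of the gradient-descent approach on this model is below; the ReLU activation, squared loss, synthetic data, step size, and iteration count are illustrative assumptions rather than the paper's exact setting.

```python
import numpy as np

def fit_single_neuron(X, y, steps=300, eta=0.1, seed=0):
    """Gradient descent on the mean squared loss of x -> relu(w . x)."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    w = rng.standard_normal(dim) / np.sqrt(dim)
    for _ in range(steps):
        z = X @ w
        pred = np.maximum(z, 0.0)                  # sigma = ReLU (monotone)
        grad = X.T @ ((pred - y) * (z > 0)) / n    # gradient of 0.5 * mean sq. loss
        w -= eta * grad
    return w

# Toy usage with noisy labels from a planted neuron.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 10))
w_star = rng.standard_normal(10)
y = np.maximum(X @ w_star, 0.0) + 0.1 * rng.standard_normal(1000)
w_hat = fit_single_neuron(X, y)
```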
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Fast Low-Rank Tensor Decomposition by Ridge Leverage Score Sampling [5.740578698172382]
We study Tucker decompositions and use tools from randomized numerical linear algebra called ridge leverage scores.
We show how to use approximate ridge leverage scores to construct a sketched instance for any ridge regression problem.
We demonstrate the effectiveness of our approximate ridge regression algorithm for large, low-rank Tucker decompositions on both synthetic and real-world data.
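The ridge leverage scores referred to above have the standard form $\tau_i^{\lambda}(\mathbf{A}) = \mathbf{a}_i^{\top}(\mathbf{A}^{\top}\mathbf{A} + \lambda \mathbf{I})^{-1}\mathbf{a}_i$. Below is a minimal sketch (not the paper's full Tucker algorithm) of computing them exactly and using them to subsample a ridge regression problem; the function names and seed are illustrative assumptions.

```python
import numpy as np

def ridge_leverage_scores(A, lam):
    # tau_i = a_i^T (A^T A + lam * I)^{-1} a_i, computed exactly.
    G = A.T @ A + lam * np.eye(A.shape[1])
    return np.einsum('ij,jk,ik->i', A, np.linalg.inv(G), A)

def sampled_ridge_regression(A, b, lam, num_samples, seed=0):
    """Approximately solve min_x ||Ax - b||^2 + lam * ||x||^2 from sampled rows."""
    rng = np.random.default_rng(seed)
    tau = ridge_leverage_scores(A, lam)
    p = tau / tau.sum()
    idx = rng.choice(A.shape[0], size=num_samples, p=p)
    w = 1.0 / np.sqrt(num_samples * p[idx])        # importance weights
    SA, Sb = w[:, None] * A[idx], w * b[idx]
    d = A.shape[1]
    # Solve the (much smaller) sketched ridge problem.
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)
```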
arXiv Detail & Related papers (2021-07-22T13:32:47Z) - Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix
Factorization [49.090785356633695]
We study the asymmetric low-rank factorization problem $\min_{\mathbf{U} \in \mathbb{R}^{m \times d},\, \mathbf{V} \in \mathbb{R}^{n \times d}} \frac{1}{2} \lVert \mathbf{U}\mathbf{V}^{\top} - \mathbf{\Sigma} \rVert_F^2$ for a given matrix $\mathbf{\Sigma}$.
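A small gradient-descent sketch for this objective follows; the target matrix, initialization scale, step size, and iteration count here are illustrative assumptions.

```python
import numpy as np

def gd_low_rank(M, d, steps=500, eta=0.01, seed=0):
    """Gradient descent on f(U, V) = 0.5 * ||U V^T - M||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = 0.01 * rng.standard_normal((m, d))
    V = 0.01 * rng.standard_normal((n, d))
    for _ in range(steps):
        R = U @ V.T - M                             # residual
        U, V = U - eta * R @ V, V - eta * R.T @ U   # simultaneous gradient steps
    return U, V
```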
arXiv Detail & Related papers (2021-06-27T17:25:24Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^{\top}]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)