Tensor-on-Tensor Regression: Riemannian Optimization,
Over-parameterization, Statistical-computational Gap, and Their Interplay
- URL: http://arxiv.org/abs/2206.08756v3
- Date: Mon, 15 Jan 2024 22:35:00 GMT
- Title: Tensor-on-Tensor Regression: Riemannian Optimization,
Over-parameterization, Statistical-computational Gap, and Their Interplay
- Authors: Yuetian Luo and Anru R. Zhang
- Abstract summary: We study tensor-on-tensor regression, where the goal is to connect tensor responses to tensor covariates with a low Tucker rank parameter tensor/matrix.
We propose two methods to cope with the challenge of unknown rank.
We provide the first convergence guarantee for the general tensor-on-tensor regression.
- Score: 9.427635404752936
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study tensor-on-tensor regression, where the goal is to connect tensor
responses to tensor covariates with a low Tucker rank parameter tensor/matrix
without prior knowledge of its intrinsic rank. We propose the Riemannian
gradient descent (RGD) and Riemannian Gauss-Newton (RGN) methods and cope with
the challenge of unknown rank by studying the effect of rank
over-parameterization. We provide the first convergence guarantee for the
general tensor-on-tensor regression by showing that RGD and RGN respectively
converge linearly and quadratically to a statistically optimal estimate in both
rank correctly-parameterized and over-parameterized settings. Our theory
reveals an intriguing phenomenon: Riemannian optimization methods naturally
adapt to over-parameterization without modifications to their implementation.
We also prove the statistical-computational gap in scalar-on-tensor regression
by a direct low-degree polynomial argument. Our theory demonstrates a "blessing
of statistical-computational gap" phenomenon: in a wide range of scenarios in
tensor-on-tensor regression for tensors of order three or higher, the
computationally required sample size matches what is needed by moderate rank
over-parameterization when considering computationally feasible estimators,
while there are no such benefits in the matrix settings. This shows moderate
rank over-parameterization is essentially "cost-free" in terms of sample size
in tensor-on-tensor regression of order three or higher. Finally, we conduct
simulation studies to show the advantages of our proposed methods and to
corroborate our theoretical findings.
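To make the setup concrete, here is a minimal, hedged sketch (not the authors' implementation) of the scalar-on-tensor special case y_i = <X_i, T*> + eps_i with an order-three, low Tucker rank T*. The Python/NumPy code below uses toy dimensions and a plain gradient step followed by a truncated-HOSVD retraction onto the rank-(r, r, r) set; this is a simplified stand-in for the tangent-space RGD/RGN updates analyzed in the paper, and every dimension, step size, and iteration count is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)
p, r, n = 8, 2, 1000   # side length, Tucker rank, sample size (toy choices)

def tucker_product(core, factors):
    # Assemble a p x p x p tensor from an r x r x r core and three p x r factors.
    return np.einsum('abc,ia,jb,kc->ijk', core, *factors)

def hosvd_truncate(T, rank):
    # Truncated HOSVD: project T onto (approximately) Tucker rank (rank, rank, rank).
    factors = []
    for mode in range(3):
        unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(U[:, :rank])
    core = np.einsum('ijk,ia,jb,kc->abc', T, *factors)
    return tucker_product(core, factors)

# Ground-truth low Tucker rank tensor T*.
core_star = rng.normal(size=(r, r, r))
U_star = [np.linalg.qr(rng.normal(size=(p, r)))[0] for _ in range(3)]
T_star = tucker_product(core_star, U_star)

# Gaussian covariate tensors and noisy scalar responses y_i = <X_i, T*> + eps_i.
X = rng.normal(size=(n, p, p, p))
y = np.einsum('nijk,ijk->n', X, T_star) + 0.01 * rng.normal(size=n)

# Gradient step on the least-squares loss, followed by an HOSVD retraction.
T_hat = np.zeros((p, p, p))
step = 1.0 / n                      # roughly whitens the Gaussian design
for _ in range(100):
    residual = np.einsum('nijk,ijk->n', X, T_hat) - y
    gradient = np.einsum('n,nijk->ijk', residual, X)
    T_hat = hosvd_truncate(T_hat - step * gradient, r)

print('relative error:', np.linalg.norm(T_hat - T_star) / np.linalg.norm(T_star))

In this toy run the relative error should shrink toward the noise level, and the same loop runs unchanged if r is set above the true rank, loosely mirroring the rank over-parameterized setting; the formal guarantees, of course, are stated only for the actual Riemannian updates in the paper.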
Related papers
- Computational and Statistical Guarantees for Tensor-on-Tensor Regression with Tensor Train Decomposition [27.29463801531576]
We study the theoretical and algorithmic aspects of the TT-based ToT regression model.
We propose two algorithms, IHT and RGD, to efficiently find solutions attaining the constrained error bounds.
We establish the linear convergence rate of both IHT and RGD.
arXiv Detail & Related papers (2024-06-10T03:51:38Z)
- Guaranteed Nonconvex Factorization Approach for Tensor Train Recovery [30.876260188209105]
We provide the first convergence guarantee for the factorization approach.
We optimize over the so-called left-orthogonal TT format.
We prove that RGD can reliably recover the ground truth at a linear rate.
arXiv Detail & Related papers (2024-01-05T01:17:16Z)
- Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements [30.395874385570007]
A fundamental task is to faithfully recover tensors from highly incomplete measurements.
We develop an algorithm to directly recover the tensor factors in the Tucker decomposition.
We show that it provably converges at a linear rate independent of the condition number of the ground truth tensor for two canonical problems.
arXiv Detail & Related papers (2021-04-29T17:44:49Z)
- Low-rank Tensor Estimation via Riemannian Gauss-Newton: Statistical Optimality and Second-Order Convergence [3.8073142980733]
We consider the estimation of a low Tucker rank tensor from a number of noisy linear measurements.
Different from the generic (super)linear convergence guarantee of RGN in the literature, we prove the first local quadratic convergence guarantee of RGN.
A deterministic estimation error lower bound, which matches the upper bound, is provided.
arXiv Detail & Related papers (2021-04-24T22:24:14Z)
- Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive the regret upper bounds on the classes of Sobolev spaces $W_p^\beta(\mathcal{X})$, $p \geq 2$, $\beta > \frac{d}{p}$.
The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $\beta > \frac{d}{2}$ or $p = \infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- An Optimal Statistical and Computational Framework for Generalized Tensor Estimation [10.899518267165666]
This paper describes a flexible framework for low-rank tensor estimation problems.
It includes many important instances from applications in computational imaging, genomics, and network analysis.
arXiv Detail & Related papers (2020-02-26T01:54:35Z)
- Support recovery and sup-norm convergence rates for sparse pivotal estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multi-task square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)