Computational and Statistical Guarantees for Tensor-on-Tensor Regression with Tensor Train Decomposition
- URL: http://arxiv.org/abs/2406.06002v1
- Date: Mon, 10 Jun 2024 03:51:38 GMT
- Title: Computational and Statistical Guarantees for Tensor-on-Tensor Regression with Tensor Train Decomposition
- Authors: Zhen Qin, Zhihui Zhu
- Abstract summary: We study the theoretical and algorithmic aspects of the TT-based ToT regression model.
We propose two optimization algorithms, iterative hard thresholding (IHT) and Riemannian gradient descent (RGD), to efficiently find solutions that attain the established error bounds.
We establish the linear convergence rate of both IHT and RGD.
- Score: 27.29463801531576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, a tensor-on-tensor (ToT) regression model has been proposed to generalize tensor recovery, encompassing scenarios like scalar-on-tensor regression and tensor-on-vector regression. However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression. To overcome this hurdle, tensor decompositions have been introduced, with the tensor train (TT)-based ToT model proving efficient in practice due to reduced memory requirements, enhanced computational efficiency, and decreased sampling complexity. Despite these practical benefits, a disparity exists between theoretical analysis and real-world performance. In this paper, we delve into the theoretical and algorithmic aspects of the TT-based ToT regression model. Assuming the regression operator satisfies the restricted isometry property (RIP), we conduct an error analysis for the solution to a constrained least-squares optimization problem. This analysis includes an upper error bound and a minimax lower bound, revealing that such error bounds depend polynomially on the order $N+M$. To efficiently find solutions meeting such error bounds, we propose two optimization algorithms: the iterative hard thresholding (IHT) algorithm (employing gradient descent with TT-singular value decomposition (TT-SVD)) and the factorization approach using the Riemannian gradient descent (RGD) algorithm. When RIP is satisfied, spectral initialization facilitates proper initialization, and we establish the linear convergence rate of both IHT and RGD.
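To make the IHT update concrete, here is a minimal NumPy sketch of the two ingredients the abstract describes: a gradient step on the least-squares loss followed by TT-SVD truncation back onto the set of low-TT-rank tensors. The toy problem (a third-order scalar-on-tensor instance with Gaussian measurements), the `tt_svd` helper, the step size, and all problem sizes are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def tt_svd(X, ranks):
    """Truncate a dense tensor to the given TT ranks via sequential
    truncated SVDs, then contract the TT cores back to a dense tensor."""
    dims = X.shape
    cores, r_prev = [], 1
    C = X.reshape(dims[0], -1)
    for n in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(ranks[n], len(s))
        cores.append(U[:, :r].reshape(r_prev, dims[n], r))
        C = (s[:r, None] * Vt[:r]).reshape(r * dims[n + 1], -1)
        r_prev = r
    cores.append(C.reshape(r_prev, dims[-1], 1))
    T = cores[0]
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=([-1], [0]))
    return T.reshape(dims)

def iht_tt(y, A, dims, ranks, step=1.0, iters=300):
    """IHT: gradient step on 0.5*||A x - y||^2, then TT-SVD truncation.
    A (m x prod(dims)) holds vectorized sensing tensors; step=1.0 assumes
    A is scaled to be a near-isometry on low-TT-rank tensors (RIP)."""
    X = np.zeros(dims)
    for _ in range(iters):
        grad = (A.T @ (A @ X.ravel() - y)).reshape(dims)
        X = tt_svd(X - step * grad, ranks)
    return X

rng = np.random.default_rng(0)
dims, ranks = (6, 6, 6), (2, 2)
# ground truth of TT rank (2, 2), built from random TT cores
G1 = rng.normal(size=(1, 6, 2))
G2 = rng.normal(size=(2, 6, 2))
G3 = rng.normal(size=(2, 6, 1))
Xstar = np.tensordot(np.tensordot(G1, G2, axes=(2, 0)), G3, axes=(3, 0)).reshape(dims)
m = 400
A = rng.normal(size=(m, Xstar.size)) / np.sqrt(m)   # Gaussian map (RIP-style scaling)
y = A @ Xstar.ravel()
Xhat = iht_tt(y, A, dims, ranks)
print(np.linalg.norm(Xhat - Xstar) / np.linalg.norm(Xstar))  # small recovery error
```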
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
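As a toy illustration of the stabilization mechanism (not code from the paper), the sketch below compares plain simultaneous gradient descent-ascent on the bilinear objective $f(x, y) = xy$, which spirals away from the saddle point, with an update that linearly interpolates the current iterate toward the output of several base steps. The step size, interpolation weight, and inner-step count are arbitrary choices for the demonstration.

```python
import numpy as np

def gda_step(z, eta=0.5):
    """One simultaneous gradient descent-ascent step on f(x, y) = x*y."""
    x, y = z
    return np.array([x - eta * y, y + eta * x])

def interpolated_step(z, lam=0.5, k=4):
    """Run k base steps, then move only part of the way toward the result:
    z <- (1 - lam) * z + lam * T(z)."""
    fast = z
    for _ in range(k):
        fast = gda_step(fast)
    return (1 - lam) * z + lam * fast

z_plain = np.array([1.0, 1.0])
z_interp = np.array([1.0, 1.0])
for _ in range(50):
    z_plain = gda_step(z_plain)
    z_interp = interpolated_step(z_interp)
print(np.linalg.norm(z_plain))    # diverges: plain GDA spirals outward
print(np.linalg.norm(z_interp))   # contracts toward the saddle (0, 0)
```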
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously allow for inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Multi-Grid Tensorized Fourier Neural Operator for High-Resolution PDEs [93.82811501035569]
We introduce a new data efficient and highly parallelizable operator learning approach with reduced memory requirement and better generalization.
MG-TFNO scales to large resolutions by leveraging local and global structures of full-scale, real-world phenomena.
We demonstrate superior performance on the turbulent Navier-Stokes equations where we achieve less than half the error with over 150x compression.
arXiv Detail & Related papers (2023-09-29T20:18:52Z) - An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
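A minimal sketch of the fixed-point view, under stand-in assumptions: the update `f` below is a proximal-gradient step whose soft-thresholding plays the role of the learnable regularizer, and the random matrix `H` stands in for the blurring operator. The Deep Equilibrium output is the point where applying `f` changes nothing.

```python
import numpy as np

def f(z, y, H, tau=0.4, lam=0.05):
    """One solver iteration: gradient step on 0.5*||H z - y||^2 followed by
    soft-thresholding, a stand-in here for the learned regularizer."""
    g = z - tau * H.T @ (H @ z - y)
    return np.sign(g) * np.maximum(np.abs(g) - tau * lam, 0.0)

def deq_solve(y, H, tol=1e-9, max_iter=1000):
    """Iterate to the fixed point z* = f(z*; y)."""
    z = np.zeros(H.shape[1])
    for _ in range(max_iter):
        z_next = f(z, y, H)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z

rng = np.random.default_rng(0)
H = rng.normal(size=(100, 20)) / 10.0     # toy measurement/blur operator
y = H @ rng.normal(size=20)
z_star = deq_solve(y, H)
print(np.linalg.norm(z_star - f(z_star, y, H)))   # ~0: equilibrium reached
```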
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - Tensor-on-Tensor Regression: Riemannian Optimization, Over-parameterization, Statistical-computational Gap, and Their Interplay [9.427635404752936]
We study tensor-on-tensor regression, where the goal is to connect tensor responses to tensor covariates with a low Tucker rank parameter tensor/matrix.
We propose two methods to cope with the challenge of unknown rank.
We provide the first convergence guarantee for the general tensor-on-tensor regression.
arXiv Detail & Related papers (2022-06-17T13:15:27Z) - Truncated tensor Schatten p-norm based approach for spatiotemporal traffic data imputation with complicated missing patterns [77.34726150561087]
We introduce four complicated missing patterns, including random missing and three fiber-like missing cases according to the mode-driven fibers.
Despite the nonconvexity of the objective function in our model, we derive the optimal solutions by integrating the alternating direction method of multipliers (ADMM) with a data-imputation scheme.
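The impute-then-shrink loop underlying such models can be sketched in a few lines. The version below uses plain nuclear-norm singular-value shrinkage (SoftImpute-style) rather than the paper's truncated Schatten p-norm ADMM, so it only conveys the overall structure; `lam` and the observation rate are arbitrary.

```python
import numpy as np

def soft_impute(Y, mask, lam=0.5, iters=300):
    """Fill missing entries, shrink singular values, repeat.
    Y: data matrix; mask: True where entries are observed."""
    X = np.where(mask, Y, 0.0)
    for _ in range(iters):
        Z = np.where(mask, Y, X)                    # impute missing entries
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X = (U * np.maximum(s - lam, 0.0)) @ Vt     # singular value shrinkage
    return X

rng = np.random.default_rng(0)
M = rng.normal(size=(30, 4)) @ rng.normal(size=(4, 30))   # rank-4 ground truth
mask = rng.random(M.shape) < 0.6                          # ~60% observed
Mhat = soft_impute(M, mask)
print(np.linalg.norm((Mhat - M)[~mask]) / np.linalg.norm(M[~mask]))  # error on missing entries
```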
arXiv Detail & Related papers (2022-05-19T08:37:56Z) - Scalable Spatiotemporally Varying Coefficient Modelling with Bayesian Kernelized Tensor Regression [17.158289775348063]
Bayesian Kernelized Tensor Regression (BKTR) can be considered a new and scalable approach to modeling processes with a low-rank spatiotemporal structure.
We conduct extensive experiments on both synthetic and real-world data sets, and our results confirm the superior performance and efficiency of BKTR for model estimation and inference.
arXiv Detail & Related papers (2021-08-31T19:22:23Z) - Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements [30.395874385570007]
A fundamental task is to faithfully recover tensors from highly incomplete measurements.
We develop an algorithm to directly recover the tensor factors in the Tucker decomposition.
We show that it provably converges at a linear rate independent of the condition number of the ground truth tensor for two canonical problems.
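The preconditioning idea behind such condition-number-free rates can already be seen in the matrix case. Below is a toy sketch (my own simplification, not the paper's tensor algorithm) in which each gradient step is rescaled by $(U^\top U)^{-1}$ so progress does not slow down on an ill-conditioned factor; the step size, sizes, and initialization are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3
V, _ = np.linalg.qr(rng.normal(size=(n, r)))
Ustar = V * np.sqrt(np.array([100.0, 10.0, 1.0]))   # condition number 100
M = Ustar @ Ustar.T                                 # low-rank ground truth

U = Ustar + 0.01 * rng.normal(size=(n, r))          # init near the truth
eta = 0.3
for _ in range(150):
    G = (U @ U.T - M) @ U                    # gradient of 0.25*||UU^T - M||_F^2
    U -= eta * G @ np.linalg.inv(U.T @ U)    # scaled (preconditioned) step
print(np.linalg.norm(U @ U.T - M) / np.linalg.norm(M))  # near machine precision
```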
arXiv Detail & Related papers (2021-04-29T17:44:49Z) - Alternating minimization algorithms for graph regularized tensor completion [8.26185178671935]
We consider a Canonical Polyadic (CP) decomposition approach to low-rank tensor completion (LRTC).
The usage of graph regularization entails benefits in the learning accuracy of LRTC, but at the same time, induces coupling graph Laplacian terms.
We propose efficient alternating minimization algorithms by leveraging the block structure of the underlying CP decomposition-based model.
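For context, the bare alternating least-squares skeleton these algorithms build on looks as follows. This sketch handles a fully observed third-order tensor and omits the graph Laplacian regularizers and missing-data handling that are the paper's actual contribution.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product; row (i, j) holds A[i, :] * B[j, :]."""
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_als(T, rank, iters=200, seed=0):
    """Alternating least squares for a CP decomposition of a 3-way tensor."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    B = rng.normal(size=(J, rank))
    C = rng.normal(size=(K, rank))
    for _ in range(iters):
        # each update is a linear least-squares solve for one factor
        A = T.reshape(I, -1) @ np.linalg.pinv(khatri_rao(B, C)).T
        B = np.moveaxis(T, 1, 0).reshape(J, -1) @ np.linalg.pinv(khatri_rao(A, C)).T
        C = np.moveaxis(T, 2, 0).reshape(K, -1) @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

rng = np.random.default_rng(1)
A0, B0, C0 = (rng.normal(size=(n, 3)) for n in (8, 9, 10))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)            # exact rank-3 tensor
A, B, C = cp_als(T, rank=3)
print(np.linalg.norm(np.einsum('ir,jr,kr->ijk', A, B, C) - T) / np.linalg.norm(T))
```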
arXiv Detail & Related papers (2020-08-28T23:20:49Z) - An Optimal Statistical and Computational Framework for Generalized Tensor Estimation [10.899518267165666]
This paper describes a flexible framework for low-rank tensor estimation problems.
It includes many important instances from applications in computational imaging, genomics, and network analysis.
arXiv Detail & Related papers (2020-02-26T01:54:35Z)