Online Tensor Learning: Computational and Statistical Trade-offs,
Adaptivity and Optimal Regret
- URL: http://arxiv.org/abs/2306.03372v2
- Date: Mon, 10 Jul 2023 05:51:12 GMT
- Title: Online Tensor Learning: Computational and Statistical Trade-offs,
Adaptivity and Optimal Regret
- Authors: Jian-Feng Cai, Jingyang Li and Dong Xia
- Abstract summary: We investigate a framework for estimating latent low-rank tensors in an online setting, encompassing both linear and generalized linear models.
We also investigate two specific applications: online tensor completion and online binary tensor learning.
Notably, our work represents the first attempt to incorporate noise in the online low-rank tensor recovery task.
- Score: 17.29570708667132
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate a generalized framework for estimating latent low-rank tensors
in an online setting, encompassing both linear and generalized linear models.
This framework offers a flexible approach for handling continuous or
categorical variables. Additionally, we investigate two specific applications:
online tensor completion and online binary tensor learning. To address these
challenges, we propose the online Riemannian gradient descent algorithm, which
demonstrates linear convergence and the ability to recover the low-rank
component under appropriate conditions in all applications. Furthermore, we
establish a precise entry-wise error bound for online tensor completion.
Notably, our work represents the first attempt to incorporate noise in the
online low-rank tensor recovery task. Intriguingly, we observe a surprising
trade-off between computational and statistical aspects in the presence of
noise. Increasing the step size accelerates convergence but leads to higher
statistical error, whereas a smaller step size yields a statistically optimal
estimator at the expense of slower convergence. Moreover, we conduct regret
analysis for online tensor regression. Under the fixed step size regime, a
fascinating trilemma concerning the convergence rate, statistical error rate,
and regret is observed. With an optimal choice of step size, we achieve an
optimal regret of $O(\sqrt{T})$. Furthermore, we extend our analysis to the
adaptive setting where the horizon $T$ is unknown. In this case, we demonstrate
that by employing different step sizes, we can attain a statistically optimal
error rate along with a regret of $O(\log T)$. To validate our theoretical
claims, we provide numerical results that corroborate our findings.
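The computational-statistical trade-off above is easy to see even in a toy setting. Below is a minimal sketch, not the authors' online Riemannian gradient descent on tensors: it runs plain online gradient descent on a noisy linear model with a vector parameter, and the dimension `d`, horizon `T`, noise level `sigma`, and the two step sizes are illustrative assumptions. A larger step size reaches its error floor quickly but the floor is higher; a smaller one converges more slowly toward a lower, statistically better floor.

```python
# Toy illustration (NOT the paper's algorithm) of the step-size
# trade-off: one online gradient step per observed sample.
import numpy as np

rng = np.random.default_rng(0)
d, T, sigma = 20, 5000, 0.5          # illustrative sizes, not from the paper
theta_star = rng.normal(size=d)      # ground-truth parameter

def online_gd(eta):
    theta = np.zeros(d)
    for _ in range(T):
        x = rng.normal(size=d)                      # fresh covariate
        y = x @ theta_star + sigma * rng.normal()   # noisy response
        theta -= eta * (x @ theta - y) * x          # squared-loss gradient step
    return np.linalg.norm(theta - theta_star)

for eta in (0.05, 0.005):            # large vs. small fixed step size
    print(f"eta={eta}: final estimation error {online_gd(eta):.4f}")
```

With this horizon both runs have converged, so the printed errors expose the statistical side of the trade-off (the larger step size ends with a visibly larger error); shortening `T` would instead expose the computational side, with the small step size still far from its floor.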
Related papers
- Computational and Statistical Guarantees for Tensor-on-Tensor Regression with Tensor Train Decomposition [27.29463801531576]
We study the theoretical and algorithmic aspects of the TT-based ToT regression model.
We propose two algorithms to efficiently find solutions satisfying the constrained error bounds.
We establish the linear convergence rate of both IHT and RGD.
arXiv Detail & Related papers (2024-06-10T03:51:38Z)
- Online Tensor Inference [0.0]
Traditional offline learning, involving the storage and utilization of all data in each computational iteration, becomes impractical for high-dimensional tensor data.
Existing low-rank tensor methods lack the capability for statistical inference in an online fashion.
Our approach employs Stochastic Gradient Descent (SGD) to enable efficient real-time data processing without extensive memory requirements.
arXiv Detail & Related papers (2023-12-28T16:37:48Z)
- Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z)
- Tensor-on-Tensor Regression: Riemannian Optimization, Over-parameterization, Statistical-computational Gap, and Their Interplay [9.427635404752936]
We study tensor-on-tensor regression, where the goal is to connect tensor responses to tensor covariates with a low Tucker rank parameter tensor/matrix.
We propose two methods to cope with the challenge of unknown rank.
We provide the first convergence guarantee for the general tensor-on-tensor regression.
arXiv Detail & Related papers (2022-06-17T13:15:27Z)
- Truncated tensor Schatten p-norm based approach for spatiotemporal traffic data imputation with complicated missing patterns [77.34726150561087]
We introduce four complicated missing patterns, including random missing and three fiber-like missing cases according to the mode-driven fibers.
Despite the nonconvexity of the objective function in our model, we derive the optimal solutions by integrating the alternating direction method of multipliers (ADMM).
arXiv Detail & Related papers (2022-05-19T08:37:56Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements [30.395874385570007]
A fundamental task is to faithfully recover tensors from highly incomplete measurements.
We develop an algorithm to directly recover the tensor factors in the Tucker decomposition.
We show that it provably converges at a linear rate independent of the condition number of the ground truth tensor for two canonical problems.
arXiv Detail & Related papers (2021-04-29T17:44:49Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster in time, albeit only up to an error neighborhood.
Rather than fixing the minibatch and the step-size at the outset, we propose to allow these parameters to evolve adaptively (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
- Tensor denoising and completion based on ordinal observations [11.193504036335503]
We consider the problem of low-rank tensor estimation from possibly incomplete, ordinal-valued observations.
We propose a multi-linear cumulative link model, develop a rank-constrained M-estimator, and obtain theoretical accuracy guarantees.
We show that the proposed estimator is minimax optimal under the class of low-rank models.
arXiv Detail & Related papers (2020-02-16T07:09:56Z)
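One mechanism from the "Balancing Rates and Variance via Adaptive Batch-Size" entry above lends itself to a short sketch: keep the step size constant (fast constant-rate progress) but grow the minibatch over time so that the gradient variance, and with it the error neighborhood, shrinks. The doubling schedule, the linear model, and all constants here are illustrative assumptions, not the referenced paper's actual rule.

```python
# Hedged sketch of constant step size + growing minibatch
# (illustrative assumptions; not the referenced paper's algorithm).
import numpy as np

rng = np.random.default_rng(1)
d = 10
theta_star = rng.normal(size=d)      # ground-truth parameter

def minibatch_grad(theta, batch):
    # Fresh minibatch from a noisy linear model.
    X = rng.normal(size=(batch, d))
    y = X @ theta_star + 0.5 * rng.normal(size=batch)
    return X.T @ (X @ theta - y) / batch

theta, eta, batch = np.zeros(d), 0.05, 1
for t in range(1, 2001):
    theta -= eta * minibatch_grad(theta, batch)   # step size stays fixed
    if t % 200 == 0:
        batch *= 2    # doubling the batch roughly halves gradient variance
print("final estimation error:", np.linalg.norm(theta - theta_star))
```

The same constant step size that would stall at a noise floor with batch size 1 keeps improving here, because the enlarged batches play the role that an attenuating step size plays in classical SGD analyses.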