Online Tensor Learning: Computational and Statistical Trade-offs,
Adaptivity and Optimal Regret
- URL: http://arxiv.org/abs/2306.03372v2
- Date: Mon, 10 Jul 2023 05:51:12 GMT
- Title: Online Tensor Learning: Computational and Statistical Trade-offs,
Adaptivity and Optimal Regret
- Authors: Jian-Feng Cai, Jingyang Li and Dong Xia
- Abstract summary: We investigate a framework for estimating latent low-rank tensors in an online setting, encompassing both linear and generalized linear models.
We also investigate two specific applications: online tensor completion and online binary tensor learning.
Notably, our work represents the first attempt to incorporate noise in the online low-rank tensor recovery task.
- Score: 17.29570708667132
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate a generalized framework for estimating latent low-rank tensors
in an online setting, encompassing both linear and generalized linear models.
This framework offers a flexible approach for handling continuous or
categorical variables. Additionally, we investigate two specific applications:
online tensor completion and online binary tensor learning. To address these
challenges, we propose the online Riemannian gradient descent algorithm, which
demonstrates linear convergence and the ability to recover the low-rank
component under appropriate conditions in all applications. Furthermore, we
establish a precise entry-wise error bound for online tensor completion.
Notably, our work represents the first attempt to incorporate noise in the
online low-rank tensor recovery task. Intriguingly, we observe a surprising
trade-off between computational and statistical aspects in the presence of
noise. Increasing the step size accelerates convergence but leads to higher
statistical error, whereas a smaller step size yields a statistically optimal
estimator at the expense of slower convergence. Moreover, we conduct regret
analysis for online tensor regression. Under the fixed step size regime, a
fascinating trilemma concerning the convergence rate, statistical error rate,
and regret is observed. With an optimal choice of step size, we achieve an
optimal regret of $O(\sqrt{T})$. Furthermore, we extend our analysis to the
adaptive setting where the horizon $T$ is unknown. In this case, we demonstrate
that by employing different step sizes, we can attain a statistically optimal
error rate along with a regret of $O(\log T)$. To validate our theoretical
claims, we provide numerical results that corroborate our findings.
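To make the step-size trade-off concrete, here is a minimal sketch of online low-rank recovery in the matrix (order-2) case, assuming a synthetic completion setting with one noisy entry observed per step; a truncated SVD stands in for the Riemannian retraction, so this is an illustration of the idea rather than the paper's online Riemannian gradient descent, and all dimensions and noise levels are illustrative assumptions.
```python
# Minimal sketch (not the paper's algorithm): online low-rank matrix completion
# with one noisy entry per step. A rank-r truncated SVD plays the role of the
# retraction back onto the low-rank set.
import numpy as np

def truncate_rank(X, r):
    """Project X onto the set of matrices of rank at most r via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def online_completion(T, d=20, r=2, step=0.5, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    M_star = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))  # ground truth
    M = np.zeros((d, d))                                                # current estimate
    for _ in range(T):
        i, j = rng.integers(d), rng.integers(d)            # entry revealed at this step
        y = M_star[i, j] + noise * rng.standard_normal()   # noisy observation
        M[i, j] -= step * (M[i, j] - y)                    # gradient step on (M_ij - y)^2 / 2
        M = truncate_rank(M, r)                            # retract back to rank r
    return np.linalg.norm(M - M_star) / np.linalg.norm(M_star)

# Illustrates the trade-off discussed in the abstract: a larger step size converges
# faster but plateaus at a higher statistical error; a smaller one is slower but
# settles at a lower error.
for step in (0.8, 0.1):
    print(f"step={step}: relative error {online_completion(20000, step=step):.4f}")
```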
Related papers
- A Pathwise Coordinate Descent Algorithm for LASSO Penalized Quantile Regression [0.6445605125467572]
We develop a fast, pathwise coordinate descent algorithm to compute exact penalized quantile regression estimates for high-dimensional data.
Our algorithm runs substantially faster than existing alternatives based on approximate coordinate descent and linear programming.
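For orientation, the sketch below shows cyclic coordinate descent for the squared-error LASSO at a single penalty level, where each coordinate update is a closed-form soft-thresholding step; a pathwise implementation would warm-start this routine along a decreasing grid of penalties, and the paper's algorithm handles the non-smooth quantile (check) loss, whose coordinate update differs and is not shown here.
```python
# Cyclic coordinate descent for squared-error LASSO (a simplified stand-in for
# the pathwise coordinate descent idea; the paper targets quantile regression).
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n           # per-coordinate curvature
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]          # remove coordinate j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add it back with the new value
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.1 * rng.standard_normal(200)
print(np.nonzero(lasso_cd(X, y, lam=0.1))[0])   # indices of the nonzero coefficients
```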
arXiv Detail & Related papers (2025-02-17T22:57:41Z) - Approximating Metric Magnitude of Point Sets [4.522729058300309]
Metric magnitude is a measure of the "size" of point clouds with many desirable geometric properties.
It has been adapted to various mathematical contexts and recent work suggests that it can enhance machine learning and optimization algorithms.
In this paper, we study the magnitude problem and show efficient ways of approximating it. We show that it can be cast as a convex optimization problem, but not as a submodular optimization problem.
The paper describes two new algorithms: an iterative approximation algorithm that converges quickly and accurately, and a subset selection method that makes the computation even faster.
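For reference, the magnitude of a finite point cloud can be computed exactly by solving a single linear system, which is the quantity the paper's algorithms approximate at scale. A small sketch of the exact computation (not the paper's approximation algorithms):
```python
# Direct computation of metric magnitude: solve Z w = 1 with Z_ij = exp(-||x_i - x_j||)
# and report the sum of the weights. This exact O(n^3) solve is what the paper's
# approximation algorithms aim to avoid for large point sets.
import numpy as np

def magnitude(points):
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)  # pairwise distances
    Z = np.exp(-D)                                 # similarity matrix
    w = np.linalg.solve(Z, np.ones(len(points)))   # weighting vector
    return w.sum()

rng = np.random.default_rng(0)
print(magnitude(rng.standard_normal((100, 3))))    # magnitude of 100 random points in R^3
```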
arXiv Detail & Related papers (2024-09-06T17:15:28Z) - Low-rank extended Kalman filtering for online learning of neural
networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
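The recursion below is a plain extended Kalman filter update for streaming parameter estimation with a dense covariance matrix, included to show the structure this approach builds on; the paper's method keeps the same recursion but stores the posterior covariance in low-rank-plus-diagonal form, which is not reproduced here. The model, Jacobian, and noise level in the demo are illustrative assumptions.
```python
# Plain EKF update for streaming estimation of the parameters of a nonlinear model
# y ~ f(theta, x) + noise (the paper replaces the dense covariance P with a
# low-rank-plus-diagonal factorization, not shown here).
import numpy as np

def ekf_step(theta, P, x, y, f, jac, obs_var=0.1):
    """One extended-Kalman-filter update for a scalar observation."""
    H = jac(theta, x)                     # 1 x p Jacobian of f w.r.t. theta
    S = H @ P @ H.T + obs_var             # innovation variance
    K = P @ H.T / S                       # Kalman gain, p x 1
    theta = theta + (K * (y - f(theta, x))).ravel()
    P = P - K @ H @ P                     # posterior covariance update
    return theta, P

# Illustrative nonlinear model: y = tanh(x . theta) + noise.
f = lambda th, x: np.tanh(x @ th)
jac = lambda th, x: ((1 - np.tanh(x @ th) ** 2) * x).reshape(1, -1)

rng = np.random.default_rng(0)
theta_true = np.array([0.6, -0.8, 0.3, 0.0, 0.5])
theta, P = np.zeros(5), np.eye(5)
for _ in range(500):
    x = rng.standard_normal(5)
    y = np.tanh(x @ theta_true) + 0.1 * rng.standard_normal()
    theta, P = ekf_step(theta, P, x, y, f, jac, obs_var=0.01)
print(np.round(theta, 2))                 # estimate after 500 streaming updates
```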
arXiv Detail & Related papers (2023-05-31T03:48:49Z) - One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive
Least-Squares [8.443742714362521]
We develop an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints.
Our algorithm uses memory efficiently by exploiting the structure of the streaming data via incremental principal component analysis (IPCA).
Our experiments show the effectiveness of the proposed method compared to the baselines.
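As a baseline for the idea, the sketch below shows plain recursive least squares for one-pass streaming linear regression; the paper's method interpolates between this kind of exact update and orthogonal gradient descent and compresses the stored second-order information via incremental PCA, none of which is reproduced here.
```python
# Plain recursive least squares (RLS): exact one-pass updates for linear regression.
import numpy as np

def rls_step(theta, P, x, y):
    """Exact least-squares update after seeing one more (x, y) pair."""
    Px = P @ x
    k = Px / (1.0 + x @ Px)            # gain vector (Sherman-Morrison)
    theta = theta + k * (y - x @ theta)
    P = P - np.outer(k, Px)            # rank-one update of the inverse Gram matrix
    return theta, P

rng = np.random.default_rng(0)
p = 8
theta_true = rng.standard_normal(p)
theta, P = np.zeros(p), 1e6 * np.eye(p)        # large P acts as a diffuse prior
for _ in range(1000):
    x = rng.standard_normal(p)
    theta, P = rls_step(theta, P, x, x @ theta_true + 0.05 * rng.standard_normal())
print(np.round(theta - theta_true, 3))         # residual errors are small after one pass
```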
arXiv Detail & Related papers (2022-07-28T02:01:31Z) - Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm
Regularization [44.54772242784423]
We develop an efficient nonconvex regularization algorithm for low-rank tensor learning.
The proposed algorithm avoids expensive tensor folding/unfolding operations.
Experiments show that the proposed algorithm is more efficient and more space-efficient than the existing state-of-the-art.
arXiv Detail & Related papers (2022-05-06T07:47:10Z) - Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
Parameter-free algorithms are online learning algorithms that do not require setting learning rates.
We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor.
Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
arXiv Detail & Related papers (2022-03-19T13:39:49Z) - Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z) - Scalable Optimal Transport in High Dimensions for Graph Distances,
Embedding Alignment, and More [7.484063729015126]
We propose two effective log-linear time approximations of the cost matrix for optimal transport.
These approximations enable general log-linear time algorithms for entropy-regularized OT that perform well even in complex, high-dimensional spaces.
For graph distance regression we propose the graph transport network (GTN), which combines graph neural networks (GNNs) with enhanced Sinkhorn.
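For context, the sketch below runs standard Sinkhorn iterations with a fully materialized cost matrix, which is the quadratic-cost baseline; the paper's contribution is to replace this dense kernel with log-linear-time approximations (not shown). The point clouds, regularization strength, and iteration count are illustrative assumptions.
```python
# Standard Sinkhorn iterations for entropy-regularized optimal transport with a
# dense cost matrix (the baseline whose cost the paper's approximations reduce).
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=500):
    """Return the entropy-regularized transport plan between histograms a and b."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                 # alternate scaling updates
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]    # plan diag(u) K diag(v)

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 2))
y = rng.standard_normal((60, 2)) + 1.0
C = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1) ** 2
C /= C.max()                              # normalize costs for numerical stability
a, b = np.full(50, 1 / 50), np.full(60, 1 / 60)
P = sinkhorn(a, b, C)
print(P.sum(), (P * C).sum())             # total mass (~1) and transport cost of the plan
```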
arXiv Detail & Related papers (2021-07-14T17:40:08Z) - Double Coverage with Machine-Learned Advice [100.23487145400833]
We study the fundamental online $k$-server problem in a learning-augmented setting.
We show that our algorithm achieves, for any $k$, an almost optimal consistency-robustness trade-off.
arXiv Detail & Related papers (2021-03-02T11:04:33Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale, single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)