Logarithmic Regret and Polynomial Scaling in Online Multi-step-ahead Prediction
- URL: http://arxiv.org/abs/2511.12467v1
- Date: Sun, 16 Nov 2025 05:49:44 GMT
- Title: Logarithmic Regret and Polynomial Scaling in Online Multi-step-ahead Prediction
- Authors: Jiachen Qian, Yang Zheng
- Abstract summary: We study the problem of online multi-step-ahead prediction for unknown linear systems. Using conditional distribution theory, we derive an optimal parameterization of the prediction policy as a linear function of future inputs, past inputs, and past outputs. We show that the online algorithm achieves logarithmic regret with respect to the optimal Kalman filter in the multi-step setting.
- Score: 4.42171635795004
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This letter studies the problem of online multi-step-ahead prediction for unknown linear stochastic systems. Using conditional distribution theory, we derive an optimal parameterization of the prediction policy as a linear function of future inputs, past inputs, and past outputs. Based on this characterization, we propose an online least-squares algorithm to learn the policy and analyze its regret relative to the optimal model-based predictor. We show that the online algorithm achieves logarithmic regret with respect to the optimal Kalman filter in the multi-step setting. Furthermore, with new proof techniques, we establish an almost-sure regret bound that does not rely on fixed failure probabilities for sufficiently large horizons $N$. Finally, our analysis also reveals that, while the regret remains logarithmic in $N$, its constant factor grows polynomially with the prediction horizon $H$, with the polynomial order set by the largest Jordan block of eigenvalue 1 in the system matrix.
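As a rough illustration of the idea behind the abstract (not the paper's algorithm), the sketch below fits a multi-step-ahead predictor by least squares on a fully observed scalar system, where the $H$-step output is exactly a linear function of the current output and the future inputs. The system parameters `a`, `b`, the noise level, and the horizon `H` are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, H, N = 0.9, 1.0, 3, 2000  # illustrative system and horizon

# Simulate a scalar system y_{t+1} = a*y_t + b*u_t + w_t
u = rng.normal(size=N)
y = np.zeros(N)
for t in range(N - 1):
    y[t + 1] = a * y[t] + b * u[t] + 0.1 * rng.normal()

# Unrolling the dynamics gives the linear parameterization
#   y_{t+H} = a^H y_t + sum_{j=0}^{H-1} a^{H-1-j} b u_{t+j} + noise,
# so regress y_{t+H} on [y_t, u_t, ..., u_{t+H-1}] by least squares.
T = N - H
Z = np.column_stack([y[:T]] + [u[j:T + j] for j in range(H)])
theta, *_ = np.linalg.lstsq(Z, y[H:], rcond=None)

# Coefficients implied by the true dynamics, for comparison
theta_true = np.array([a**H] + [a**(H - 1 - j) * b for j in range(H)])
print(np.round(theta, 3))
print(np.round(theta_true, 3))
```

With enough data the least-squares estimate approaches the model-based coefficients; the paper's setting is harder (partially observed, with past outputs in the regressor), but the same "predictor as a linear policy" structure is what the online least-squares algorithm learns.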
Related papers
- A Polynomial-time Algorithm for Online Sparse Linear Regression with Improved Regret Bound under Weaker Conditions [75.69959433669244]
We study the problem of online sparse linear regression (OSLR), where the algorithms are restricted to accessing only $k$ out of $d$ features per instance for prediction. We introduce a new polynomial-time algorithm, which significantly improves previous regret bounds.
arXiv Detail & Related papers (2025-10-31T05:02:33Z) - Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update [70.38810219913593]
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function. GLBs are widely applicable to real-world scenarios, but their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. We propose a jointly efficient algorithm that attains a nearly optimal regret bound with $\mathcal{O}(1)$ time and space complexities per round.
arXiv Detail & Related papers (2025-07-16T02:24:21Z) - Follow The Approximate Sparse Leader for No-Regret Online Sparse Linear Approximation [0.46040036610482665]
We consider the problem of online sparse linear approximation, where one predicts the best sparse approximation of a sequence of measurements as a linear combination of columns of a given measurement matrix. We propose Follow-The-Approximate-Sparse-Leader, an efficient online meta-policy to address this online problem.
arXiv Detail & Related papers (2025-01-01T10:50:35Z) - Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are required, via plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Online Contextual Decision-Making with a Smart Predict-then-Optimize Method [4.061135251278187]
We study an online contextual decision-making problem with resource constraints.
We propose an algorithm that mixes a prediction step based on the "Smart Predict-then-Optimize (SPO)" method with a dual update step based on mirror descent.
We prove regret bounds and demonstrate that the overall convergence rate of our method depends on the $\mathcal{O}(T^{-1/2})$ convergence of online mirror descent.
arXiv Detail & Related papers (2022-06-15T06:16:13Z) - Online estimation and control with optimal pathlength regret [52.28457815067461]
A natural goal when designing online learning algorithms is to bound the regret of the algorithm in terms of the temporal variation of the input sequence.
Data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including OCO and bandits.
arXiv Detail & Related papers (2021-10-24T22:43:15Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z) - Online Learning of the Kalman Filter with Logarithmic Regret [2.0305676256390934]
We show that it is possible to achieve a regret of the order of $\mathrm{polylog}(N)$ with high probability, where $N$ is the number of observations collected.
This is achieved using an online least-squares algorithm, which exploits the approximately linear relation between future observations and past observations.
arXiv Detail & Related papers (2020-02-12T18:31:31Z) - No-Regret Prediction in Marginally Stable Systems [37.178095559618654]
We consider the problem of online prediction in a marginally stable linear dynamical system.
By applying our techniques to learning an autoregressive filter, we also achieve logarithmic regret in the partially observed setting.
arXiv Detail & Related papers (2020-02-06T01:53:34Z) - Toward Optimal Adversarial Policies in the Multiplicative Learning System with a Malicious Expert [87.12201611818698]
We consider a learning system that combines experts' advice to predict a sequence of true outcomes.
It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system.
We show that a simple greedy policy of always reporting a false prediction is optimal with an approximation ratio of $1+O(\sqrt{\frac{\ln N}{N}})$.
For the online setting where the malicious expert can adaptively make its decisions, we show that the optimal online policy can be efficiently computed by solving a dynamic program in $O(N^3)$ time.
arXiv Detail & Related papers (2020-01-02T18:04:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.