A hierarchical Vovk-Azoury-Warmuth forecaster with discounting for online regression in RKHS
- URL: http://arxiv.org/abs/2506.22631v1
- Date: Fri, 27 Jun 2025 20:47:52 GMT
- Title: A hierarchical Vovk-Azoury-Warmuth forecaster with discounting for online regression in RKHS
- Authors: Dmitry B. Rokhlin
- Abstract summary: We study the problem of online regression with the unconstrained quadratic loss against a time-varying sequence of functions from a Reproducing Kernel Hilbert Space (RKHS). We propose a fully adaptive, hierarchical algorithm, which learns both the discount factor and the number of random features.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of online regression with the unconstrained quadratic loss against a time-varying sequence of functions from a Reproducing Kernel Hilbert Space (RKHS). Recently, Jacobsen and Cutkosky (2024) introduced a discounted Vovk-Azoury-Warmuth (DVAW) forecaster that achieves optimal dynamic regret in the finite-dimensional case. In this work, we lift their approach to the non-parametric domain by synthesizing the DVAW framework with a random feature approximation. We propose a fully adaptive, hierarchical algorithm, which we call H-VAW-D (Hierarchical Vovk-Azoury-Warmuth with Discounting), that learns both the discount factor and the number of random features. We prove that this algorithm, which has a per-iteration computational complexity of $O(T\ln T)$, achieves an expected dynamic regret of $O(T^{2/3}P_T^{1/3} + \sqrt{T}\ln T)$, where $P_T$ is the functional path length of a comparator sequence.
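To make the two building blocks concrete, the sketch below combines a discounted VAW forecaster with a random Fourier feature approximation of a Gaussian kernel. This is a minimal illustrative reconstruction, not the authors' implementation: the full H-VAW-D algorithm additionally aggregates a hierarchy of such forecasters over a grid of discount factors and feature counts, which is omitted here, and the class name and hyperparameters (`gamma`, `n_features`, `bandwidth`, `lam`) are assumptions.

```python
import numpy as np

class DiscountedVAW:
    """Minimal sketch: one discounted Vovk-Azoury-Warmuth forecaster
    on random Fourier features approximating a Gaussian kernel."""

    def __init__(self, dim, n_features=64, gamma=0.99, lam=1.0,
                 bandwidth=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random Fourier features: phi(x) = sqrt(2/D) * cos(W x + b)
        self.W = rng.normal(scale=1.0 / bandwidth, size=(n_features, dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.gamma = gamma
        self.A = lam * np.eye(n_features)  # discounted regularized covariance
        self.v = np.zeros(n_features)      # discounted label-weighted feature sum

    def _phi(self, x):
        return np.sqrt(2.0 / len(self.b)) * np.cos(self.W @ x + self.b)

    def predict(self, x):
        # VAW folds the current feature vector into the covariance
        # *before* predicting, but uses only past labels (stored in v).
        phi = self._phi(x)
        self.A = self.gamma * self.A + np.outer(phi, phi)
        self.v = self.gamma * self.v
        self._last_phi = phi
        return float(phi @ np.linalg.solve(self.A, self.v))

    def update(self, y):
        # Incorporate the revealed label after predicting.
        self.v += y * self._last_phi
```

A single forecaster like this costs a dense linear solve over the $D$ random features per round; the point of the hierarchical aggregation in the paper is to adapt the discount factor and the number of features online rather than fixing them as above.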
Related papers
- Dynamic Regret Reduces to Kernelized Static Regret [63.36965242404415]
We study dynamic regret in online convex optimization, where the objective is to achieve low cumulative loss relative to an arbitrary benchmark sequence. By constructing a suitable function space in the form of a Reproducing Kernel Hilbert Space (RKHS), our reduction enables us to recover the optimal $R_T(u_1,\ldots,u_T) = \mathcal{O}\big(\sqrt{\sum_{t}\|u_t-u_{t-1}\|\,T}\big)$ dynamic regret guarantee.
arXiv Detail & Related papers (2025-07-07T21:09:33Z) - Random feature-based double Vovk-Azoury-Warmuth algorithm for online multi-kernel learning [0.0]
We introduce a novel multi-kernel learning algorithm, VAW$^2$, for online least squares regression in reproducing kernel Hilbert spaces (RKHS). VAW$^2$ leverages random Fourier feature-based functional approximation and the Vovk-Azoury-Warmuth (VAW) method in a two-level procedure. A theoretical analysis yields a regret bound of $O(T^{1/2}\ln T)$ in expectation with respect to artificial randomness, when the number of random features scales as $T^{1/2}$.
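As a sketch of the two-level idea (an assumption-laden reconstruction, not the paper's code): the second level can run an ordinary VAW update that treats the first-level per-kernel predictions as its feature vector.

```python
import numpy as np

class VAWAggregator:
    """Sketch of a second-level VAW step: run ordinary VAW on the
    vector of first-level predictions (one per kernel)."""

    def __init__(self, n_experts, lam=1.0):
        self.A = lam * np.eye(n_experts)
        self.v = np.zeros(n_experts)

    def predict(self, expert_preds):
        p = np.asarray(expert_preds, dtype=float)
        self.A += np.outer(p, p)   # current features enter A first (VAW)
        self._last = p
        return float(p @ np.linalg.solve(self.A, self.v))

    def update(self, y):
        self.v += y * self._last
```

Each round one would call `predict` with the per-kernel predictions, observe the label $y_t$, then call `update(y_t)` and also update each first-level forecaster with the same label.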
arXiv Detail & Related papers (2025-03-25T21:57:35Z) - Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning [13.429541377715296]
We propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in infinite-horizon discounted linear Markov decision processes. We show that, combined with a regularized approximate dynamic-programming scheme, the resulting algorithm achieves a regret of order $\widetilde{\mathcal{O}}\big(\sqrt{d^3 (1-\gamma)^{-7/2} T}\big)$, where $T$ is the total number of sample transitions, $\gamma \in (0,1)$ is the discount factor, and $d$ is the feature dimensionality.
arXiv Detail & Related papers (2025-02-19T17:32:35Z) - Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation [22.652143194356864]
We address the problem of solving strongly convex and smooth problems using stochastic gradient descent (SGD) with a constant step size. We provide an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$. Our analysis relies on the properties of the SGD iterates viewed as a time-homogeneous Markov chain.
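A minimal sketch of the Richardson-Romberg idea referenced in the title, under the standard construction (the function names and the toy objective are illustrative, not from the paper): run averaged constant-step SGD at step sizes $h$ and $2h$, then combine the two averages to cancel the leading bias term.

```python
import numpy as np

def averaged_sgd(grad, theta0, step, n_iters, rng):
    """Polyak-Ruppert averaged SGD with a constant step size."""
    theta = theta0.copy()
    avg = np.zeros_like(theta0)
    for k in range(n_iters):
        theta = theta - step * grad(theta, rng)
        avg += (theta - avg) / (k + 1)   # running average of iterates
    return avg

def richardson_romberg(grad, theta0, step, n_iters, seed=0):
    # Combining averages at steps h and 2h as 2*avg(h) - avg(2h)
    # cancels the O(h) bias of the averaged constant-step iterates.
    bar_h = averaged_sgd(grad, theta0, step, n_iters,
                         np.random.default_rng(seed))
    bar_2h = averaged_sgd(grad, theta0, 2 * step, n_iters,
                          np.random.default_rng(seed + 1))
    return 2.0 * bar_h - bar_2h

# Toy usage: noisy gradient of f(theta) = 0.5 * ||theta||^2.
noisy_grad = lambda theta, rng: theta + 0.1 * rng.standard_normal(theta.shape)
print(richardson_romberg(noisy_grad, np.ones(3), step=0.1, n_iters=10_000))
```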
arXiv Detail & Related papers (2024-10-07T15:02:48Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - Kernel $\varepsilon$-Greedy for Multi-Armed Bandits with Covariates [5.115048067424624]
We estimate the unknown mean reward functions using an online weighted kernel ridge regression estimator. We show that for any choice of kernel and the corresponding RKHS, we achieve a sub-linear regret rate depending on the intrinsic dimensionality of the RKHS.
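A minimal sketch of the kernel $\varepsilon$-greedy pattern (a generic reconstruction using a plain, unweighted kernel ridge estimator; the paper's online weighted variant differs, and all names here are assumptions):

```python
import numpy as np

def rbf(a, b, bw=1.0):
    # Gaussian kernel matrix between rows of a and rows of b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bw ** 2))

class KernelEpsGreedy:
    """Per-arm kernel ridge regression of reward on covariates,
    with epsilon-greedy arm selection."""

    def __init__(self, n_arms, eps=0.1, lam=1.0, seed=0):
        self.eps, self.lam = eps, lam
        self.rng = np.random.default_rng(seed)
        self.X = [[] for _ in range(n_arms)]   # covariates per arm
        self.y = [[] for _ in range(n_arms)]   # rewards per arm

    def _estimate(self, arm, x):
        if not self.y[arm]:
            return 0.0
        X = np.array(self.X[arm]); y = np.array(self.y[arm])
        K = rbf(X, X)
        alpha = np.linalg.solve(K + self.lam * np.eye(len(y)), y)
        return float(rbf(x[None, :], X) @ alpha)

    def choose(self, x):
        if self.rng.random() < self.eps:            # explore
            return int(self.rng.integers(len(self.X)))
        est = [self._estimate(a, x) for a in range(len(self.X))]
        return int(np.argmax(est))                   # exploit

    def update(self, arm, x, reward):
        self.X[arm].append(x); self.y[arm].append(reward)
```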
arXiv Detail & Related papers (2023-06-29T22:48:34Z) - Dynamic Regret for Strongly Adaptive Methods and Optimality of Online KRR [13.165557713537389]
We show that Strongly Adaptive (SA) algorithms can be viewed as a principled way of controlling dynamic regret.
We derive a new lower bound on a certain penalized regret, which establishes the near minimax optimality of online Kernel Ridge Regression (KRR).
arXiv Detail & Related papers (2021-11-22T21:52:47Z) - Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive the regret upper bounds on the classes of Sobolev spaces $W_p^\beta(\mathcal{X})$, $p\geq 2$, $\beta>\frac{d}{p}$.
The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $\beta>\frac{d}{2}$ or $p=\infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for a broad class of step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDPs with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.