Nearly Optimal Algorithms with Sublinear Computational Complexity for
Online Kernel Regression
- URL: http://arxiv.org/abs/2306.08320v1
- Date: Wed, 14 Jun 2023 07:39:09 GMT
- Title: Nearly Optimal Algorithms with Sublinear Computational Complexity for
Online Kernel Regression
- Authors: Junfan Li and Shizhong Liao
- Abstract summary: The trade-off between regret and computational cost is a fundamental problem for online kernel regression.
We propose two new algorithms, AOGD-ALD and NONS-ALD, which can keep nearly optimal regret bounds at a sublinear computational complexity.
- Score: 13.510889339096117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The trade-off between regret and computational cost is a fundamental problem
for online kernel regression, and previous algorithms that address the trade-off
cannot keep optimal regret bounds at a sublinear computational complexity. In
this paper, we propose two new algorithms, AOGD-ALD and NONS-ALD, which keep
nearly optimal regret bounds at a sublinear computational complexity, and we
give sufficient conditions under which our algorithms work. Both algorithms
dynamically maintain a group of nearly orthogonal basis elements used to approximate the
kernel mapping, and keep nearly optimal regret bounds by controlling the
approximation error. The number of basis elements depends on the approximation error and the
decay rate of the eigenvalues of the kernel matrix. If the eigenvalues decay
exponentially, then AOGD-ALD and NONS-ALD achieve regrets of
$O(\sqrt{L(f)})$ and $O(\mathrm{d}_{\mathrm{eff}}(\mu)\ln{T})$, respectively, at a
computational complexity of $O(\ln^2{T})$. If the eigenvalues decay
polynomially with degree $p\geq 1$, then our algorithms keep the same regret
bounds at a computational complexity of $o(T)$ in the cases $p>4$ and $p\geq
10$, respectively. Here $L(f)$ is the cumulative loss of $f$ and
$\mathrm{d}_{\mathrm{eff}}(\mu)$ is the effective dimension of the problem. The
two regret bounds are nearly optimal and are not comparable.
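For intuition, the basis-maintenance step described in the abstract follows the general recipe of approximate-linear-dependence (ALD) dictionary methods: an incoming point is added to the basis only when it is poorly approximated by the span of the current basis, and learning is then carried out in the resulting finite-dimensional approximation of the kernel feature map. The Python sketch below illustrates this style of algorithm for squared loss with online gradient descent; it is a minimal sketch, not a reproduction of AOGD-ALD or NONS-ALD, and the kernel choice, the threshold ald_tol, the learning rate, and all class and function names are illustrative assumptions.

import numpy as np


def gaussian_kernel(x, y, gamma=1.0):
    # Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2); kernel choice is illustrative.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))


class ALDOnlineKernelRegression:
    # Online kernel regression on an ALD-maintained dictionary (a sketch, not AOGD-ALD itself).
    def __init__(self, kernel=gaussian_kernel, ald_tol=1e-3, lr=0.1):
        self.kernel = kernel
        self.ald_tol = ald_tol    # approximation-error threshold controlling dictionary growth
        self.lr = lr              # online gradient descent step size
        self.dictionary = []      # stored basis points
        self.K_inv = None         # inverse kernel matrix of the dictionary
        self.w = np.zeros(0)      # weights over the dictionary coordinates

    def _k_vec(self, x):
        return np.array([self.kernel(z, x) for z in self.dictionary])

    def predict(self, x):
        return float(self.w @ self._k_vec(x)) if self.dictionary else 0.0

    def update(self, x, y):
        # One online round: grow the dictionary if needed, then take a gradient step on the squared loss.
        if not self.dictionary:
            self.dictionary.append(x)
            self.K_inv = np.array([[1.0 / self.kernel(x, x)]])
            self.w = np.zeros(1)
        k = self._k_vec(x)
        a = self.K_inv @ k
        residual = self.kernel(x, x) - k @ a   # ALD test: projection error of phi(x) onto the basis span
        if residual > self.ald_tol:
            # x is sufficiently novel: add it and update K_inv via block (Schur-complement) inversion.
            self.dictionary.append(x)
            n = len(self.dictionary)
            K_inv_new = np.zeros((n, n))
            K_inv_new[:-1, :-1] = self.K_inv + np.outer(a, a) / residual
            K_inv_new[:-1, -1] = -a / residual
            K_inv_new[-1, :-1] = -a / residual
            K_inv_new[-1, -1] = 1.0 / residual
            self.K_inv = K_inv_new
            self.w = np.append(self.w, 0.0)
            k = self._k_vec(x)
        grad = (self.w @ k - y) * k            # gradient of 0.5 * (prediction - y)^2 in the w coordinates
        self.w -= self.lr * grad
        return len(self.dictionary)            # current number of basis points


# Toy usage on a synthetic 2-D stream (illustrative only):
# model = ALDOnlineKernelRegression(ald_tol=1e-3, lr=0.2)
# for t in range(1000):
#     x = np.random.uniform(-1.0, 1.0, size=2)
#     y = np.sin(3.0 * x[0]) + 0.1 * np.random.randn()
#     loss = (model.predict(x) - y) ** 2       # regret is measured against these per-round losses
#     model.update(x, y)

In this sketch the dictionary size plays the role of the number of basis elements above: the faster the eigenvalues of the kernel matrix decay, the fewer points pass the ALD test, which is the intuition behind the sublinear computational complexity. For reference, the effective dimension appearing in the regret bound is commonly defined as $\mathrm{d}_{\mathrm{eff}}(\mu)=\mathrm{tr}\big(K(K+\mu I)^{-1}\big)$ for the kernel matrix $K$, although the exact normalization may differ in the paper.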
Related papers
- Implicit High-Order Moment Tensor Estimation and Learning Latent Variable Models [39.33814194788341]
We study the task of learning latent-variable models.
Motivated by these applications, we develop a general efficient algorithm for implicit moment computation.
By leveraging our general algorithm, we obtain the first polynomial-time learners for the following models.
arXiv Detail & Related papers (2024-11-23T23:13:24Z) - Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method [10.742859956268655]
This paper aims to recover the intrinsic model parameters given the leverage scores gradient.
We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$.
arXiv Detail & Related papers (2024-08-21T01:39:42Z) - Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback [30.337826346496385]
We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback.
We introduce two algorithms that achieve improved regret performance compared to existing approaches.
arXiv Detail & Related papers (2023-10-17T19:43:37Z) - Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing [8.723136784230906]
We propose an optimal iterative scheme for federated transfer learning, where a central planner has access to datasets.
Our objective is to minimize the cumulative deviation of the generated parameters $\{\theta_i(t)\}_{t=0}^{T}$ across all $T$ iterations.
By leveraging symmetries within the regret-optimal algorithm, we develop a nearly regret-optimal variant that runs with $\mathcal{O}(Np^2)$ fewer elementary operations.
arXiv Detail & Related papers (2023-09-08T19:17:03Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z) - Improved Kernel Alignment Regret Bound for Online Kernel Learning [11.201662566974232]
We propose an algorithm whose regret bound and computational complexity are better than previous results.
If the eigenvalues of the kernel matrix decay exponentially, then our algorithm enjoys a regret of $O(\sqrt{\mathcal{A}_T})$ at a computational complexity of $O(\ln^2{T})$.
We extend our algorithm to batch learning and obtain an $O(\frac{1}{T}\sqrt{\mathbb{E}[\mathcal{A}_T]})$ excess risk bound, which improves the previous bound.
arXiv Detail & Related papers (2022-12-26T02:32:20Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log{T})$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^{\ast}(\sqrt{T\gamma_T})$ using $O(\log\log{T})$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private algorithm is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Hybrid Stochastic-Deterministic Minibatch Proximal Gradient:
Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
For the considered loss functions, we show that HSDMPG can attain an $\mathcal{O}\big(1/\sqrt{tn}\big)$ generalization bound, which is on the order of the excess error of a learning model.
arXiv Detail & Related papers (2020-09-18T02:18:44Z)