Improved Regret Bounds for Online Kernel Selection under Bandit Feedback
- URL: http://arxiv.org/abs/2303.05018v2
- Date: Thu, 23 Mar 2023 07:20:34 GMT
- Title: Improved Regret Bounds for Online Kernel Selection under Bandit Feedback
- Authors: Junfan Li and Shizhong Liao
- Abstract summary: We prove two types of regret bounds improving the previous bound.
We apply the two algorithms to online kernel selection with time constraint and prove new regret bounds matching or improving the previous $O(\sqrt{T\ln{K}}+\Vert f\Vert^2_{\mathcal{H}_i}\max\{\sqrt{T},\frac{T}{\sqrt{\mathcal{R}}}\})$ expected bound, where $\mathcal{R}$ is the time budget.
- Score: 13.510889339096117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we improve the regret bound for online kernel selection under
bandit feedback. The previous algorithm enjoys a $O((\Vert
f\Vert^2_{\mathcal{H}_i}+1)K^{\frac{1}{3}}T^{\frac{2}{3}})$ expected bound for
Lipschitz loss functions. We prove two types of regret bounds improving the
previous bound. For smooth loss functions, we propose an algorithm with a
$O(U^{\frac{2}{3}}K^{-\frac{1}{3}}(\sum^K_{i=1}L_T(f^\ast_i))^{\frac{2}{3}})$
expected bound, where $L_T(f^\ast_i)$ is the cumulative loss of the optimal
hypothesis in $\mathbb{H}_{i}=\{f\in\mathcal{H}_i:\Vert
f\Vert_{\mathcal{H}_i}\leq U\}$. The data-dependent bound keeps the previous
worst-case bound and is smaller if most of the candidate kernels match well with
the data. For Lipschitz loss functions, we propose an algorithm with a
$O(U\sqrt{KT}\ln^{\frac{2}{3}}{T})$ expected bound asymptotically improving the
previous bound. We apply the two algorithms to online kernel selection with
time constraint and prove new regret bounds matching or improving the previous
$O(\sqrt{T\ln{K}} +\Vert
f\Vert^2_{\mathcal{H}_i}\max\{\sqrt{T},\frac{T}{\sqrt{\mathcal{R}}}\})$
expected bound, where $\mathcal{R}$ is the time budget. Finally, we empirically
verify our algorithms on online regression and classification tasks.
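To make the bandit-feedback protocol concrete, the sketch below simulates online kernel selection over $K$ candidate Gaussian kernels: at each round one kernel is sampled from an EXP3-style distribution, only the selected hypothesis is evaluated, and it is updated with an importance-weighted subgradient step. This is a minimal illustration of the setting (regret is measured against the best $f\in\mathbb{H}_i$ while only the selected kernel's loss is observed), not the algorithms proposed in the paper; the bandwidths, step sizes, and budget are assumptions for demonstration only.

```python
# Illustrative sketch of online kernel selection under bandit feedback.
# NOT the paper's algorithm: an EXP3-style distribution over K Gaussian kernels,
# each paired with a budgeted functional gradient-descent learner (absolute loss).
import numpy as np

rng = np.random.default_rng(0)

class OnlineKernelLearner:
    """Functional gradient descent on f in the RKHS of one Gaussian kernel."""
    def __init__(self, bandwidth, eta=0.1, budget=200):
        self.bw, self.eta, self.budget = bandwidth, eta, budget
        self.centers, self.alphas = [], []

    def _k(self, x, c):
        return np.exp(-np.sum((x - c) ** 2) / (2 * self.bw ** 2))

    def predict(self, x):
        return sum(a * self._k(x, c) for a, c in zip(self.alphas, self.centers))

    def update(self, x, y, scale=1.0):
        # Subgradient of |f(x) - y|, importance-weighted under bandit feedback.
        g = np.sign(self.predict(x) - y)
        if g != 0 and len(self.centers) < self.budget:
            self.centers.append(x)
            self.alphas.append(-self.eta * scale * g)

def run(stream, bandwidths, eta_p=0.05, gamma=0.1):
    K = len(bandwidths)
    learners = [OnlineKernelLearner(bw) for bw in bandwidths]
    weights = np.ones(K)
    total_loss = 0.0
    for x, y in stream:
        p = (1 - gamma) * weights / weights.sum() + gamma / K
        i = rng.choice(K, p=p)                   # pay only learner i's running time
        loss = abs(learners[i].predict(x) - y)   # only this loss is observed
        total_loss += loss
        learners[i].update(x, y, scale=1.0 / p[i])   # importance-weighted update
        weights[i] *= np.exp(-eta_p * loss / p[i])   # EXP3-style weight update
    return total_loss

# Toy data stream: y = sin(3x) + noise.
X = rng.uniform(-1, 1, size=(1000, 1))
Y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(1000)
print(run(zip(X, Y), bandwidths=[0.05, 0.2, 0.5, 1.0]))
```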
Related papers
- Improved Kernel Alignment Regret Bound for Online Kernel Learning [11.201662566974232]
We propose an algorithm whose regret bound and computational complexity are better than previous results.
If the eigenvalues of the kernel matrix decay exponentially, then our algorithm enjoys a regret of $O(\sqrt{\mathcal{A}_T})$ at a computational complexity of $O(\ln^2{T})$.
We extend our algorithm to batch learning and obtain a $O(\frac{1}{T}\sqrt{\mathbb{E}[\mathcal{A}_T]})$ excess risk bound which improves the previous $O(\dots)$ bound.
arXiv Detail & Related papers (2022-12-26T02:32:20Z) - Eluder-based Regret for Stochastic Contextual MDPs [43.19667415823089]
We present the E-UC$^3$RL algorithm for regret minimization in Contextual Markov Decision Processes (CMDPs).
Our algorithm is efficient (assuming efficient offline regression oracles) and enjoys a regret guarantee of $\widetilde{O}(H^3\sqrt{T|S||A|d_{\mathrm{E}}(\mathcal{P})})$.
arXiv Detail & Related papers (2022-11-27T20:38:47Z) - Private Isotonic Regression [54.32252900997422]
We consider the problem of isotonic regression over a partially ordered set (poset) $\mathcal{X}$ and for any Lipschitz loss function.
We obtain a pure-DP algorithm that has an expected excess empirical risk of roughly $\mathrm{width}(\mathcal{X})\cdot\log|\mathcal{X}|/n$, where $\mathrm{width}(\mathcal{X})$ is the width of the poset.
We show that the bounds above are essentially the best that can be
arXiv Detail & Related papers (2022-10-27T05:08:07Z) - The First Optimal Algorithm for Smooth and
Strongly-Convex-Strongly-Concave Minimax Optimization [88.91190483500932]
In this paper, we revisit the smooth and strongly-convex-strongly-concave minimax optimization problem.
Existing state-of-the-art methods do not match the lower bound $\Omega\left(\sqrt{\kappa_x\kappa_y}\log^3(\kappa_x\kappa_y)\log\frac{1}{\epsilon}\right)$.
We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(\sqrt{\kappa_x\kappa_y}\log\dots\right)$
arXiv Detail & Related papers (2022-05-11T17:33:07Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log{T})$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Improved Regret Bounds for Online Submodular Maximization [10.089520556398575]
We consider an online optimization problem where at each step $t\in[T]$, the algorithm chooses an action $x_t$ from the fixed convex and compact domain set $\mathcal{K}$.
A utility function $f_t(\cdot)$ is then revealed and the algorithm receives the payoff $f_t(x_t)$.
arXiv Detail & Related papers (2021-06-15T02:05:35Z) - Linear Bandits on Uniformly Convex Sets [88.3673525964507]
Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets.
Two types of structural assumptions lead to better pseudo-regret bounds.
arXiv Detail & Related papers (2021-03-10T07:33:03Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n}+\sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n}+(\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where the cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Adaptive Online Learning with Varying Norms [45.11667443216861]
We provide an online convex optimization algorithm that outputs points $w_t$ in some domain $W$.
We apply this result to obtain new "full-matrix"-style regret bounds.
arXiv Detail & Related papers (2020-02-10T17:22:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.