Improved Regret Bounds for Online Kernel Selection under Bandit Feedback
- URL: http://arxiv.org/abs/2303.05018v2
- Date: Thu, 23 Mar 2023 07:20:34 GMT
- Title: Improved Regret Bounds for Online Kernel Selection under Bandit Feedback
- Authors: Junfan Li and Shizhong Liao
- Abstract summary: We prove two types of regret bounds improving the previous bound.
We apply the two algorithms to online kernel selection with time constraint and prove new regret bounds matching or improving the previous $O(\sqrt{T\ln{K}}+\Vert f\Vert^2_{\mathcal{H}_i}\max\{\sqrt{T},\frac{T}{\sqrt{\mathcal{R}}}\})$ expected bound, where $\mathcal{R}$ is the time budget.
- Score: 13.510889339096117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we improve the regret bound for online kernel selection under
bandit feedback. The previous algorithm enjoys a $O((\Vert
f\Vert^2_{\mathcal{H}_i}+1)K^{\frac{1}{3}}T^{\frac{2}{3}})$ expected bound for
Lipschitz loss functions. We prove two types of regret bounds improving the
previous bound. For smooth loss functions, we propose an algorithm with a
$O(U^{\frac{2}{3}}K^{-\frac{1}{3}}(\sum^K_{i=1}L_T(f^\ast_i))^{\frac{2}{3}})$
expected bound, where $L_T(f^\ast_i)$ is the cumulative loss of the optimal
hypothesis in $\mathbb{H}_{i}=\{f\in\mathcal{H}_i:\Vert
f\Vert_{\mathcal{H}_i}\leq U\}$. The data-dependent bound keeps the previous
worst-case bound and is smaller if most of the candidate kernels match well with
the data. For Lipschitz loss functions, we propose an algorithm with a
$O(U\sqrt{KT}\ln^{\frac{2}{3}}{T})$ expected bound asymptotically improving the
previous bound. We apply the two algorithms to online kernel selection with
time constraint and prove new regret bounds matching or improving the previous
$O(\sqrt{T\ln{K}} +\Vert
f\Vert^2_{\mathcal{H}_i}\max\{\sqrt{T},\frac{T}{\sqrt{\mathcal{R}}}\})$
expected bound, where $\mathcal{R}$ is the time budget. Finally, we empirically
verify our algorithms on online regression and classification tasks.
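To make the bandit-feedback protocol concrete, the sketch below simulates online kernel selection over $K$ candidate Gaussian kernels: at each round one kernel is sampled from an EXP3-style distribution, only the selected hypothesis is evaluated, and it is updated with an importance-weighted subgradient step. This is a minimal illustration of the setting (regret is measured against the best $f\in\mathbb{H}_i$ while only the selected kernel's loss is observed), not the algorithms proposed in the paper; the bandwidths, step sizes, and budget are assumptions for demonstration only.

```python
# Illustrative sketch of online kernel selection under bandit feedback.
# NOT the paper's algorithm: an EXP3-style distribution over K Gaussian kernels,
# each paired with a budgeted functional gradient-descent learner (absolute loss).
import numpy as np

rng = np.random.default_rng(0)

class OnlineKernelLearner:
    """Functional gradient descent on f in the RKHS of one Gaussian kernel."""
    def __init__(self, bandwidth, eta=0.1, budget=200):
        self.bw, self.eta, self.budget = bandwidth, eta, budget
        self.centers, self.alphas = [], []

    def _k(self, x, c):
        return np.exp(-np.sum((x - c) ** 2) / (2 * self.bw ** 2))

    def predict(self, x):
        return sum(a * self._k(x, c) for a, c in zip(self.alphas, self.centers))

    def update(self, x, y, scale=1.0):
        # Subgradient of |f(x) - y|, importance-weighted under bandit feedback.
        g = np.sign(self.predict(x) - y)
        if g != 0 and len(self.centers) < self.budget:
            self.centers.append(x)
            self.alphas.append(-self.eta * scale * g)

def run(stream, bandwidths, eta_p=0.05, gamma=0.1):
    K = len(bandwidths)
    learners = [OnlineKernelLearner(bw) for bw in bandwidths]
    weights = np.ones(K)
    total_loss = 0.0
    for x, y in stream:
        p = (1 - gamma) * weights / weights.sum() + gamma / K
        i = rng.choice(K, p=p)                   # pay only learner i's running time
        loss = abs(learners[i].predict(x) - y)   # only this loss is observed
        total_loss += loss
        learners[i].update(x, y, scale=1.0 / p[i])   # importance-weighted update
        weights[i] *= np.exp(-eta_p * loss / p[i])   # EXP3-style weight update
    return total_loss

# Toy data stream: y = sin(3x) + noise.
X = rng.uniform(-1, 1, size=(1000, 1))
Y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(1000)
print(run(zip(X, Y), bandwidths=[0.05, 0.2, 0.5, 1.0]))
```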
Related papers
- Improved Kernel Alignment Regret Bound for Online Kernel Learning [11.201662566974232]
We propose an algorithm whose regret bound and computational complexity are better than previous results.
If the eigenvalues of the kernel matrix decay exponentially, then our algorithm enjoys a regret of $O(\sqrt{\mathcal{A}_T})$ at a computational complexity of $O(\ln^2{T})$.
We extend our algorithm to batch learning and obtain a $O(\frac{1}{T}\sqrt{\mathbb{E}[\mathcal{A}_T]})$ excess risk bound which improves the previous $O(\dots)$ bound.
arXiv Detail & Related papers (2022-12-26T02:32:20Z) - Eluder-based Regret for Stochastic Contextual MDPs [43.19667415823089]
We present the E-UC$^3$RL algorithm for regret minimization in Contextual Markov Decision Processes (CMDPs).
Our algorithm is efficient (assuming efficient offline regression oracles) and enjoys a regret guarantee of $\widetilde{O}(H^3\sqrt{T|S||A|d_{\mathrm{E}}(\mathcal{P})})$.
arXiv Detail & Related papers (2022-11-27T20:38:47Z) - Private Isotonic Regression [54.32252900997422]
We consider the problem of isotonic regression over a partially ordered set (poset) $\mathcal{X}$ and for any Lipschitz loss function.
We obtain a pure-DP algorithm that has an expected excess empirical risk of roughly $\mathrm{width}(\mathcal{X})\cdot\log|\mathcal{X}|/n$, where $\mathrm{width}(\mathcal{X})$ is the width of the poset.
We show that the bounds above are essentially the best that can be
arXiv Detail & Related papers (2022-10-27T05:08:07Z) - The First Optimal Algorithm for Smooth and
Strongly-Convex-Strongly-Concave Minimax Optimization [88.91190483500932]
In this paper, we revisit the smooth and strongly-convex-strongly-concave minimax optimization problem.
Existing state-of-the-art methods do not match the lower bound $\Omega\left(\sqrt{\kappa_x\kappa_y}\log^3(\kappa_x\kappa_y)\log\frac{1}{\epsilon}\right)$.
We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(\sqrt{\kappa_x\kappa_y}\log\dots\right)$
arXiv Detail & Related papers (2022-05-11T17:33:07Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log{T})$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Improved Regret Bounds for Online Submodular Maximization [10.089520556398575]
We consider an online optimization problem where at each step $t\in[T]$, the algorithm chooses an action $x_t$ from the fixed convex and compact domain set $\mathcal{K}$.
A utility function $f_t(\cdot)$ is then revealed and the algorithm receives the payoff $f_t(x_t)$.
arXiv Detail & Related papers (2021-06-15T02:05:35Z) - Linear Bandits on Uniformly Convex Sets [88.3673525964507]
Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets.
Two types of structural assumptions lead to better pseudo-regret bounds.
arXiv Detail & Related papers (2021-03-10T07:33:03Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n}+\sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n}+(\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where the cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Adaptive Online Learning with Varying Norms [45.11667443216861]
We provide an online convex optimization algorithm that outputs points $w_t$ in some domain $W$.
We apply this result to obtain new "full-matrix"-style regret bounds.
arXiv Detail & Related papers (2020-02-10T17:22:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.