Related papers: Sparsity-Based Interpolation of External, Internal and Swap Regret

Sparsity-Based Interpolation of External, Internal and Swap Regret

URL: http://arxiv.org/abs/2502.04543v1
Date: Thu, 06 Feb 2025 22:47:52 GMT
Title: Sparsity-Based Interpolation of External, Internal and Swap Regret
Authors: Zhou Lu, Y. Jennifer Sun, Zhiyu Zhang,
Abstract summary: This paper studies the comparator of several performance metrics via $phi$-regret minimization.<n>We present a single algorithm achieving the instance-adaptive $phi$-regret bound.<n>Building on the classical reduction from $phi$-regret minimization to external regret minimization, our main idea is to further convert the latter to online linear regression.
Score: 4.753557469026313
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Focusing on the expert problem in online learning, this paper studies the interpolation of several performance metrics via $\phi$-regret minimization, which measures the performance of an algorithm by its regret with respect to an arbitrary action modification rule $\phi$. With $d$ experts and $T\gg d$ rounds in total, we present a single algorithm achieving the instance-adaptive $\phi$-regret bound \begin{equation*} \tilde O\left(\min\left\{\sqrt{d-d^{\mathrm{unif}}_\phi+1},\sqrt{d-d^{\mathrm{self}}_\phi}\right\}\cdot\sqrt{T}\right), \end{equation*} where $d^{\mathrm{unif}}_\phi$ is the maximum amount of experts modified identically by $\phi$, and $d^{\mathrm{self}}_\phi$ is the amount of experts that $\phi$ trivially modifies to themselves. By recovering the optimal $O(\sqrt{T\log d})$ external regret bound when $d^{\mathrm{unif}}_\phi=d$, the standard $\tilde O(\sqrt{T})$ internal regret bound when $d^{\mathrm{self}}_\phi=d-1$ and the optimal $\tilde O(\sqrt{dT})$ swap regret bound in the worst case, we improve existing results in the intermediate regimes. In addition, the same algorithm achieves the optimal quantile regret bound, which corresponds to even easier settings of $\phi$ than the external regret. Building on the classical reduction from $\phi$-regret minimization to external regret minimization on stochastic matrices, our main idea is to further convert the latter to online linear regression using Haar-wavelet-inspired matrix features. Then, we apply a particular $L_1$-version of comparator-adaptive online learning algorithms to exploit the sparsity in this regression subroutine.

Related papers

High-Dimensional Calibration from Swap Regret [40.9736612423411]
We study the online calibration of multi-dimensional forecasts over an arbitrary convex set $mathcalP subset mathbbRd$.<n>We show that if it is possible to guarantee $O(sqrtrho T)$ worst-case regret after $T$ rounds, it is possible to obtain $epsilon$-calibrated forecasts after $T = exp(logd/epsilon2).
arXiv Detail & Related papers (2025-05-27T17:31:47Z)
Nearly Optimal Algorithms with Sublinear Computational Complexity for Online Kernel Regression [13.510889339096117]
The trade-off between regret and computational cost is a fundamental problem for online kernel regression. We propose two new algorithms, AOGD-ALD and NONS-ALD, which can keep nearly optimal regret bounds at a sublinear computational complexity.
arXiv Detail & Related papers (2023-06-14T07:39:09Z)
Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime [74.52487417350221]
We consider online learning problems in the realizable setting, where there is a zero-loss solution. We propose new Differentially Private (DP) algorithms that obtain near-optimal regret bounds.
arXiv Detail & Related papers (2023-02-27T21:19:24Z)
Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(log T)$ regret with just $O(sqrtT)$ hints under a natural query model. We also show that $o(sqrtT)$ hints cannot guarantee better than $Omega(sqrtT)$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z)
Improved Regret Bounds for Online Submodular Maximization [10.089520556398575]
We consider an online optimization problem where at each step $tin[T]$, the algorithm chooses an action $x_t$ from the fixed convex and compact domain set $mathcalK$. A utility function $f_t(cdot)$ is then revealed and the algorithm receives the payoff $f_t(x_t)$.
arXiv Detail & Related papers (2021-06-15T02:05:35Z)
Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems. We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w*$ and the hyperplanes the separation oracle returns.
arXiv Detail & Related papers (2021-06-09T05:39:05Z)
Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry [69.24618367447101]
Up to logarithmic factors the optimal excess population loss of any $(varepsilon,delta)$-differently private is $sqrtlog(d)/n + sqrtd/varepsilon n.$ We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $sqrtlog(d)/n + (log(d)/varepsilon n)2/3.
arXiv Detail & Related papers (2021-03-02T06:53:44Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure. We show a lower bound of $min(sqrtdT, T3/4)$ for the regret of any algorithm, where $T$ is the number of rounds. We propose a new algorithm sbcalg that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z)
$Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning enjoys a $mathcalOleft(fracSAcdot mathrmpolyleft(Hright)Delta_minlogleft(SATright)right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $Delta_min$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z)
Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors. We show that the optimal regret is $mathcalO(minsqrtT, sqrtmathcalETfrac13)$ when $mathcalE$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
Adaptive Online Learning with Varying Norms [45.11667443216861]
We provide an online convex optimization algorithm that outputs points $w_t$ in some domain $W$. We apply this result to obtain new "full-matrix"-style regret bounds.
arXiv Detail & Related papers (2020-02-10T17:22:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.