Delaytron: Efficient Learning of Multiclass Classifiers with Delayed
Bandit Feedbacks
- URL: http://arxiv.org/abs/2205.08234v1
- Date: Tue, 17 May 2022 11:12:20 GMT
- Title: Delaytron: Efficient Learning of Multiclass Classifiers with Delayed
Bandit Feedbacks
- Authors: Naresh Manwani, Mudit Agarwal
- Abstract summary: Delaytron achieves a regret bound of
$\mathcal{O}\left(\sqrt{\frac{2K}{\gamma}\left[\frac{T}{2}+\left(2+\frac{L^2}{R^2\Vert \mathbf{W}\Vert_F^2}\right)\sum_{t=1}^T d_t\right]}\right)$
when the loss of each missing sample is bounded by $L$. Adaptive Delaytron, which
does not require knowledge of $T$ or $\sum_{t=1}^T d_t$, achieves a regret bound of
$\mathcal{O}\left(\sqrt{T+\sum_{t=1}^T d_t}\right)$.
- Score: 6.624726878647541
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present an online algorithm called {\it Delaytron} for
learning multiclass classifiers using delayed bandit feedback. The sequence
of feedback delays $\{d_t\}_{t=1}^T$ is unknown to the algorithm. At the $t$-th
round, the algorithm observes an example $\mathbf{x}_t$, predicts a label
$\tilde{y}_t$, and receives the bandit feedback $\mathbb{I}[\tilde{y}_t=y_t]$
only $d_t$ rounds later. When $t+d_t>T$, we consider the feedback for the
$t$-th round to be missing. We show that the proposed algorithm achieves a regret of
$\mathcal{O}\left(\sqrt{\frac{2K}{\gamma}\left[\frac{T}{2}+\left(2+\frac{L^2}{R^2\Vert
\mathbf{W}\Vert_F^2}\right)\sum_{t=1}^T d_t\right]}\right)$ when the loss for each
missing sample is upper bounded by $L$. When the loss for missing
samples is not upper bounded, the regret achieved by Delaytron is
$\mathcal{O}\left(\sqrt{\frac{2K}{\gamma}\left[\frac{T}{2}+2\sum_{t=1}^T d_t+\vert \mathcal{M}\vert
T\right]}\right)$, where $\mathcal{M}$ is the set of missing samples in $T$
rounds. These bounds are achieved with a constant step size that requires
knowledge of $T$ and $\sum_{t=1}^T d_t$. For the case when $T$ and
$\sum_{t=1}^T d_t$ are unknown, we use a doubling trick for online learning and
propose Adaptive Delaytron. We show that Adaptive Delaytron achieves a regret
bound of $\mathcal{O}\left(\sqrt{T+\sum_{t=1}^T d_t}\right)$. We demonstrate the
effectiveness of our approach through experiments on various datasets and
comparisons with state-of-the-art approaches.
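The abstract specifies the interaction protocol but not Delaytron's update rule. The
sketch below (ours, not the authors') illustrates the delayed-bandit-feedback loop in
Python, using a Banditron-style exploration and gradient estimate as a stand-in for the
actual update; the class name `DelayedBanditLearner`, the fixed delay `d`, and the step
size `eta` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class DelayedBanditLearner:
    """Sketch of online multiclass learning with delayed bandit feedback.

    The update below is Banditron-style and is only a stand-in for
    Delaytron's actual update rule.
    """

    def __init__(self, n_features, n_classes, gamma=0.05, eta=0.1, seed=0):
        self.W = np.zeros((n_classes, n_features))  # one weight row per class
        self.K = n_classes
        self.gamma = gamma   # exploration rate
        self.eta = eta       # constant step size (the paper tunes it via T and sum d_t)
        self.rng = np.random.default_rng(seed)
        self.pending = {}    # round t -> (x_t, argmax label, sampled label, probs)

    def predict(self, t, x):
        scores = self.W @ x
        y_hat = int(np.argmax(scores))
        # exploration: (1 - gamma) mass on the argmax, gamma spread uniformly
        probs = np.full(self.K, self.gamma / self.K)
        probs[y_hat] += 1.0 - self.gamma
        y_tilde = int(self.rng.choice(self.K, p=probs))
        self.pending[t] = (x, y_hat, y_tilde, probs)
        return y_tilde

    def receive_feedback(self, t, correct):
        """Called d_t rounds after round t with the feedback I[y_tilde == y_t]."""
        x, y_hat, y_tilde, probs = self.pending.pop(t)
        U = np.zeros_like(self.W)          # unbiased gradient estimate
        if correct:
            U[y_tilde] += x / probs[y_tilde]
        U[y_hat] -= x
        self.W += self.eta * U

# Usage sketch with a fixed delay d; rounds whose feedback would arrive after
# round T are treated as missing, matching the condition t + d_t > T.
T, d = 1000, 5
rng = np.random.default_rng(1)
X = rng.normal(size=(T, 20))
y = rng.integers(0, 4, size=T)
learner = DelayedBanditLearner(n_features=20, n_classes=4)
arrivals = {}  # arrival round -> list of (original round, bandit feedback)
for t in range(T):
    for s, correct in arrivals.pop(t, []):   # process feedback arriving now
        learner.receive_feedback(s, correct)
    y_tilde = learner.predict(t, X[t])
    if t + d < T:                            # otherwise the feedback never arrives
        arrivals.setdefault(t + d, []).append((t, y_tilde == y[t]))
```

Roughly speaking, Adaptive Delaytron would wrap such a loop in a doubling-trick
schedule, restarting with a re-tuned step size as the running estimate of
$T+\sum_t d_t$ grows, so that neither $T$ nor $\sum_{t=1}^T d_t$ needs to be known
in advance.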
Related papers
- Sample-Efficient Linear Regression with Self-Selection Bias [7.605563562103568]
We consider the problem of linear regression with self-selection bias in the unknown-index setting.
We provide a novel and near-optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_1,\ldots,\mathbf{w}_k$.
Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression.
arXiv Detail & Related papers (2024-02-22T02:20:24Z) - Matrix Completion in Almost-Verification Time [37.61139884826181]
We provide an algorithm which completes $\mathbf{M}$ on $99\%$ of rows and columns.
We show how to boost this partial completion guarantee to a full matrix completion algorithm.
arXiv Detail & Related papers (2023-08-07T15:24:49Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Linear Bandits on Uniformly Convex Sets [88.3673525964507]
Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets.
Two types of structural assumptions lead to better pseudo-regret bounds.
arXiv Detail & Related papers (2021-03-10T07:33:03Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - The Average-Case Time Complexity of Certifying the Restricted Isometry
Property [66.65353643599899]
In compressed sensing, the restricted isometry property (RIP) on $M \times N$ sensing matrices guarantees efficient reconstruction of sparse vectors.
We investigate the exact average-case time complexity of certifying the RIP property for $M \times N$ matrices with i.i.d. $\mathcal{N}(0,1/M)$ entries.
arXiv Detail & Related papers (2020-05-22T16:55:01Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z) - Adaptive Online Learning with Varying Norms [45.11667443216861]
We provide an online convex optimization algorithm that outputs points $w_t$ in some domain $W$.
We apply this result to obtain new "full-matrix"-style regret bounds.
arXiv Detail & Related papers (2020-02-10T17:22:08Z)