Fugu-MT paper translation (abstract): Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks

Paper overview: Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks

arxiv url: http://arxiv.org/abs/2205.08234v1
Date: Tue, 17 May 2022 11:12:20 GMT
Status: translation complete
Last updated in system: 2022-05-18 20:16:59.497664
Title: Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks
Title (reference translation): Delaytron: Efficient learning of multiclass classifiers with delayed bandit feedback
Authors: Naresh Manwani, Mudit Agarwal
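Abstract summary: Delaytron achieves a regret bound of $\mathcal{O}\left(\sqrt{\frac{2 K}{\gamma}\left[\frac{T}{2}+\left(2+\frac{L^2}{R^2\Vert \mathbf{W}\Vert_F^2}\right)\sum_{t=1}^Td_t\right]}\right)$ when the loss for each missing sample is upper bounded by $L$. Adaptive Delaytron achieves a regret bound of $\mathcal{O}\left(\sqrt{T+\sum_{t=1}^Td_t}\right)$.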
Reference score (independently computed attention score): 6.624726878647541
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we present an online algorithm called Delaytron for learning multiclass classifiers using delayed bandit feedback. The sequence of feedback delays $\{d_t\}_{t=1}^T$ is unknown to the algorithm. At the $t$-th round, the algorithm observes an example $\mathbf{x}_t$, predicts a label $\tilde{y}_t$, and receives the bandit feedback $\mathbb{I}[\tilde{y}_t=y_t]$ only $d_t$ rounds later. When $t+d_t>T$, we consider the feedback for the $t$-th round to be missing. We show that the proposed algorithm achieves a regret of $\mathcal{O}\left(\sqrt{\frac{2 K}{\gamma}\left[\frac{T}{2}+\left(2+\frac{L^2}{R^2\Vert \mathbf{W}\Vert_F^2}\right)\sum_{t=1}^Td_t\right]}\right)$ when the loss for each missing sample is upper bounded by $L$. When the loss for missing samples is not upper bounded, the regret achieved by Delaytron is $\mathcal{O}\left(\sqrt{\frac{2 K}{\gamma}\left[\frac{T}{2}+2\sum_{t=1}^Td_t+\vert \mathcal{M}\vert T\right]}\right)$, where $\mathcal{M}$ is the set of missing samples in $T$ rounds. These bounds are achieved with a constant step size, which requires knowledge of $T$ and $\sum_{t=1}^Td_t$. For the case when $T$ and $\sum_{t=1}^Td_t$ are unknown, we use a doubling trick for online learning and propose Adaptive Delaytron. We show that Adaptive Delaytron achieves a regret bound of $\mathcal{O}\left(\sqrt{T+\sum_{t=1}^Td_t}\right)$. We show the effectiveness of our approach by experimenting on various datasets and comparing with state-of-the-art approaches.
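Abstract (reference translation): This paper proposes Delaytron, an online algorithm for learning multiclass classifiers from delayed bandit feedback. The sequence of feedback delays $\{d_t\}_{t=1}^T$ is unknown to the algorithm. In round $t$, the algorithm observes an example $\mathbf{x}_t$, predicts a label $\tilde{y}_t$, and receives the bandit feedback $\mathbb{I}[\tilde{y}_t=y_t]$ only $d_t$ rounds later. When $t+d_t>T$, the feedback for round $t$ is treated as missing. When the loss for each missing sample is upper bounded by $L$, the proposed algorithm achieves a regret of $\mathcal{O}\left(\sqrt{\frac{2 K}{\gamma}\left[\frac{T}{2}+\left(2+\frac{L^2}{R^2\Vert \mathbf{W}\Vert_F^2}\right)\sum_{t=1}^Td_t\right]}\right)$. When the loss for missing samples is not upper bounded, the regret achieved by Delaytron is $\mathcal{O}\left(\sqrt{\frac{2 K}{\gamma}\left[\frac{T}{2}+2\sum_{t=1}^Td_t+\vert \mathcal{M}\vert T\right]}\right)$, where $\mathcal{M}$ is the set of missing samples in $T$ rounds. These bounds are achieved with a constant step size, which requires knowledge of $T$ and $\sum_{t=1}^Td_t$. When $T$ and $\sum_{t=1}^Td_t$ are unknown, a doubling trick for online learning is used and Adaptive Delaytron is proposed. Adaptive Delaytron is shown to achieve a regret bound of $\mathcal{O}\left(\sqrt{T+\sum_{t=1}^Td_t}\right)$. The effectiveness of the approach is demonstrated by experiments on various datasets and comparison with state-of-the-art approaches.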
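The feedback protocol in the abstract can be illustrated with a minimal Python sketch. It only replays when each bandit indicator $\mathbb{I}[\tilde{y}_t=y_t]$ becomes visible and which rounds end up in the missing set $\mathcal{M}$. It does not implement the Delaytron update rule, which the abstract does not specify. The function name `simulate_delayed_feedback`, the toy labels, predictions, and delays are all hypothetical, and a delay of 0 is assumed to mean the feedback arrives in the same round.

```python
from collections import defaultdict


def simulate_delayed_feedback(labels, predictions, delays):
    """Replay the delayed bandit feedback protocol (illustrative only).

    Rounds are 1-indexed. The indicator I[prediction == label] for round t
    becomes visible at round t + d_t. If t + d_t > T, the feedback never
    arrives and round t is counted as missing.

    Returns (arrivals, missing):
      arrivals[s] is the list of (t, indicator) pairs delivered at round s,
      missing is the set M of rounds whose feedback is never observed.
    """
    T = len(labels)
    arrivals = defaultdict(list)
    missing = set()
    for t in range(1, T + 1):
        arrival_round = t + delays[t - 1]
        if arrival_round > T:
            # Feedback would arrive after the horizon: treat as missing.
            missing.add(t)
        else:
            indicator = int(predictions[t - 1] == labels[t - 1])
            arrivals[arrival_round].append((t, indicator))
    return dict(arrivals), missing


# Hypothetical toy run with T = 5 rounds and K = 3 classes.
labels = [0, 1, 2, 1, 0]
predictions = [0, 2, 2, 1, 1]
delays = [0, 2, 1, 3, 0]

arrivals, missing = simulate_delayed_feedback(labels, predictions, delays)
total_delay = sum(delays)  # the quantity sum_t d_t that appears in the bounds
```

In this toy run, round 4 has $t+d_t=7>T=5$, so its feedback is missing, while the indicators for rounds 2 and 3 both arrive at round 4. A learner would only be able to update on the pairs listed in `arrivals[s]` when it reaches round `s`.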

Related papers