Fugu-MT 論文翻訳(概要): Bandit-Feedback Online Multiclass Classification: Variants and Tradeoffs

論文の概要: Bandit-Feedback Online Multiclass Classification: Variants and Tradeoffs

arxiv url: http://arxiv.org/abs/2402.07453v1
Date: Mon, 12 Feb 2024 07:20:05 GMT
ステータス: 翻訳完了
システム内更新日: 2024-02-13 15:22:07.753654
Title: Bandit-Feedback Online Multiclass Classification: Variants and Tradeoffs
Title（参考訳）: Bandit-Feedbackオンラインマルチクラス分類:変数とトレードオフ
Authors: Yuval Filmus, Steve Hanneke, Idan Mehalel and Shay Moran
Abstract要約: 我々は,帯域幅フィードバックの下での最適誤りが,全情報ケースの最適誤りよりも少なくとも$O(k)$倍高いことを示す。また、ランダム化学習者と決定論的学習者の間のギャップに対して、$tildeTheta(k)$のほぼ最適な境界を示す。
参考スコア（独自算出の注目度）: 32.29254118429081
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Consider the domain of multiclass classification within the adversarial online setting. What is the price of relying on bandit feedback as opposed to full information? To what extent can an adaptive adversary amplify the loss compared to an oblivious one? To what extent can a randomized learner reduce the loss compared to a deterministic one? We study these questions in the mistake bound model and provide nearly tight answers. We demonstrate that the optimal mistake bound under bandit feedback is at most $O(k)$ times higher than the optimal mistake bound in the full information case, where $k$ represents the number of labels. This bound is tight and provides an answer to an open question previously posed and studied by Daniely and Helbertal ['13] and by Long ['17, '20], who focused on deterministic learners. Moreover, we present nearly optimal bounds of $\tilde{\Theta}(k)$ on the gap between randomized and deterministic learners, as well as between adaptive and oblivious adversaries in the bandit feedback setting. This stands in contrast to the full information scenario, where adaptive and oblivious adversaries are equivalent, and the gap in mistake bounds between randomized and deterministic learners is a constant multiplicative factor of $2$. In addition, our results imply that in some cases the optimal randomized mistake bound is approximately the square-root of its deterministic parallel. Previous results show that this is essentially the smallest it can get.
Abstract（参考訳）: 敵のオンライン設定におけるマルチクラス分類の領域を考えてみよう。完全な情報とは対照的に、盗賊のフィードバックに頼る値段はいくらですか。適応的敵は、難解な相手に比べて損失をどの程度増幅できるのか? ランダム化学習者は決定論的学習と比較して損失をどの程度軽減できるのか? 我々はこれらの質問を誤り境界モデルで研究し、ほぼ緊密な回答を提供する。我々は、バンディットフィードバックの下で束縛された最適誤りが、ラベル数を表す$k$のフル情報ケースで束縛された最適ミスよりも最大で$o(k)$であることを示す。この境界は厳密で、Daniely氏とHelbertal氏['13]とLong氏['17, '20]によって以前に提起され研究されたオープンな質問に対する回答を提供する。さらに, ランダム化学習者と決定論的学習者とのギャップと, バンドイットフィードバック設定における適応的および必然的な敵間のギャップについて, $\tilde{\theta}(k)$ のほぼ最適境界を提示する。これは、適応的かつ必然的な敵が等価である完全な情報シナリオと対照的であり、ランダム化学習者と決定論的学習者の間の誤り境界の差は、一定の乗算係数が2ドルである。さらに, この結果から, 最適ランダム化誤差境界が決定論的並列の約2乗根であることを示唆する。これまでの結果は、これが本質的には最小のものであることを示している。

関連論文リスト

Agnostic Learning under Targeted Poisoning: Optimal Rates and the Role of Randomness [13.802167452101909]
以前の研究は、このようなインスタンス標的の毒殺攻撃による最適エラーが$Theta(deta)$とスケールすることを確立した。最適余剰誤差が $tildeTheta(sqrtdeta)$ であることを示し、Hannekeらによって残された主要な開問題の一つに答える。
論文参考訳（メタデータ） (2025-06-03T16:53:20Z)
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
人からのフィードバックから学ぶことは、大言語モデル(LLM)のような生成モデルを調整する上で重要な役割を果たす本稿では,本問題の領域内モデルについて考察する。-文脈的デュエルバンディットと敵対的フィードバックを併用し,真の嗜好ラベルを敵によって反転させることができる。本稿では,不確実性重み付き最大推定に基づく頑健なコンテキストデュエルバンドイット(アルゴ)を提案する。
論文参考訳（メタデータ） (2024-04-16T17:59:55Z)
Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
本稿では, 一般化線形モデル(GLM)から, デュエルアームのバイナリ比較を生成するコンテキストデュエルバンド問題について検討する。本稿では,SupLinUCB型アルゴリズムを提案する。このアルゴリズムは,計算効率と分散を意識したリセットバウンド$tilde Obig(dsqrtsum_t=1Tsigma_t2 + dbig)$を提案する。我々の後悔は、比較が決定論的である場合の直感的な期待と自然に一致し、アルゴリズムは$tilde O(d)$ regretにのみ悩まされる。
論文参考訳（メタデータ） (2023-10-02T08:15:52Z)
Optimal Prediction Using Expert Advice and Randomized Littlestone Dimension [32.29254118429081]
古典的な結果は、リトルストーン次元を用いた決定論的学習者によって達成可能な最適誤りを特徴づける。クラス $mathcalH$ の学習における最適予測誤差は、そのランダム化されたリトルストーン次元と等しいことを示す。
論文参考訳（メタデータ） (2023-02-27T14:50:34Z)
Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
本稿では,不整合雑音を持つ線形帯域に対する計算効率のよい最初のアルゴリズムを提案する。我々のアルゴリズムは未知のノイズの分散に適応し、$tildeO(d sqrtsum_k = 1K sigma_k2 + d)$ regretを達成する。また、強化学習において、線形混合マルコフ決定過程(MDP)に対する分散適応アルゴリズムを提案する。
論文参考訳（メタデータ） (2023-02-21T00:17:24Z)
Swap Agnostic Learning, or Characterizing Omniprediction via Multicalibration [6.91367883100748]
我々は,Swap Agnostic Learningを導入し,研究する。敵の強みにも拘わらず,あらゆる凸損失に対するスワップ・アグノスティック・ラーニングの実現可能性を示す。
論文参考訳（メタデータ） (2023-02-13T22:25:53Z)
The price of unfairness in linear bandits with biased feedback [62.25313751895011]
線形帯域フィードバックによる逐次意思決定の問題点について検討する。その結果,不偏フィードバック下で得られたdT 1/2 log(T) の後悔率よりも最悪の後悔率が高いことがわかった。興味深いことに、ギャップ依存率によって、問題はバイアスのないものほど難しくない非自明なインスタンスの存在が明らかになる。
論文参考訳（メタデータ） (2022-03-18T08:03:20Z)
Online Selective Classification with Limited Feedback [82.68009460301585]
オンライン学習モデルにおいて、予測者がインスタンスの分類を控える可能性のある選択的分類について検討する。私たちが考慮している設定の健全な2つの側面は、データが不可避である可能性があるため、データは不可避である可能性があるということです。 smash$tildeO(T1-mu)$ over abstention against Adaptive adversaries. smash$tildeO(T1-mu)$ incurring smash$tildeO(T1-mu)$ over abstention。
論文参考訳（メタデータ） (2021-10-27T08:00:53Z)
Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
本稿では,敵対的腐敗の存在下での線形文脈的包帯問題について検討する。逆汚染レベルに適応する分散認識アルゴリズムをC$で提案する。
論文参考訳（メタデータ） (2021-10-25T02:53:24Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。