Fugu-MT Paper Translation (Summary): Near-Optimal Stochastic Linear Bandits with Delay

Paper summary: Near-Optimal Stochastic Linear Bandits with Delay

arxiv url: http://arxiv.org/abs/2606.16656v1
Date: Mon, 15 Jun 2026 12:48:29 GMT
Status: Translation complete
Last updated in system: 2026-06-16 16:21:34.555491
Title: Near-Optimal Stochastic Linear Bandits with Delay
Authors: Ofir Schlisselberg, Mengxiao Zhang, Yishay Mansour
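Abstract summary: We study linear bandits with delayed feedback under several delay models and establish near-optimal regret guarantees. The results identify when delayed linear bandits exhibit the same qualitative behavior as multi-armed bandits (MAB). We also show that the optimal MAB guarantee, which depends only on the delay of the optimal arm, is unattainable in linear bandits.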
Reference score (in-house attention score): 47.33152077961055
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study stochastic linear bandits with delayed feedback under several delay models and establish near-optimal regret guarantees. Our results identify when delayed linear bandits exhibit the same qualitative behavior as multi-armed bandits (MAB), and when the linear structure creates fundamentally new challenges. Specifically, (1) for loss-independent delays, where the delay does not depend on the realized loss (but may depend on the arm), we show that delays incur only an additive regret penalty. Under stochastic delays, this penalty scales with the expected delay, while under adversarial delays, it scales with the maximum number of outstanding observations. Notably, both delay penalties are dimension-free, improving upon the state-of-the-art results; (2) for loss-dependent delays, we show that linear bandits are substantially harder than MAB: we prove upper and lower bounds that match up to log factors and whose delay penalty, unlike in MAB, depends on the square root of the dimension; (3) for the delay-as-payoff model, a special case of loss-dependent delays, we show that the optimal MAB guarantee, which depends only on the delay of the optimal arm, is also unattainable in linear bandits. Together, these results provide a sharp characterization of how delayed feedback interacts with linear generalization.
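To make the delayed-feedback setting in the abstract concrete, here is a minimal Python sketch under several stated assumptions; it is not the authors' algorithm. `outstanding_counts` is a hypothetical helper that counts, for each round, how many earlier observations are still missing, assuming the feedback of round s with delay d_s becomes available at the start of round s + d_s + 1. `run_delayed_lin_lcb` is a hypothetical, generic optimistic learner in the LinUCB/OFUL style, with an assumed fixed confidence width `beta` and ridge parameter `lam`: it minimizes losses and updates its ridge-regression estimate only with feedback that has already arrived.

```python
import heapq
import math
import random


def outstanding_counts(delays):
    """For each round t, count past rounds s < t whose feedback is still missing.

    Assumed convention (illustrative): feedback of round s arrives at the start
    of round s + delays[s] + 1, so it is outstanding at rounds s+1, ..., s+delays[s].
    """
    T = len(delays)
    diff = [0] * (T + 1)  # difference array over rounds
    for s, d in enumerate(delays):
        if d < 0:
            raise ValueError("delays must be non-negative")
        start, end = s + 1, min(s + d, T - 1)
        if start <= end:
            diff[start] += 1
            diff[end + 1] -= 1
    counts, running = [], 0
    for t in range(T):
        running += diff[t]
        counts.append(running)
    return counts


def run_delayed_lin_lcb(arms, theta, delays, beta=1.0, lam=1.0, noise=0.0, seed=0):
    """Hypothetical optimistic linear bandit that learns only from arrived feedback.

    Generic LinUCB/OFUL-style rule on losses (pick the lowest lower confidence
    bound) with a fixed width `beta`; NOT the algorithm from the paper.
    Returns the list of chosen arm indices.
    """
    rng = random.Random(seed)
    d = len(theta)
    # Inverse of the regularized design matrix V = lam * I + sum of x x^T.
    v_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d  # sum of loss * x over observed rounds
    pending = []  # min-heap of (arrival_round, arm_index, loss)
    chosen = []

    def mat_vec(m, x):
        return [sum(m[i][j] * x[j] for j in range(d)) for i in range(d)]

    for t in range(len(delays)):
        # Incorporate every observation whose arrival round has come.
        while pending and pending[0][0] <= t:
            _, k, loss = heapq.heappop(pending)
            x = arms[k]
            vx = mat_vec(v_inv, x)
            denom = 1.0 + sum(x[i] * vx[i] for i in range(d))
            # Sherman-Morrison rank-one update of V^{-1} (V^{-1} is symmetric).
            v_inv = [[v_inv[i][j] - vx[i] * vx[j] / denom for j in range(d)]
                     for i in range(d)]
            b = [b[i] + loss * x[i] for i in range(d)]
        theta_hat = mat_vec(v_inv, b)  # ridge-regression estimate

        def lcb(x):
            mean = sum(theta_hat[i] * x[i] for i in range(d))
            width = math.sqrt(sum(x[i] * vi for i, vi in enumerate(mat_vec(v_inv, x))))
            return mean - beta * width

        k = min(range(len(arms)), key=lambda a: lcb(arms[a]))
        chosen.append(k)
        loss = sum(theta[i] * arms[k][i] for i in range(d)) + noise * rng.gauss(0.0, 1.0)
        # The learner only sees this loss delays[t] + 1 rounds later.
        heapq.heappush(pending, (t + delays[t] + 1, k, loss))
    return chosen
```

For example, with `delays = [2] * 200`, arms `[(1.0, 0.0), (0.0, 1.0)]` and `theta = (0.2, 0.8)`, `max(outstanding_counts(delays))` is 2 and the learner ends up choosing arm 0 (the lower-loss arm) in the final round. Under the assumed convention, `max(outstanding_counts(delays))` is one way to read the "maximum number of outstanding observations" mentioned for adversarial delays, and `sum(delays) / len(delays)` is an empirical stand-in for the expected delay in the stochastic case.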

