Fugu-MT Paper Translation (Overview): Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions

Paper overview: Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions

arxiv url: http://arxiv.org/abs/2605.01752v1
Date: Sun, 03 May 2026 07:19:05 GMT
Status: Translation complete
Last updated in system: 2026-05-05 20:33:49.923347
Title: Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions
Title (reference translation): Robust linear dueling bandits with post-serving context under unknown delays and adversarial corruptions
Authors: Youngmin Oh
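Abstract summary: We study linear dueling bandits in volatile environments. We propose \term, which integrates a learned approximator that predicts post-serving contexts from pre-serving information. Our analysis reveals an additive cost structure between corruption and delay, avoiding the multiplicative degradation typical of prior work.
Reference score (independently computed attention score): 13.10320454140084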
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study linear dueling bandits in volatile environments characterized by the simultaneous presence of post-serving contexts, delayed feedback, and adversarial corruption. Feedback is subject to unknown stochastic or adversarial delays and a cumulative corruption budget $\mathcal{C}$. To address these challenges, we propose \term, which integrates a learned approximator that predicts post-serving contexts from pre-serving information. It further employs an adaptive weighting strategy that clips feature vectors to mitigate the impact of corrupted and delayed observations simultaneously. Under standard regularity conditions and a parametric post-serving mapping, we rigorously establish that our algorithm is delay-regime-agnostic, achieving a regret upper bound of $\widetilde{\mathcal{O}}(d(\sqrt{T} + \mathcal{C} + \mathcal{D}))$, where $d$ is the total feature dimension and $\mathcal{D}$ encapsulates the delay complexity. Crucially, our analysis reveals an additive cost structure between corruption and delay, avoiding the multiplicative degradation typical of prior works. We further establish lower bounds that nearly match our upper bounds up to a $\sqrt{d}$ factor for adversarial delays in the absence of post-serving contexts.
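As a rough illustration of the adaptive weighting idea in the abstract, the sketch below fits a weighted ridge estimator on duel feature differences, giving each observation the uncertainty-clipped weight $w = \min(1, \alpha / \lVert x \rVert_{\Sigma^{-1}})$ used in CW-OFUL-style corruption-robust linear bandits. This is a swapped-in, well-known technique, not necessarily the paper's exact rule. The linear link $\mathbb{E}[y] = \theta^\top(\phi_a - \phi_b)$, the names `ClippedWeightedRidge` and `simulate`, the delay model (random delays of up to `max_delay` rounds), the label-flipping adversary, and all constants are hypothetical. Arm selection is omitted; only the estimation step under delayed, partly corrupted feedback is shown.

```python
import heapq
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(m, v):
    return [dot(row, v) for row in m]

class ClippedWeightedRidge:
    """Weighted ridge regression on duel feature differences (hypothetical sketch).

    Each observation gets weight w = min(1, alpha / ||x||_{Sigma^{-1}}), so
    observations in high-uncertainty directions, where a single corrupted
    label could move the estimate a lot, receive less influence.
    """

    def __init__(self, dim, lam=1.0, alpha=0.5):
        self.alpha = alpha
        # Sigma^{-1} starts as (lam * I)^{-1}; Sigma itself is never stored.
        self.sigma_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)]
                          for i in range(dim)]
        self.b = [0.0] * dim  # running sum of w * y * x

    def weight(self, x):
        norm = math.sqrt(dot(x, mat_vec(self.sigma_inv, x)))  # ||x||_{Sigma^{-1}}
        return 1.0 if norm == 0.0 else min(1.0, self.alpha / norm)

    def update(self, x, y):
        w = self.weight(x)
        # Sherman-Morrison rank-one update of Sigma^{-1} for Sigma += w * x x^T.
        s_x = mat_vec(self.sigma_inv, x)
        denom = 1.0 + w * dot(x, s_x)
        for i in range(len(x)):
            for j in range(len(x)):
                self.sigma_inv[i][j] -= w * s_x[i] * s_x[j] / denom
        self.b = [bi + w * y * xi for bi, xi in zip(self.b, x)]
        return w

    def theta(self):
        return mat_vec(self.sigma_inv, self.b)

def simulate(T=2000, max_delay=20, corrupted_rounds=50, seed=0):
    rng = random.Random(seed)
    theta_star = [0.25, -0.15, 0.1]  # L1 norm 0.5 keeps win probabilities in [0, 1]
    model = ClippedWeightedRidge(dim=len(theta_star))
    pending = []  # min-heap of (arrival_round, sent_round, x, y)
    for t in range(T):
        xa = [rng.uniform(-1, 1) for _ in theta_star]
        xb = [rng.uniform(-1, 1) for _ in theta_star]
        x = [a - b for a, b in zip(xa, xb)]
        # Assumed linear link: E[y] = theta_star . x with y in {-1, +1}.
        y = 1.0 if rng.random() < 0.5 + 0.5 * dot(theta_star, x) else -1.0
        if t < corrupted_rounds:
            y = -y  # hypothetical adversary flips the first corrupted_rounds labels
        heapq.heappush(pending, (t + rng.randint(0, max_delay), t, x, y))
        # The learner only uses feedback whose (unknown) delay has elapsed.
        while pending and pending[0][0] <= t:
            _, _, x_obs, y_obs = heapq.heappop(pending)
            model.update(x_obs, y_obs)
    return model, theta_star
```

With the defaults, `model.theta()` from `simulate()` should land close to `theta_star` despite the flipped early labels and the delays. Computing the weight when feedback arrives, rather than when the duel is served, is one design choice among several.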
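Abstract (reference translation): We study linear dueling bandits in volatile environments. Feedback is subject to unknown stochastic or adversarial delays and a cumulative corruption budget $\mathcal{C}$. To address these challenges, we propose \term, which integrates a learned approximator. It further adopts an adaptive weighting strategy that clips feature vectors, mitigating the impact of corrupted and delayed observations simultaneously. Under standard regularity conditions and a parametric post-serving mapping, we rigorously prove that our algorithm is delay-regime-agnostic and achieves a regret upper bound of $\widetilde{\mathcal{O}}(d(\sqrt{T} + \mathcal{C} + \mathcal{D}))$, where $d$ is the total feature dimension and $\mathcal{D}$ encapsulates the delay complexity. Our analysis reveals an additive cost structure between corruption and delay, avoiding the multiplicative degradation typical of prior work. We further establish lower bounds that nearly match the upper bounds up to a $\sqrt{d}$ factor for adversarial delays in the absence of post-serving contexts.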
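The abstract also mentions a learned approximator that predicts post-serving contexts from pre-serving information. The following is a minimal sketch, assuming the parametric post-serving mapping is linear ($z \approx W x$) and fitted by online ridge regression. The names `PostServingApproximator`, `full_feature`, and `demo`, the true map used in the demo, and the concatenation of $x$ with $\hat z$ into one feature of dimension $d = d_x + d_z$ are illustrative assumptions rather than the paper's construction.

```python
import random

class PostServingApproximator:
    """Online ridge regression z ~ W x (hypothetical sketch).

    x: pre-serving context (known before the duel is served).
    z: post-serving context (revealed only after serving).
    Assumes a linear parametric map; the paper's approximator may differ.
    """

    def __init__(self, dx, dz, lam=1.0):
        self.dx, self.dz = dx, dz
        # A^{-1} = (lam * I + sum x x^T)^{-1}, maintained by Sherman-Morrison.
        self.a_inv = [[1.0 / lam if i == j else 0.0 for j in range(dx)]
                      for i in range(dx)]
        self.b = [[0.0] * dz for _ in range(dx)]  # B = sum x z^T, shape dx x dz

    def _a_inv_x(self, x):
        return [sum(self.a_inv[i][k] * x[k] for k in range(self.dx))
                for i in range(self.dx)]

    def update(self, x, z):
        ax = self._a_inv_x(x)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, ax))
        for i in range(self.dx):
            for j in range(self.dx):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(self.dx):
            for j in range(self.dz):
                self.b[i][j] += x[i] * z[j]

    def predict(self, x):
        # Ridge estimate W_hat = B^T A^{-1}, so z_hat = B^T (A^{-1} x).
        ax = self._a_inv_x(x)
        return [sum(self.b[i][j] * ax[i] for i in range(self.dx))
                for j in range(self.dz)]

def full_feature(approx, x):
    """Concatenate pre-serving x with predicted post-serving z_hat (d = dx + dz)."""
    return list(x) + approx.predict(x)

def demo(n=500, seed=1):
    rng = random.Random(seed)
    w_true = [0.7, -0.4]  # hypothetical one-dimensional post-serving map
    approx = PostServingApproximator(dx=2, dz=1)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        z = [w_true[0] * x[0] + w_true[1] * x[1] + rng.gauss(0.0, 0.05)]
        approx.update(x, z)
    return approx
```

In an assumed workflow, the learner calls `full_feature(approx, x)` to build duel features before serving, then calls `approx.update(x, z)` once the true post-serving context is revealed.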

List of related papers