Fugu-MT 論文翻訳(概要): Lipschitz Bandits with Stochastic Delayed Feedback

論文の概要: Lipschitz Bandits with Stochastic Delayed Feedback

arxiv url: http://arxiv.org/abs/2510.00309v1
Date: Tue, 30 Sep 2025 22:07:17 GMT
ステータス: 翻訳完了
システム内更新日: 2025-10-03 16:59:20.269768
Title: Lipschitz Bandits with Stochastic Delayed Feedback
Title（参考訳）: 確率遅延フィードバックを持つリプシッツバンド
Authors: Zhongxuan Liu, Yue Kang, Thomas C. M. Lee,
Abstract要約: 本稿では,遅延フィードバックの存在下でのリプシッツ・バンディットの新たな問題を紹介する。各設定でサブ線形後悔保証を実現するアルゴリズムを設計する。本稿では,様々な遅延シナリオ下でのアルゴリズムの効率性を示す実験結果を示す。
参考スコア（独自算出の注目度）: 3.0594138391611967
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The Lipschitz bandit problem extends stochastic bandits to a continuous action set defined over a metric space, where the expected reward function satisfies a Lipschitz condition. In this work, we introduce a new problem of Lipschitz bandit in the presence of stochastic delayed feedback, where the rewards are not observed immediately but after a random delay. We consider both bounded and unbounded stochastic delays, and design algorithms that attain sublinear regret guarantees in each setting. For bounded delays, we propose a delay-aware zooming algorithm that retains the optimal performance of the delay-free setting up to an additional term that scales with the maximal delay $\tau_{\max}$. For unbounded delays, we propose a novel phased learning strategy that accumulates reliable feedback over carefully scheduled intervals, and establish a regret lower bound showing that our method is nearly optimal up to logarithmic factors. Finally, we present experimental results to demonstrate the efficiency of our algorithms under various delay scenarios.
Abstract（参考訳）: リプシッツのバンディット問題は、確率的バンディットを計量空間上で定義された連続作用集合に拡張し、期待される報酬関数がリプシッツ条件を満たす。本研究では,確率的遅延フィードバックが存在する場合に,リプシッツ・バンディットの新たな問題を導入する。我々は,有界および非有界な確率的遅延と,各設定におけるサブ線形後悔保証を実現する設計アルゴリズムについて検討する。有界遅延に対して、最大遅延$\tau_{\max}$でスケールする追加項まで遅延フリー設定の最適性能を保持する遅延対応ズームアルゴリズムを提案する。非有界遅延に対して、慎重にスケジュールされた間隔で信頼性の高いフィードバックを蓄積する新しい位相学習戦略を提案し、この手法が対数的要因にほぼ最適であることを示す後悔の低い境界を確立する。最後に,様々な遅延シナリオ下でのアルゴリズムの効率性を示す実験結果を示す。

論文の概要: Lipschitz Bandits with Stochastic Delayed Feedback

関連論文リスト