Fugu-MT paper translation (abstract): Offline Local Search for Online Stochastic Bandits

Paper overview: Offline Local Search for Online Stochastic Bandits

arxiv url: http://arxiv.org/abs/2604.09423v1
Date: Fri, 10 Apr 2026 15:36:11 GMT
Status: Translation complete
Last updated in system: 2026-04-13 17:57:53.938055
Title: Offline Local Search for Online Stochastic Bandits
Title (reference translation): Offline Local Search for Online Stochastic Bandits
Authors: Gerdus Benadè, Rathish Das, Thomas Lavastida
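Abstract summary: Combinatorial multi-armed bandits provide a fundamental online decision-making environment. The goal is to minimize regret, defined as the loss compared to the optimal fixed action in hindsight. Offline greedy and linear optimization algorithms (both exact and approximate) have been shown to provide useful guarantees when deployed online.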
Reference score (independently computed attention score): 3.029373177207996
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Combinatorial multi-armed bandits provide a fundamental online decision-making environment where a decision-maker interacts with an environment across $T$ time steps, each time selecting an action and learning the cost of that action. The goal is to minimize regret, defined as the loss compared to the optimal fixed action in hindsight under full-information. There has been substantial interest in leveraging what is known about offline algorithm design in this online setting. Offline greedy and linear optimization algorithms (both exact and approximate) have been shown to provide useful guarantees when deployed online. We investigate local search methods, a broad class of algorithms used widely in both theory and practice, which have thus far been under-explored in this context. We focus on problems where offline local search terminates in an approximately optimal solution and give a generic method for converting such an offline algorithm into an online stochastic combinatorial bandit algorithm with $O(\log^3 T)$ (approximate) regret. In contrast, existing offline-to-online frameworks yield regret (and approximate regret) which depend sub-linearly, but polynomially on $T$. We demonstrate the flexibility of our framework by applying it to three online stochastic combinatorial optimization problems: scheduling to minimize total completion time, finding a minimum cost base of a matroid and uncertain clustering.
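Abstract (reference translation): Combinatorial multi-armed bandits provide a fundamental online decision-making setting in which a decision-maker interacts with an environment over $T$ time steps, selecting an action at each step and learning the cost of that action. The goal is to minimize regret, defined as the loss relative to the optimal fixed action in hindsight under full information. There is considerable interest in using what is known about offline algorithm design in this online setting. Offline greedy and linear optimization algorithms (both exact and approximate) have been shown to provide useful guarantees when deployed online. This work studies local search methods, a broad class of algorithms widely used in both theory and practice. Focusing on problems where offline local search terminates at an approximately optimal solution, it proposes a generic method for converting such an offline algorithm into an online stochastic combinatorial bandit algorithm with $O(\log^3 T)$ (approximate) regret. In contrast, existing offline-to-online frameworks yield regret (and approximate regret) that depends sublinearly, but polynomially, on $T$. The flexibility of the framework is demonstrated by applying it to three online stochastic combinatorial optimization problems: scheduling to minimize total completion time, finding a minimum-cost base of a matroid, and uncertain clustering.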

List of related papers