Fugu-MT 論文翻訳(概要): Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits

論文の概要: Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits

arxiv url: http://arxiv.org/abs/2306.02208v1
Date: Sat, 3 Jun 2023 22:41:44 GMT
ステータス: 翻訳完了
システム内更新日: 2023-06-06 19:26:42.212470
Title: Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits
Title（参考訳）: シングルパスストリーミングマルチアームバンドのためのタイトレグレト境界
Authors: Chen Wang
Abstract要約: K$アームと$T$トライアルを持つシングルパス設定では、$o(K)$メモリを持つ任意のアルゴリズムに対して、後悔の少ない$Omega(T2/3)$が証明されている。本稿では,o(K)$メモリを持つアルゴリズムに対して,Omega(K/3log/3(T))$に制限された後悔の低減を図る。提案アルゴリズムはベンチマーク均一探索アルゴリズムを大きなマージンで一貫して上回り、時には後悔を最大70%削減することを示した。
参考スコア（独自算出の注目度）: 3.5955736977697073
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Regret minimization in streaming multi-armed bandits (MABs) has been studied extensively in recent years. In the single-pass setting with $K$ arms and $T$ trials, a regret lower bound of $\Omega(T^{2/3})$ has been proved for any algorithm with $o(K)$ memory (Maiti et al. [NeurIPS'21]; Agarwal at al. [COLT'22]). On the other hand, however, the previous best regret upper bound is still $O(K^{1/3} T^{2/3}\log^{1/3}(T))$, which is achieved by the streaming implementation of the simple uniform exploration. The $O(K^{1/3}\log^{1/3}(T))$ gap leaves the open question of the tight regret bound in the single-pass MABs with sublinear arm memory. In this paper, we answer this open problem and complete the picture of regret minimization in single-pass streaming MABs. We first improve the regret lower bound to $\Omega(K^{1/3}T^{2/3})$ for algorithms with $o(K)$ memory, which matches the uniform exploration regret up to a logarithm factor in $T$. We then show that the $\log^{1/3}(T)$ factor is not necessary, and we can achieve $O(K^{1/3}T^{2/3})$ regret by finding an $\varepsilon$-best arm and committing to it in the rest of the trials. For regret minimization with high constant probability, we can apply the single-memory $\varepsilon$-best arm algorithms in Jin et al. [ICML'21] to obtain the optimal bound. Furthermore, for the expected regret minimization, we design an algorithm with a single-arm memory that achieves $O(K^{1/3} T^{2/3}\log(K))$ regret, and an algorithm with $O(\log^{*}(n))$-memory with the optimal $O(K^{1/3} T^{2/3})$ regret following the $\varepsilon$-best arm algorithm in Assadi and Wang [STOC'20]. We further tested the empirical performances of our algorithms. The simulation results show that the proposed algorithms consistently outperform the benchmark uniform exploration algorithm by a large margin, and on occasion, reduce the regret by up to 70%.
Abstract（参考訳）: ストリーミングマルチアームバンディット(MAB)におけるレグレト最小化は近年広く研究されている。 K$アームと$T$トライアルを持つシングルパス設定では、$o(K)$メモリを持つ任意のアルゴリズムに対して、後悔の低い$\Omega(T^{2/3})$が証明されている(Maiti et al. [NeurIPS'21]; Agarwal at al. [COLT'22])。しかし、それ以前の最も後悔すべき上限は、単純な一様探索のストリーミング実装によって達成された$O(K^{1/3} T^{2/3}\log^{1/3}(T))$である。 O(K^{1/3}\log^{1/3}(T))$ギャップは、サブリニアアームメモリを持つシングルパスMABにおける厳密な後悔境界のオープンな問題を残している。本稿では、このオープンな問題に答え、シングルパスストリーミングMABにおける後悔の最小化像を完成させる。まず、メモリがo(k)$のアルゴリズムに対して$\omega(k^{1/3}t^{2/3})$という、一様探索の後悔を$t$の対数係数に一致させる、後悔の低さを改善する。すると、$\log^{1/3}(T)$ 因子は必要ないことを示し、$O(K^{1/3}T^{2/3})$ 後悔は、$\varepsilon$-best アームを見つけ、残りの試行でそれをコミットすることで達成できる。高確率の後悔最小化のために、単一のメモリ$\varepsilon$-best アームアルゴリズムを Jin などに適用できる。 [ICML'21] 最適境界を得る。さらに、期待される最小化のために、単腕メモリで$O(K^{1/3} T^{2/3}\log(K))$ regretと、最適な$O(K^{1/3} T^{2/3})$-Memoryを持つ$O(\log^{*}(n))$-Memoryのアルゴリズムを設計し、AssadiとWangの$\varepsilon$-bestarmアルゴリズムに続き、$O(K^{1/3} T^{2/3})$-Memoryを設計する。さらに,アルゴリズムの実証性能を検証した。シミュレーションの結果,提案アルゴリズムは,ベンチマーク一様探索アルゴリズムを高いマージンで常に上回り,時には最大70%の後悔を減少させることがわかった。

関連論文リスト

p-Mean Regret for Stochastic Bandits [52.828710025519996]
単純で統一された UCB ベースのアルゴリズムを導入し、新しい$p$-mean の後悔境界を実現する。我々の枠組みは、特別な場合として、平均的な累積的後悔とナッシュ後悔の両方を包含する。
論文参考訳（メタデータ） (2024-12-14T08:38:26Z)
Improved Regret for Bandit Convex Optimization with Delayed Feedback [50.46856739179311]
遅延フィードバックを伴うバンド凸最適化(BCO)。我々は,新しいアルゴリズムを開発し,一般にO(sqrtnT3/4+sqrtdT)$の後悔境界を満足していることを証明する。提案アルゴリズムは,強い凸関数に対して$O((nT)2/3log/3T+dlog T)$に制限された後悔を改善することができることを示す。
論文参考訳（メタデータ） (2024-02-14T13:08:26Z)
Minimax Optimal Submodular Optimization with Bandit Feedback [13.805872311596739]
単調な部分モジュラー集合関数 $f: 2[n] rightarrow [0,1]$ をバンドイットフィードバックの下で最大化する。具体的には、$f$は学習者には知られていないが、各時点で$t=1,dots,T$は、$|S_t| leq k$でセットの$S_tサブセット[n]$を選択し、$eta_t$が平均ゼロのサブガウスノイズである場合に、$f(S_t) + eta_t$を受け取る。
論文参考訳（メタデータ） (2023-10-27T20:19:03Z)
The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits [10.329863009504303]
我々は、$O(fracnDelta2)$の最適なサンプル複雑さを利用するサブ線形メモリを持つストリーミングアルゴリズムは、$Omega(fraclog (1/Delta)log (1/Delta)$ passを必要とすることを示した。この結果は,[ICML'21] の$O(log(frac1Delta))$-passアルゴリズムと一致し, [ICML'21] は$O(1)$メモリしか使用せず, Assadi と Wang が提案したオープンな質問に答える。
論文参考訳（メタデータ） (2023-09-06T16:41:41Z)
Tight Memory-Regret Lower Bounds for Streaming Bandits [11.537938617281736]
学習者はオンラインの腕とサブリニアの腕の記憶を扱うことで後悔を最小化することを目的としています我々は,任意のアルゴリズムに対して,Omega left((TB)alpha K1-alpharight), α = 2B / (2B+1-1)$という,厳密な最悪ケースの残差を低く設定する。また、一定のアームメモリを用いて、左(TB)アルファK1-αright)$の後悔の上限を達成できるマルチパスアルゴリズムも提供する。
論文参考訳（メタデータ） (2023-06-13T16:54:13Z)
Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime [74.52487417350221]
オンライン学習の問題は,ゼロロスソリューションが存在する,実現可能な環境において考慮する。そこで我々は,ほぼ最適の後悔境界を求める新たなDPアルゴリズムを提案する。
論文参考訳（メタデータ） (2023-02-27T21:19:24Z)
Better Best of Both Worlds Bounds for Bandits with Switching Costs [37.71741656687868]
本稿では,2021年にRouryer,Seldin,Cesa-Bianchiらにより,スイッチングコストを伴うバンディットのベスト・オブ・ザ・ワールドス・アルゴリズムについて検討した。本稿では, 極小最小の最小残差を$mathcalO(T2/3)$で同時に達成する, 驚くほど単純かつ効果的に導入する。
論文参考訳（メタデータ） (2022-06-07T08:22:56Z)
Logarithmic Regret from Sublinear Hints [76.87432703516942]
自然クエリモデルにより,アルゴリズムが$O(log T)$ regretsを$O(sqrtT)$ hintsで得ることを示す。また、$o(sqrtT)$ hintsは$Omega(sqrtT)$ regretより保証できないことも示しています。
論文参考訳（メタデータ） (2021-11-09T16:50:18Z)
Bandits with many optimal arms [68.17472536610859]
最適アームの割合は$p*$、最適アームとサブ最適化アームの間の最小平均ギャップは$Delta$と書きます。我々は,累積的後悔設定と最良腕識別設定の両方において最適な学習率を特徴付ける。
論文参考訳（メタデータ） (2021-03-23T11:02:31Z)
Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso banditは、報酬関数を定義するベクトルとスパースサポートを推定するアルゴリズムである。一般には $mathcalO( log d + sqrtT )$ や $mathcalO( log d + sqrtT )$ としてスケールする非漸近的後悔の上界を確立する。
論文参考訳（メタデータ） (2020-10-22T19:14:37Z)
Streaming Complexity of SVMs [110.63976030971106]
本稿では,ストリーミングモデルにおけるバイアス正規化SVM問題を解く際の空間複雑性について検討する。両方の問題に対して、$frac1lambdaepsilon$の次元に対して、$frac1lambdaepsilon$よりも空間的に小さいストリーミングアルゴリズムを得ることができることを示す。
論文参考訳（メタデータ） (2020-07-07T17:10:00Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。