Fugu-MT 論文翻訳(概要): Nearly Tight Bounds for Exploration in Streaming Multi-armed Bandits with Known Optimality Gap

論文の概要: Nearly Tight Bounds for Exploration in Streaming Multi-armed Bandits with Known Optimality Gap

arxiv url: http://arxiv.org/abs/2502.01067v1
Date: Mon, 03 Feb 2025 05:24:35 GMT
ステータス: 翻訳完了
システム内更新日: 2025-02-05 15:21:54.392043
Title: Nearly Tight Bounds for Exploration in Streaming Multi-armed Bandits with Known Optimality Gap
Title（参考訳）: 最適ギャップを有するマルチアームバンドストリーミングにおける探索のための近距離境界
Authors: Nikolai Karpov, Chen Wang,
Abstract要約: マルチパスストリーミングマルチアームバンディット(MAB)における純粋探索のためのサンプルメモリパストレードオフについて検討する。最初に下界を示し、わずかにサブ線形メモリを持つ最適なアームを見つけるアルゴリズム -- $o(n/textpolylog(n))$ arms -- と $O(sum_i=2n1/Delta2_[i][i]cdot logn)$ arm pulls を示す。すると、私たちはほとんど姿を現す。
参考スコア（独自算出の注目度）: 9.095990028343369
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We investigate the sample-memory-pass trade-offs for pure exploration in multi-pass streaming multi-armed bandits (MABs) with the *a priori* knowledge of the optimality gap $\Delta_{[2]}$. Here, and throughout, the optimality gap $\Delta_{[i]}$ is defined as the mean reward gap between the best and the $i$-th best arms. A recent line of results by Jin, Huang, Tang, and Xiao [ICML'21] and Assadi and Wang [COLT'24] have shown that if there is no known $\Delta_{[2]}$, a pass complexity of $\Theta(\log(1/\Delta_{[2]}))$ (up to $\log\log(1/\Delta_{[2]})$ terms) is necessary and sufficient to obtain the *worst-case optimal* sample complexity of $O(n/\Delta^{2}_{[2]})$ with a single-arm memory. However, our understanding of multi-pass algorithms with known $\Delta_{[2]}$ is still limited. Here, the key open problem is how many passes are required to achieve the complexity, i.e., $O( \sum_{i=2}^{n}1/\Delta^2_{[i]})$ arm pulls, with a sublinear memory size. In this work, we show that the ``right answer'' for the question is $\Theta(\log{n})$ passes (up to $\log\log{n}$ terms). We first present a lower bound, showing that any algorithm that finds the best arm with slightly sublinear memory -- a memory of $o({n}/{\text{polylog}({n})})$ arms -- and $O(\sum_{i=2}^{n}{1}/{\Delta^{2}_{[i]}}\cdot \log{(n)})$ arm pulls has to make $\Omega(\frac{\log{n}}{\log\log{n}})$ passes over the stream. We then show a nearly-matching algorithm that assuming the knowledge of $\Delta_{[2]}$, finds the best arm with $O( \sum_{i=2}^{n}1/\Delta^2_{[i]} \cdot \log{n})$ arm pulls and a *single arm* memory.
Abstract（参考訳）: マルチパスストリーミングマルチアームバンディット(MAB)におけるサンプルメモリパスのトレードオフについて,最適性ギャップ$\Delta_{[2]}$の*a prei*知識を用いて検討する。ここでは、最適性ギャップ$\Delta_{[i]}$は、ベストと第2のベストアームの間の平均報酬ギャップとして定義される。 Jin, Huang, Tang, and Xiao [ICML'21] と Assadi and Wang [COLT'24] による最近の結果は、既知の$\Delta_{[2]}$が存在しない場合、$\Theta(\log(1/\Delta_{[2]})$($\log\log(1/\Delta_{[2]})$)$($\log\log(1/\Delta_{[2]})$)のパス複雑性は必要であり、$O(n/\Delta^{2}_{[2]})$のサンプル複雑さを得るのに十分であることを示している。しかし、既知の$\Delta_{[2]}$によるマルチパスアルゴリズムの理解はまだ限られている。ここで鍵となるオープンな問題は、複雑さを達成するために何回のパスが必要か、すなわち$O( \sum_{i=2}^{n}1/\Delta^2_{[i]})$arm pullsで、サブ線形メモリサイズである。この研究で、質問に対する ``right answer'' は $\Theta(\log{n})$ pass ($\log\log{n}$ terms)であることを示す。まず下界を示し、$o({n}/{\text{polylog}({n})})$ arms -- and $O(\sum_{i=2}^{n}{1}/{\Delta^{2}_{[i]}}\cdot \log{(n)})$ arm pullsは$\Omega(\frac {\log{n}}{\log\log{n}})$ストリームを渡さなければならない。すると、$\Delta_{[2]}$の知識を仮定すると、$O( \sum_{i=2}^{n}1/\Delta^2_{[i]} \cdot \log{n})$ arm pulls と *single arm* memory で最適なアームを見つける。

関連論文リスト

Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits [3.096236408131107]
Omega_alphaBig(frac(k-m+1) Tfrac1alphak1 + frac1alpha+1 sum_i:Delta_i > 0Delta_i1 - 2alphaBig)$である。これは、ストリーミングMABに対する最初の厳密なギャップ依存の後悔境界である。
論文参考訳（メタデータ） (2025-03-04T09:18:35Z)
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
この問題は通信複雑性のランダム化を$Omega(frac1kcdot n2log|mathbbF|)$とする。アプリケーションとして、$k$パスを持つ任意のストリーミングアルゴリズムに対して、$Omega(frac1kcdot n2log|mathbbF|)$スペースローバウンドを得る。
論文参考訳（メタデータ） (2024-10-26T06:21:42Z)
Optimal Streaming Algorithms for Multi-Armed Bandits [28.579280943038555]
ストリーミングBAI問題では,最大報酬が1-delta$の確率でアームを識別することが目的である。我々は,O(log Delta-1)$パス内で,ほぼインスタンス依存の最適なサンプル複雑性を実現するシングルアームメモリアルゴリズムを提案する。
論文参考訳（メタデータ） (2024-10-23T12:54:04Z)
The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits [10.329863009504303]
我々は、$O(fracnDelta2)$の最適なサンプル複雑さを利用するサブ線形メモリを持つストリーミングアルゴリズムは、$Omega(fraclog (1/Delta)log (1/Delta)$ passを必要とすることを示した。この結果は,[ICML'21] の$O(log(frac1Delta))$-passアルゴリズムと一致し, [ICML'21] は$O(1)$メモリしか使用せず, Assadi と Wang が提案したオープンな質問に答える。
論文参考訳（メタデータ） (2023-09-06T16:41:41Z)
Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
我々は、未知の分布から生じる無限に多くのバンドイットアームを用いて純粋探索を研究する。私たちのゴールは、平均的な報酬が1-delta$の1つの高品質なアームを、最高の$eta$-fraction of armsの1つとして$varepsilon$内で効率的に選択することにあります。
論文参考訳（メタデータ） (2023-06-03T04:00:47Z)
Variance-Dependent Best Arm Identification [12.058652922228918]
マルチアームバンディットゲームにおける最適な腕を特定する問題について検討する。武器の報酬のギャップと分散を探索する適応アルゴリズムを提案する。このアルゴリズムは2つの対数項に最適であることを示す。
論文参考訳（メタデータ） (2021-06-19T04:13:54Z)
Differentially Private Multi-Armed Bandits in the Shuffle Model [58.22098764071924]
シャッフルモデルにおけるマルチアームバンディット(MAB)問題に対して,$(varepsilon,delta)$-differentially privateアルゴリズムを提案する。我々の上限は、集中モデルにおいて最もよく知られたアルゴリズムの後悔とほぼ一致し、局所モデルにおいて最もよく知られたアルゴリズムを著しく上回っている。
論文参考訳（メタデータ） (2021-06-05T14:11:01Z)
Bandits with many optimal arms [68.17472536610859]
最適アームの割合は$p*$、最適アームとサブ最適化アームの間の最小平均ギャップは$Delta$と書きます。我々は,累積的後悔設定と最良腕識別設定の両方において最適な学習率を特徴付ける。
論文参考訳（メタデータ） (2021-03-23T11:02:31Z)
An Optimal Separation of Randomized and Quantum Query Complexity [67.19751155411075]
すべての決定木に対して、与えられた順序 $ellsqrtbinomdell (1+log n)ell-1,$ sum to at least $cellsqrtbinomdell (1+log n)ell-1,$ where $n$ is the number of variables, $d$ is the tree depth, $c>0$ is a absolute constant。
論文参考訳（メタデータ） (2020-08-24T06:50:57Z)
Streaming Complexity of SVMs [110.63976030971106]
本稿では,ストリーミングモデルにおけるバイアス正規化SVM問題を解く際の空間複雑性について検討する。両方の問題に対して、$frac1lambdaepsilon$の次元に対して、$frac1lambdaepsilon$よりも空間的に小さいストリーミングアルゴリズムを得ることができることを示す。
論文参考訳（メタデータ） (2020-07-07T17:10:00Z)
On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions [62.01594253618911]
有限和の有限ノイズ構造を利用して、大域オラクルモデルの下での一致する$O(n2)$-upper境界を導出する。同様のアプローチを踏襲したSVRGの新規な適応法を提案し、これはオラクルと互換性があり、$tildeO(n2+nsqrtL/mu)log (1/epsilon)$と$O(nsqrtL/epsilon)$, for $mu>0$と$mu=0$の複雑さ境界を実現する。
論文参考訳（メタデータ） (2020-02-09T03:39:46Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。