Fugu-MT Paper Translation (Abstract): Understanding Memory-Regret Trade-Off for Streaming Stochastic Multi-Armed Bandits

Paper overview: Understanding Memory-Regret Trade-Off for Streaming Stochastic Multi-Armed Bandits

arxiv url: http://arxiv.org/abs/2405.19752v1
Date: Thu, 30 May 2024 06:56:48 GMT
Status: Translation complete
Last updated in system: 2024-05-31 15:48:50.657349
Title: Understanding Memory-Regret Trade-Off for Streaming Stochastic Multi-Armed Bandits
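Title (reference translation): Understanding the Memory-Regret Trade-Off in Streaming Stochastic Multi-Armed Bandits
Authors: Yuchen He, Zichun Ye, Chihao Zhang
Abstract summary: We study the multi-armed bandit problem in the $P$-pass streaming model. We give a complete characterization of the optimal regret in terms of $m, n$ and $P$.
Reference score (independently computed attention level): 2.1579533951772163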
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the stochastic multi-armed bandit problem in the $P$-pass streaming model. In this problem, the $n$ arms are present in a stream and at most $m<n$ arms and their statistics can be stored in the memory. We give a complete characterization of the optimal regret in terms of $m, n$ and $P$. Specifically, we design an algorithm with $\tilde O\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)$ regret and complement it with an $\tilde \Omega\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)$ lower bound when the number of rounds $T$ is sufficiently large. Our results are tight up to a logarithmic factor in $n$ and $P$.
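Abstract (reference translation): We study the stochastic multi-armed bandit problem in the $P$-pass streaming model. In this problem, the $n$ arms are present in a stream, and at most $m<n$ arms can be stored in memory. We give a complete characterization of the optimal regret in terms of $m, n$ and $P$. Specifically, we design an algorithm with $\tilde O\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)$ regret and complement it with an $\tilde \Omega\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)$ lower bound when the number of rounds $T$ is sufficiently large. Our results are tight up to a logarithmic factor in $n$ and $P$.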
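To make the bound in the abstract easier to read, the following minimal sketch evaluates the three exponents of $(n-m)$, $n$ and $T$ for a given number of passes $P$. The function names `regret_exponents` and `regret_rate` are hypothetical helpers introduced here for illustration; they are not from the paper, and the rate ignores the logarithmic factors hidden by $\tilde O$.

```python
from fractions import Fraction


def regret_exponents(P):
    """Return the exponents of (n - m), n and T in the bound quoted in the abstract.

    Hypothetical helper for illustration only; P is the number of passes.
    """
    if P < 1:
        raise ValueError("P must be a positive integer")
    d = 2 ** (P + 1) - 1                 # common denominator 2^{P+1} - 1
    a = 1 + Fraction(2 ** P - 2, d)      # exponent of (n - m)
    b = Fraction(2 - 2 ** (P + 1), d)    # exponent of n
    c = Fraction(2 ** P, d)              # exponent of T
    return a, b, c


def regret_rate(n, m, T, P):
    """Evaluate (n-m)^a * n^b * T^c, ignoring constants and log factors."""
    a, b, c = regret_exponents(P)
    return (n - m) ** float(a) * n ** float(b) * T ** float(c)


if __name__ == "__main__":
    for P in (1, 2, 3):
        a, b, c = regret_exponents(P)
        print(f"P={P}: (n-m)^({a}) * n^({b}) * T^({c})")
```

For $P=1$ the exponents are $1$, $-2/3$ and $2/3$, so the expression reduces to $(n-m)\,n^{-2/3}\,T^{2/3}$. As $P$ grows, the exponents approach $3/2$, $-1$ and $1/2$, so the dependence on $T$ moves toward $\sqrt{T}$.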

Related papers

Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits [3.096236408131107]
... is $\Omega_\alpha\Big(\frac{(k-m+1)\,T^{\frac{1}{\alpha}}}{k^{1+\frac{1}{\alpha+1}}}\sum_{i:\Delta_i > 0}\Delta_i^{1-2\alpha}\Big)$. This is the first tight gap-dependent regret bound for streaming MAB.
Paper reference translation (metadata) (2025-03-04T09:18:35Z)
Nearly Tight Bounds for Exploration in Streaming Multi-armed Bandits with Known Optimality Gap [9.095990028343369]
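We study the sample-memory-pass trade-offs for pure exploration in multi-pass streaming multi-armed bandits (MABs). We first present a lower bound concerning algorithms that find the best arm with slightly sublinear memory (a memory of $o(n/\text{polylog}(n))$ arms) and $O(\sum_{i=2}^{n} 1/\Delta^2_{[i]} \cdot \log n)$ arm pulls. We then show a nearly ...
Paper reference translation (metadata) (2025-02-03T05:24:35Z)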
Optimal Streaming Algorithms for Multi-Armed Bandits [28.579280943038555]
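In the streaming best-arm identification (BAI) problem, the goal is to identify the arm with the maximum reward with probability $1-\delta$. We propose a single-arm-memory algorithm that achieves near instance-dependent optimal sample complexity within $O(\log \Delta^{-1})$ passes.
Paper reference translation (metadata) (2024-10-23T12:54:04Z)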
$\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
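We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound.
Paper reference translation (metadata) (2023-07-11T08:51:53Z)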
Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
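We study pure exploration with infinitely many bandit arms drawn from an unknown distribution. Our goal is to efficiently select a single high-quality arm whose average reward is, with probability $1-\delta$, within $\varepsilon$ of being among the top $\eta$-fraction of arms.
Paper reference translation (metadata) (2023-06-03T04:00:47Z)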
Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning [54.806166861456035]
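We study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov decision processes (MDPs) with a constraint on the number of batches. We design an algorithm that achieves near-optimal regret of $\tilde{O}(\sqrt{SAH^3K\ln(1/\delta)})$ in $K$ episodes, where $\tilde{O}(\cdot)$ hides logarithmic terms of $(S,A,H,K)$. Our technical contributions are twofold: 1) a near-optimal design scheme for exploration
Paper reference translation (metadata) (2022-10-15T09:22:22Z)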
Low-degree learning and the metric entropy of polynomials [44.99833362998488]
... at least $\Omega(\sqrt{\varepsilon})\,2^d\log n \leq \log \mathsf{M}(\mathscr{F}_{n,d},\|\cdot\|_{L},\varepsilon)$ satisfies the two-sided estimate $c(1-\varepsilon)2^d\log$ ...
Paper reference translation (metadata) (2022-03-17T23:52:08Z)
Computational Complexity of Normalizing Constants for the Product of Determinantal Point Processes [12.640283469603357]
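We study the computational complexity of computing the normalizing constant. For example, we show that computing $\sum_S\det({\bf A}_{S,S})^p$ is UP-hard and Mod$_3$P-hard for every (fixed) positive even integer $p$.
Paper reference translation (metadata) (2021-11-28T14:08:25Z)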
Faster Rates of Differentially Private Stochastic Convex Optimization [7.93728520583825]
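We study the case where the population risk function satisfies the Tsybakov Noise Condition (TNC) with parameter $\theta>1$. In the second part, we focus on the special case where the population risk function is strongly convex.
Paper reference translation (metadata) (2021-07-31T22:23:39Z)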
Differentially Private Multi-Armed Bandits in the Shuffle Model [58.22098764071924]
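We propose an $(\varepsilon,\delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model. Our upper bound almost matches the regret of the best known algorithms for the centralized model and significantly outperforms the best known algorithm in the local model.
Paper reference translation (metadata) (2021-06-05T14:11:01Z)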
Bandits with many optimal arms [68.17472536610859]
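We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean gap between optimal and sub-optimal arms. We characterize the optimal learning rates in both the cumulative regret setting and the best-arm identification setting.
Paper reference translation (metadata) (2021-03-23T11:02:31Z)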
An Optimal Separation of Randomized and Quantum Query Complexity [67.19751155411075]
For every decision tree, the absolute values of the Fourier coefficients of a given order $\ell$ sum to at most $c^{\ell}\sqrt{\binom{d}{\ell}(1+\log n)^{\ell-1}}$, where $n$ is the number of variables, $d$ is the tree depth, and $c>0$ is an absolute constant.
Paper reference translation (metadata) (2020-08-24T06:50:57Z)
Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
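Given an MDP with $S$ states, $A$ actions, discount factor $\gamma \in (0,1)$, and approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy. For sufficiently small $\epsilon$, we show an algorithm with improved sample complexity.
Paper reference translation (metadata) (2020-06-06T13:34:41Z)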
Revisiting EXTRA for Smooth Distributed Optimization [70.65867695317633]
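An improved $O\left(\left(\frac{L}{\mu}+\frac{1}{1-\sigma_2(W)}\right)\log\frac{1}{\epsilon(1-\sigma_2(W))}\right)$. For the accelerated EXTRA, the communication complexities are stated in terms of $\left(\log\frac{L}{\mu(1-\sigma_2(W))}\right)$ and $\left(\log\frac{1}{\epsilon(1\ldots}\right)$ ...
Paper reference translation (metadata) (2020-02-24T08:07:08Z)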

The related papers list is generated automatically from the titles and abstracts of papers on this site.