Fugu-MT 論文翻訳(概要): Batched Stochastic Bandit for Nondegenerate Functions

論文の概要: Batched Stochastic Bandit for Nondegenerate Functions

arxiv url: http://arxiv.org/abs/2405.05733v2
Date: Thu, 29 Aug 2024 17:58:35 GMT
ステータス: 翻訳完了
システム内更新日: 2024-08-30 19:08:38.950735
Title: Batched Stochastic Bandit for Nondegenerate Functions
Title（参考訳）: 非退化関数に対するバッチ確率帯域
Authors: Yu Liu, Yunlu Shu, Tianyu Wang,
Abstract要約: 本稿では,非退化関数に対するバッチ帯域学習問題について検討する。本稿では,非退化関数に対するバッチバンドイット問題をほぼ最適に解くアルゴリズムを提案する。
参考スコア（独自算出の注目度）: 8.015503209312786
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally. More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{{\mathcal{O}}} ( A_{+}^d \sqrt{T} )$. In addition, GN only needs $\mathcal{O} (\log \log T)$ batches to achieve this regret. We also provide lower bound analysis for this problem. More specifically, we prove that over some (compact) doubling metric space of doubling dimension $d$: 1. For any policy $\pi$, there exists a problem instance on which $\pi$ admits a regret of order ${\Omega} ( A_-^d \sqrt{T})$; 2. No policy can achieve a regret of order $ A_-^d \sqrt{T} $ over all problem instances, using less than $ \Omega ( \log \log T ) $ rounds of communications. Our lower bound analysis shows that the GN algorithm achieves near optimal regret with minimal number of batches.
Abstract（参考訳）: 本稿では,非退化関数に対するバッチ帯域学習問題について検討する。本稿では,非退化関数に対するバッチバンドイット問題をほぼ最適に解くアルゴリズムを提案する。より具体的には、Geometric Narrowing (GN) と呼ばれるアルゴリズムを導入し、その残差は$\widetilde{{\mathcal{O}}} (A_{+}^d \sqrt{T} )$である。さらに、GNはこの後悔を達成するために$\mathcal{O} (\log \log T)$のバッチのみを必要とする。また、この問題に対する低境界解析も提供する。より具体的には、ある(コンパクトな)二重計量空間において、次元が$d$:であることを示す。 1. 任意のポリシー $\pi$ に対して、$\pi$ が次数 ${\Omega} ( A_-^d \sqrt{T})$ の後悔を認める問題インスタンスが存在する。 2. A_-^d \sqrt{T} $ の全ての問題インスタンスに対して、$ \Omega ( \log \log T ) $ の通信ラウンドを使用するという遺言は得られない。我々の低境界解析は、GNアルゴリズムが最小のバッチ数でほぼ最適に後悔することを示す。

関連論文リスト

Efficient Near-Optimal Algorithm for Online Shortest Paths in Directed Acyclic Graphs with Bandit Feedback Against Adaptive Adversaries [34.38978643261337]
本稿では,適応的相手に対する帯域フィードバックの下で,有向非巡回グラフ(DAG)における最短経路問題について検討する。我々は,任意の適応的敵に対して高い確率で$tilde O(sqrt|E|Tlog |X|)$の最小限の最小残差を求めるアルゴリズムを提案する。
論文参考訳（メタデータ） (2025-04-01T06:35:42Z)
Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning [54.806166861456035]
本研究では,有限水平マルコフ決定過程(MDP)によってモデル化されたエピソディック強化学習(RL)問題をバッチ数に制約を加えて検討する。我々は,$tildeO(sqrtSAH3Kln (1/delta))$tildeO(cdot)をほぼ最適に後悔するアルゴリズムを設計し,$(S,A,H,K)$の対数項を$K$で隠蔽する。技術的貢献は2つある: 1) 探索のためのほぼ最適設計スキーム
論文参考訳（メタデータ） (2022-10-15T09:22:22Z)
Logarithmic Regret from Sublinear Hints [76.87432703516942]
自然クエリモデルにより,アルゴリズムが$O(log T)$ regretsを$O(sqrtT)$ hintsで得ることを示す。また、$o(sqrtT)$ hintsは$Omega(sqrtT)$ regretより保証できないことも示しています。
論文参考訳（メタデータ） (2021-11-09T16:50:18Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
我々は,バンディットフィードバックを用いてオンライン学習を学習する。 learnerは、コスト/リワード関数が"pseudo-1d"構造を許可するゼロ次オラクルのみにアクセスできる。我々は、$T$がラウンドの数である任意のアルゴリズムの後悔のために$min(sqrtdT、T3/4)$の下限を示しています。ランダム化オンライングラデーション下降とカーネル化指数重み法を組み合わせた新しいアルゴリズムsbcalgを提案し,疑似-1d構造を効果的に活用する。
論文参考訳（メタデータ） (2021-02-15T08:16:51Z)
Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso banditは、報酬関数を定義するベクトルとスパースサポートを推定するアルゴリズムである。一般には $mathcalO( log d + sqrtT )$ や $mathcalO( log d + sqrtT )$ としてスケールする非漸近的後悔の上界を確立する。
論文参考訳（メタデータ） (2020-10-22T19:14:37Z)
Streaming Complexity of SVMs [110.63976030971106]
本稿では,ストリーミングモデルにおけるバイアス正規化SVM問題を解く際の空間複雑性について検討する。両方の問題に対して、$frac1lambdaepsilon$の次元に対して、$frac1lambdaepsilon$よりも空間的に小さいストリーミングアルゴリズムを得ることができることを示す。
論文参考訳（メタデータ） (2020-07-07T17:10:00Z)
$Q$-learning with Logarithmic Regret [60.24952657636464]
楽観的な$Q$は$mathcalOleft(fracSAcdot mathrmpolyleft(Hright)Delta_minlogleft(SATright)right)$ cumulative regret bound, where $S$ is the number of state, $A$ is the number of action, $H$ is the planning horizon, $T$ is the total number of steps, $Delta_min$ is the least sub-Optitimality gap。
論文参考訳（メタデータ） (2020-06-16T13:01:33Z)
Optimal Contextual Pricing and Extensions [32.152826900874075]
このアルゴリズムは$Omega(d log T)$ lower bound up to the $d log d$ additive factor。これらのアルゴリズムは、凸領域のスタイナー固有値を様々なスケールで有界化する新しい手法に基づいている。
論文参考訳（メタデータ） (2020-03-03T18:46:29Z)
Logistic Regression Regret: What's the Catch? [3.7311680121118345]
パラメータに対する$L_infty$制約の下で、対数的後悔を伴う下界を導出する。 L$の制約については、十分に大きな$d$の場合、後悔は$d$で線形であるが、$T$で対数化されなくなることが示されている。
論文参考訳（メタデータ） (2020-02-07T18:36:39Z)
Tight Regret Bounds for Noisy Optimization of a Brownian Motion [118.65407541895851]
本稿では,1次元ブラウン運動のベイズ最適化の問題点について考察する。 Omega(sigmasqrtT cdot log T)$.sigmasqrtT cdot log T)$.sigmasqrtT.sigmasqrtT.sigmasqrtT cdot log T)$.sigmasqrtT.sigmasqrtT.sigmasqrtT.sigmasqrtT.sigmasqrtT.sigmasqrtT.sigmasqrtT.sigmasqrtT.
論文参考訳（メタデータ） (2020-01-25T14:44:53Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。