Fugu-MT 論文翻訳(概要): Optimal Exploration is no harder than Thompson Sampling

論文の概要: Optimal Exploration is no harder than Thompson Sampling

arxiv url: http://arxiv.org/abs/2310.06069v2
Date: Tue, 24 Oct 2023 20:13:58 GMT
ステータス: 翻訳完了
システム内更新日: 2023-10-26 19:19:33.426780
Title: Optimal Exploration is no harder than Thompson Sampling
Title（参考訳）: 最適な探索はトンプソンサンプリングよりも難しくない
Authors: Zhaoqi Li, Kevin Jamieson, Lalit Jain
Abstract要約: a pure exploration linear bandit problem to return $argmax_zin mathcalZ ztoptheta_ast with $xtoptheta_ast with $xin mathcalXsubset mathbbRd$。この複雑さは、後続サンプリングとargmaxオラクルへのアクセスを必要とするだけであり、$mathcalZ$を列挙する必要がない、後悔の最小化のために人気で単純なトンプソンサンプリングと矛盾する。
参考スコア（独自算出の注目度）: 14.726673043806391
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Given a set of arms $\mathcal{Z}\subset \mathbb{R}^d$ and an unknown parameter vector $\theta_\ast\in\mathbb{R}^d$, the pure exploration linear bandit problem aims to return $\arg\max_{z\in \mathcal{Z}} z^{\top}\theta_{\ast}$, with high probability through noisy measurements of $x^{\top}\theta_{\ast}$ with $x\in \mathcal{X}\subset \mathbb{R}^d$. Existing (asymptotically) optimal methods require either a) potentially costly projections for each arm $z\in \mathcal{Z}$ or b) explicitly maintaining a subset of $\mathcal{Z}$ under consideration at each time. This complexity is at odds with the popular and simple Thompson Sampling algorithm for regret minimization, which just requires access to a posterior sampling and argmax oracle, and does not need to enumerate $\mathcal{Z}$ at any point. Unfortunately, Thompson sampling is known to be sub-optimal for pure exploration. In this work, we pose a natural question: is there an algorithm that can explore optimally and only needs the same computational primitives as Thompson Sampling? We answer the question in the affirmative. We provide an algorithm that leverages only sampling and argmax oracles and achieves an exponential convergence rate, with the exponent being the optimal among all possible allocations asymptotically. In addition, we show that our algorithm can be easily implemented and performs as well empirically as existing asymptotically optimal methods.
Abstract（参考訳）: 腕の組 $\mathcal{Z}\subset \mathbb{R}^d$ と未知のパラメータベクトル $\theta_\ast\mathbb{R}^d$ が与えられたとき、純粋な探索線形バンドイ問題は $\arg\max_{z\in \mathcal{Z}} z^{\top}\theta_{\ast}$ を返すことを目的としており、$x^{\top}\theta_{\ast}$ と $x\in \mathcal{X}\subset \mathbb{R}^d$ のノイズ測定による確率が高い。既存の(漸近的に)最適な方法が必要か a) 各アームに対する潜在的にコストがかかるプロジェクション $z\in \mathcal{Z}$ b) それぞれの時点で$\mathcal{Z}$のサブセットを明示的に保持すること。この複雑さは、後悔の最小化のために人気があり単純なトンプソンサンプリングアルゴリズムと矛盾する。これは後続サンプリングとargmaxオラクルへのアクセスを必要とするだけであり、任意の時点で$\mathcal{Z}$を列挙する必要はない。残念ながら、トンプソンサンプリングは純粋な探査に最適ではないことが知られている。最適な探索が可能で、トンプソンサンプリングと同じ計算プリミティブしか必要としないアルゴリズムがあるのだろうか? 私たちはその質問を肯定的に答える。我々はサンプリングとargmaxのみを利用するアルゴリズムを提供し、指数関数収束率を達成し、指数は漸近的に可能な全ての割り当ての中で最適である。さらに,本アルゴリズムは,既存の漸近的最適手法と同様に,容易に実装および実行可能であることを示す。

関連論文リスト

Linear Bandits on Ellipsoids: Minimax Optimal Algorithms [5.678465386088928]
作用の集合が楕円体であるような計算線形帯域を考える。この問題に対して、最初の既知のミニマックス最適アルゴリズムを提供する。実行には、時間$O(dT + d2 log(T/d) + d3)$とメモリ$O(d2)$のみが必要である。
論文参考訳（メタデータ） (2025-02-24T14:12:31Z)
Dueling Optimization with a Monotone Adversary [35.850072415395395]
凸最適化の一般化である単調逆数を用いたデュエル最適化の問題点について検討する。目的は、最小値$mathbfx*$の関数$fcolon XをmathbbRdに変換するために、オンラインアルゴリズムを設計することである。
論文参考訳（メタデータ） (2023-11-18T23:55:59Z)
Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
本稿では,マルチアームバンディット(R-CPE-MAB)問題の実測値について検討する。一般トンプソンサンプリング探索法(GenTS-Explore)と呼ばれるアルゴリズムを導入する。これは,アクションセットのサイズが指数関数的に$d$で大きい場合でも動作可能な,最初のアルゴリズムである。
論文参考訳（メタデータ） (2023-08-20T11:56:02Z)
Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise [50.64137465792738]
この問題に対する効率的なSQアルゴリズムは、少なくとも$Omega(d1/2/(maxp, epsilon)2)$. のサンプル複雑性を必要とする。我々の下限は、この1/epsilon$に対する二次的依存は、効率的なアルゴリズムに固有のものであることを示唆している。
論文参考訳（メタデータ） (2023-07-13T18:59:28Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
モノトン活性化に対する $mathbfxmapstosigma(mathbfwcdotmathbfx)$ の関数について検討する。学習者の目標は仮説ベクトル $mathbfw$ that $F(mathbbw)=C, epsilon$ を高い確率で出力することである。
論文参考訳（メタデータ） (2022-06-17T17:55:43Z)
On the complexity of All $\varepsilon$-Best Arms Identification [2.1485350418225244]
ガウスの報酬を持つ有限の多腕バンディットにおいて、$varepsilon$-optimal armsを全て識別する問題を考える。我々は,$varepsilon$-good arms w.h.p の集合を同定し,期待されるサンプルの複雑さの観点から最適性($delta$が 0 になるとき)を享受するトラック・アンド・ストップアルゴリズムを提案する。
論文参考訳（メタデータ） (2022-02-13T10:54:52Z)
The Fine-Grained Hardness of Sparse Linear Regression [12.83354999540079]
この問題に対して、より優れたブルートフォースアルゴリズムは存在しないことを示す。また,予測誤差が測定された場合,より優れたブラトフォースアルゴリズムが不可能であることを示す。
論文参考訳（メタデータ） (2021-06-06T14:19:43Z)
Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso banditは、報酬関数を定義するベクトルとスパースサポートを推定するアルゴリズムである。一般には $mathcalO( log d + sqrtT )$ や $mathcalO( log d + sqrtT )$ としてスケールする非漸近的後悔の上界を確立する。
論文参考訳（メタデータ） (2020-10-22T19:14:37Z)
Streaming Complexity of SVMs [110.63976030971106]
本稿では,ストリーミングモデルにおけるバイアス正規化SVM問題を解く際の空間複雑性について検討する。両方の問題に対して、$frac1lambdaepsilon$の次元に対して、$frac1lambdaepsilon$よりも空間的に小さいストリーミングアルゴリズムを得ることができることを示す。
論文参考訳（メタデータ） (2020-07-07T17:10:00Z)
MOTS: Minimax Optimal Thompson Sampling [89.2370817955411]
トンプソンサンプリングがミニマックス下限の$Omega(sqrtKT)$と$K$の武器付きバンディット問題に一致するかどうかという未解決の問題のままである。我々は,各タイミングで選択した腕のサンプリングインスタンスを適応的にクリップするMOTSと呼ばれるトンプソンサンプリングの変種を提案する。我々は、この単純なトンプソンサンプリングの変種が、有限時間地平線に対して$O(sqrtKT)$のミニマックス最適後悔境界と、$T$が無限に近づくときのガウス報酬に対する最適後悔境界を達成することを証明した。
論文参考訳（メタデータ） (2020-03-03T21:24:39Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。