Fugu-MT 論文翻訳(概要): Be Greedy in Multi-Armed Bandits

論文の概要: Be Greedy in Multi-Armed Bandits

arxiv url: http://arxiv.org/abs/2101.01086v1
Date: Mon, 4 Jan 2021 16:47:02 GMT
ステータス: 翻訳完了
システム内更新日: 2021-04-12 03:45:08.566093
Title: Be Greedy in Multi-Armed Bandits
Title（参考訳）: マルチアーマッドバンドにおける悲しみ
Authors: Matthieu Jedor, Jonathan Lou\"edec, Vianney Perchet
Abstract要約: グレディアルゴリズムは、各ラウンドで局所最適選択を行う、シーケンシャルな決定問題の最も単純なものである。 We provide a generic worst-case bound on the regret of the Greedy algorithm。連続・無限・多武装バンディット問題において,ほぼ最適の最悪の後悔境界を検証できることを証明した。
参考スコア（独自算出の注目度）: 22.301793734117805
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Greedy algorithm is the simplest heuristic in sequential decision problem that carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known to sometimes have poor performances, for instance even a linear regret (with respect to the time horizon) in the standard multi-armed bandit problem. On the other hand, this heuristic performs reasonably well in practice and it even has sublinear, and even near-optimal, regret bounds in some very specific linear contextual and Bayesian bandit models. We build on a recent line of work and investigate bandit settings where the number of arms is relatively large and where simple greedy algorithms enjoy highly competitive performance, both in theory and in practice. We first provide a generic worst-case bound on the regret of the Greedy algorithm. When combined with some arms subsampling, we prove that it verifies near-optimal worst-case regret bounds in continuous, infinite and many-armed bandit problems. Moreover, for shorter time spans, the theoretical relative suboptimality of Greedy is even reduced. As a consequence, we subversively claim that for many interesting problems and associated horizons, the best compromise between theoretical guarantees, practical performances and computational burden is definitely to follow the greedy heuristic. We support our claim by many numerical experiments that show significant improvements compared to the state-of-the-art, even for moderately long time horizon.
Abstract（参考訳）: グリーディアルゴリズムは、各ラウンドの局所最適選択を不注意に受け取り、探索および/または情報収集の利点を無視する、シーケンシャルな決定問題の最も単純なヒューリスティックである。理論的には、例えば、標準的な多腕バンディット問題において(時間軸に関して)線形な後悔さえも、パフォーマンスが悪かったことが知られている。一方、このヒューリスティックは実際かなりうまく機能し、非常に特定の線形文脈的およびベイズ的バンディットモデルにおいて、部分線型、あるいは近似的、後悔的境界さえも持つ。我々は,最近の研究成果に基づいて,腕数は比較的多く,単純な欲望アルゴリズムが理論上,実際上,高い競争性能を享受するバンディットの設定を調査した。まず、Greedyアルゴリズムの後悔に基づく一般的な最悪のケースを提供する。いくつかのアームのサブサンプリングと組み合わせると、連続、無限、多腕のバンディット問題において、ほぼ最適の最悪の後悔境界を検証することが証明される。さらに、短い時間スパンに対して、欲望の理論的相対的準最適性も減少する。結果として、多くの興味深い問題と関連する地平線に対して、理論的な保証、実用的性能、計算の負担の間の最良の妥協は、確実に欲望のヒューリスティックに従うことであると主張する。我々は,中程度に長い地平線でも最新技術と比較して大幅な改善を示す多くの数値実験によって,我々の主張を支持している。

関連論文リスト

Neural Dueling Bandits [58.90189511247936]
ニューラルネットワークを用いて、予め選択した腕の好みフィードバックを用いて報酬関数を推定する。次に、理論結果を二項フィードバックによる文脈的帯域幅問題に拡張し、それはそれ自体は自明な寄与ではない。
論文参考訳（メタデータ） (2024-07-24T09:23:22Z)
Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
本稿では, 一般化線形モデル(GLM)から, デュエルアームのバイナリ比較を生成するコンテキストデュエルバンド問題について検討する。本稿では,SupLinUCB型アルゴリズムを提案する。このアルゴリズムは,計算効率と分散を意識したリセットバウンド$tilde Obig(dsqrtsum_t=1Tsigma_t2 + dbig)$を提案する。我々の後悔は、比較が決定論的である場合の直感的な期待と自然に一致し、アルゴリズムは$tilde O(d)$ regretにのみ悩まされる。
論文参考訳（メタデータ） (2023-10-02T08:15:52Z)
Bandit Social Learning: Exploration under Myopic Behavior [54.767961587919075]
オンラインプラットフォーム上でのレビューによって動機付けられた社会学習のダイナミクスについて検討する。エージェントはまとめて単純なマルチアームのバンディットプロトコルに従うが、各エージェントは探索を伴わずにミオプティカルに振る舞う。このような振る舞いに対して,スターク学習の失敗を導出し,好意的な結果を提供する。
論文参考訳（メタデータ） (2023-02-15T01:57:57Z)
Dual Instrumental Method for Confounded Kernelized Bandits [0.0]
文脈的帯域幅問題は、様々な分野の幅広い応用のフレームワークである。本稿では,騒音がコンテキストと報酬の両方に影響を与える潜在的共同設立者となる,包括的バンドイット問題を提案する。双対楽器変数回帰は真の報酬関数を正しく識別できることを示す。
論文参考訳（メタデータ） (2022-09-07T15:25:57Z)
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [28.79598714109439]
両環境および敵環境における$K$武器のデュエルバンディットの問題について検討する。まず,マルチアームのバンディットに対して,任意の(一般的な)デュエル・バンドレットから新たなリダクションを提案する。提案アルゴリズムは,コンドルチェット・ウィンナーベンチマークに対して最適な$O(sum_i = 1K fraclog TDelta_i)$ regret boundを達成した最初のアルゴリズムでもある。
論文参考訳（メタデータ） (2022-02-14T13:37:23Z)
On Slowly-varying Non-stationary Bandits [25.305949034527202]
我々は、ゆっくりと変化する性質を持つ非定常包帯の動的後悔を考察する。我々は、ゆっくりと変化する非定常帯域に対して、最初のインスタンス依存後悔上限を確立する。我々のアルゴリズムは基本的にミニマックス最適であることを示す。
論文参考訳（メタデータ） (2021-10-25T12:56:19Z)
An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
複数の次元に沿った最先端技術を改善する新しいアルゴリズムを提案する。非文脈線形帯域の特別な場合において、学習地平線に対して最小限の最適性を確立する。
論文参考訳（メタデータ） (2020-10-23T09:12:47Z)
On Regret with Multiple Best Arms [12.315392649501101]
バンディット設定における複数のベスト/ニア最適アームの存在に関する後悔問題について検討する。我々の目標は、問題の未知の硬さに自動的に適応できるアルゴリズムを設計することです。
論文参考訳（メタデータ） (2020-06-26T04:01:46Z)
A Novel Confidence-Based Algorithm for Structured Bandits [129.30402124516507]
両腕の報酬が他の腕の報酬と相関する可能性のある有限腕包帯について検討した。本稿では、与えられた構造を利用して、真のバンディット問題のパラメータに対する信頼セットを構築する新しい位相アルゴリズムを提案する。
論文参考訳（メタデータ） (2020-05-23T19:52:44Z)
Greedy Algorithm almost Dominates in Smoothed Contextual Bandits [100.09904315064372]
オンライン学習アルゴリズムは探索と搾取のバランスをとる必要がある。欲求的アプローチは、他のアルゴリズムのベイズ的後悔率とほぼ一致していることを示す。
論文参考訳（メタデータ） (2020-05-19T18:11:40Z)
Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
平均ではなく報酬分布のモードを値とするマルチアームバンディットの理論を導入する。我々は,我々のアルゴリズムが逆雑音列による腕の摂動に頑健であることを示すシミュレーションで示す。
論文参考訳（メタデータ） (2020-03-05T21:29:27Z)
The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms [10.662105162882526]
本研究は,Emphmany-armed regimeにおける$k$-armed bandit問題について考察する。以上の結果から,多腕の環境下での強欲なアルゴリズムには,新たなエフェフリー探索法が有用であることが示唆された。
論文参考訳（メタデータ） (2020-02-24T08:59:34Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。