Fugu-MT 論文翻訳(概要): Best-Arm Identification in Correlated Multi-Armed Bandits

論文の概要: Best-Arm Identification in Correlated Multi-Armed Bandits

arxiv url: http://arxiv.org/abs/2109.04941v1
Date: Fri, 10 Sep 2021 15:41:07 GMT
ステータス: 翻訳完了
システム内更新日: 2021-09-13 13:40:22.802066
Title: Best-Arm Identification in Correlated Multi-Armed Bandits
Title（参考訳）: 相関多関節バンドにおけるベストアーム同定
Authors: Samarth Gupta, Gauri Joshi, Osman Ya\u{g}an
Abstract要約: 本稿では,武器間の相関に関するドメイン知識を捉えた新しい相関バンディットフレームワークを提案する。 C-LUCBで得られた全サンプルは$mathcalOleft(sum_k in MathcalC logleft(frac1deltaright)$であり、通常の$mathcalOleft(sum_k in MathcalC logleft(frac1deltaright)$とは対照的である。
参考スコア（独自算出の注目度）: 9.180690233707645
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper we consider the problem of best-arm identification in multi-armed bandits in the fixed confidence setting, where the goal is to identify, with probability $1-\delta$ for some $\delta>0$, the arm with the highest mean reward in minimum possible samples from the set of arms $\mathcal{K}$. Most existing best-arm identification algorithms and analyses operate under the assumption that the rewards corresponding to different arms are independent of each other. We propose a novel correlated bandit framework that captures domain knowledge about correlation between arms in the form of upper bounds on expected conditional reward of an arm, given a reward realization from another arm. Our proposed algorithm C-LUCB, which generalizes the LUCB algorithm utilizes this partial knowledge of correlations to sharply reduce the sample complexity of best-arm identification. More interestingly, we show that the total samples obtained by C-LUCB are of the form $\mathcal{O}\left(\sum_{k \in \mathcal{C}} \log\left(\frac{1}{\delta}\right)\right)$ as opposed to the typical $\mathcal{O}\left(\sum_{k \in \mathcal{K}} \log\left(\frac{1}{\delta}\right)\right)$ samples required in the independent reward setting. The improvement comes, as the $\mathcal{O}(\log(1/\delta))$ term is summed only for the set of competitive arms $\mathcal{C}$, which is a subset of the original set of arms $\mathcal{K}$. The size of the set $\mathcal{C}$, depending on the problem setting, can be as small as $2$, and hence using C-LUCB in the correlated bandits setting can lead to significant performance improvements. Our theoretical findings are supported by experiments on the Movielens and Goodreads recommendation datasets.
Abstract（参考訳）: 本稿では,固定信頼設定における複数腕のバンディットにおける最善のアーム識別の問題について考察する。そこでは,少なくとも$\delta>0$ に対して 1-\delta$ の確率で,最小のアームセット $\mathcal{k}$ のサンプルで最大値のアームを識別することを目的としている。既存の最善のアーム識別アルゴリズムと分析の多くは、異なるアームに対応する報酬が互いに独立しているという仮定の下で動作する。本稿では,腕の条件付き報酬に対する上界の形で,腕間の相関に関するドメイン知識を把握し,他の腕から報酬を得られるような新しい相関型バンディットフレームワークを提案する。 LUCBアルゴリズムを一般化したアルゴリズムC-LUCBは、この相関関係の部分的知識を利用して、ベストアーム識別のサンプルの複雑さを著しく低減する。より興味深いことに、C-LUCB によって得られた全サンプルは、通常の $\mathcal{O}\left(\sum_{k \in \mathcal{C}} \log\left(\frac{1}{\delta}\right)\right)$ 独立報酬設定で必要とされる $\mathcal{O}\left(\sum_{k \in \mathcal{K}} \log\left(\frac{1}{\delta}\right)\right)$ の形で示される。この改善は、$\mathcal{o}(\log(1/\delta))$項が、元のアームセット$\mathcal{k}$のサブセットである$\mathcal{c}$の競合アームの集合に対してのみ要約されるためである。問題の設定によっては、セット$\mathcal{c}$のサイズは$$$という小さくなり、相関したバンディット設定でc-lucbを使用すると、パフォーマンスが大幅に向上する可能性がある。理論的知見はMovielensおよびGoodreadsレコメンデーションデータセットの実験によって裏付けられている。

関連論文リスト

Near Optimal Best Arm Identification for Clustered Bandits [10.146588539113614]
我々は、M$エージェントを$M$クラスタにグループ化し、各クラスタがバンドイト問題を解決する。本稿では,クラスタリング,ベストアーム識別(Cl-BAI),ベストアーム識別(BAI-Cl)の2つの新しいアルゴリズムを提案する。 Cl-BAIは、学習しているバンディット問題に基づいて最初のクラスタエージェントを2フェーズのアプローチで処理し、続いて各クラスタに最適なアームを特定する。 BAI-Clは、まず最適な腕を特定し、次にそれに従ってクラスタリングエージェントを識別することで、配列を反転させる。
論文参考訳（メタデータ） (2025-05-15T10:20:26Z)
Multi-objective Good Arm Identification with Bandit Feedback [0.589889361990138]
我々は,多目的のバンドイット設定において,良好な腕識別問題を考える。各ラウンド$t$に対して、プレーヤ/algorithmは1つのアーム$i_t$をプルし、$mathcalD_i_t$に従ってサンプリングされたM$次元ベクトルフィードバックを受け取る。提案アルゴリズムは,合成および実データを用いた実験において,他のベースラインよりも優れた数値性能が得られる。
論文参考訳（メタデータ） (2025-03-13T14:04:04Z)
An Algorithm for Fixed Budget Best Arm Identification with Combinatorial Exploration [3.9901365062418312]
我々は、K$$armed banditフレームワークにおける最適な腕識別問題を考察する。エージェントは1つのアームではなく、各タイムスロットでアームのサブセットをプレイすることができる。我々は、$log K$グループを構築し、最適なアームの存在を検出するための確率比テストを実行するアルゴリズムを提案する。
論文参考訳（メタデータ） (2025-02-03T15:10:08Z)
A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
我々は,帯域幅フィードバックを用いたクラスタリングと分散マッチング問題のための一般的なフレームワークを開発する。誤り確率が$delta$を超えない任意のオンラインアルゴリズムに対して、平均アームプル数に基づいて漸近的でない下界を導出する。
論文参考訳（メタデータ） (2024-09-08T12:19:12Z)
Communication-Constrained Bandits under Additive Gaussian Noise [111.06688156723018]
クライアントが学習者にコミュニケーション制約のあるフィードバックを提供する分散マルチアームバンディットについて検討する。我々は、この下限を小さな加法係数にマッチさせるマルチフェーズ帯域幅アルゴリズム、$mathtUEtext-UCB++$を提案する。
論文参考訳（メタデータ） (2023-04-25T09:31:20Z)
Revisiting Simple Regret Minimization in Multi-Armed Bandits [33.88679679593314]
単純な後悔は、最高の腕や$epsilon$-good腕を欠く確率よりもあまり一般的ではない。本稿では,データ豊かさ (Tge n$) とデータ貧弱さ (T le n$) の両面において,単純な後悔の上限を改良した。より困難なデータ・ポーア・レシエーションのために、少なくとも1回は各腕をサンプリングすることなく、同じ改善を享受できるブラッケティングSH(BSH)を提案する。
論文参考訳（メタデータ） (2022-10-30T18:31:03Z)
Variance-Dependent Best Arm Identification [12.058652922228918]
マルチアームバンディットゲームにおける最適な腕を特定する問題について検討する。武器の報酬のギャップと分散を探索する適応アルゴリズムを提案する。このアルゴリズムは2つの対数項に最適であることを示す。
論文参考訳（メタデータ） (2021-06-19T04:13:54Z)
Bandits with many optimal arms [68.17472536610859]
最適アームの割合は$p*$、最適アームとサブ最適化アームの間の最小平均ギャップは$Delta$と書きます。我々は,累積的後悔設定と最良腕識別設定の両方において最適な学習率を特徴付ける。
論文参考訳（メタデータ） (2021-03-23T11:02:31Z)
Combinatorial Bandits without Total Order for Arms [52.93972547896022]
セット依存報酬分布を捕捉し、武器の合計順序を仮定しない報酬モデルを提案する。我々は、新しい後悔分析を開発し、$Oleft(frack2 n log Tepsilonright)$ gap-dependent regret boundと$Oleft(k2sqrtn T log Tright)$ gap-dependent regret boundを示す。
論文参考訳（メタデータ） (2021-03-03T23:08:59Z)
Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
我々は、腕の総数が膨大であることができるトップ$ k$極端な文脈的包帯問題を研究します。まず,Inverse Gap Weighting戦略を用いて,非極端に実現可能な設定のアルゴリズムを提案する。我々のアルゴリズムは、$O(ksqrt(A-k+1)T log (|mathcalF|T))$である。
論文参考訳（メタデータ） (2021-02-15T19:10:52Z)
Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
我々は、C$2$UCBアルゴリズムが分割マトロイド制約に対して最適な後悔結合$tildeO(dsqrtkT + dk)$を有することを示した。一般的な制約に対して,C$2$UCBアルゴリズムで腕の報酬推定値を変更するアルゴリズムを提案する。
論文参考訳（メタデータ） (2021-01-20T04:29:18Z)
An Optimal Elimination Algorithm for Learning a Best Arm [37.18327505953403]
古典的な問題である$(epsilon,delta)$-PAC学習をベストアームとみなす。本稿では,$(epsilon,delta)$-PAC学習を最適なアームとして提案する。
論文参考訳（メタデータ） (2020-06-20T20:21:33Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。