Fugu-MT 論文翻訳(概要): Contextual Combinatorial Volatile Bandits with Satisfying via Gaussian Processes

論文の概要: Contextual Combinatorial Volatile Bandits with Satisfying via Gaussian Processes

arxiv url: http://arxiv.org/abs/2111.14778v1
Date: Mon, 29 Nov 2021 18:39:09 GMT
ステータス: 翻訳完了
システム内更新日: 2021-11-30 16:34:04.624873
Title: Contextual Combinatorial Volatile Bandits with Satisfying via Gaussian Processes
Title（参考訳）: ガウス過程を満足するコンビネータ型揮発性バンディット
Authors: Sepehr Elahi, Baran Atalar, Sevda \"O\u{g}\"ut, Cem Tekin
Abstract要約: 我々は、グループ閾値を持つContextual Combinatorial Banditsと呼ばれる新しいバンディットモデルを提案する。我々のモデルは、スーパーアームをベースアームのグループのサブセットと考えることで、包帯を仮定する。
参考スコア（独自算出の注目度）: 8.602553195689511
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In many real-world applications of combinatorial bandits such as content caching, rewards must be maximized while satisfying minimum service requirements. In addition, base arm availabilities vary over time, and actions need to be adapted to the situation to maximize the rewards. We propose a new bandit model called Contextual Combinatorial Volatile Bandits with Group Thresholds to address these challenges. Our model subsumes combinatorial bandits by considering super arms to be subsets of groups of base arms. We seek to maximize super arm rewards while satisfying thresholds of all base arm groups that constitute a super arm. To this end, we define a new notion of regret that merges super arm reward maximization with group reward satisfaction. To facilitate learning, we assume that the mean outcomes of base arms are samples from a Gaussian Process indexed by the context set ${\cal X}$, and the expected reward is Lipschitz continuous in expected base arm outcomes. We propose an algorithm, called Thresholded Combinatorial Gaussian Process Upper Confidence Bounds (TCGP-UCB), that balances between maximizing cumulative reward and satisfying group reward thresholds and prove that it incurs $\tilde{O}(K\sqrt{T\overline{\gamma}_{T}} )$ regret with high probability, where $\overline{\gamma}_{T}$ is the maximum information gain associated with the set of base arm contexts that appeared in the first $T$ rounds and $K$ is the maximum super arm cardinality of any feasible action over all rounds. We show in experiments that our algorithm accumulates a reward comparable with that of the state-of-the-art combinatorial bandit algorithm while picking actions whose groups satisfy their thresholds.
Abstract（参考訳）: コンテントキャッシングのような組合せバンディットの現実世界の多くのアプリケーションでは、最小限のサービス要件を満たしながら報酬を最大化しなければならない。加えて、ベースアームの有効性は時間とともに変化し、報酬を最大化するために状況に適応する必要がある。我々は,これらの課題に対処するために,グループ閾値を持つContextual Combinatorial Volatile Banditsと呼ばれる新しいバンディットモデルを提案する。我々のモデルは、スーパーアームをベースアームのグループのサブセットと考えることで、組合せ帯域を仮定する。スーパーアームを構成する全てのベースアームグループのしきい値を満たしつつ、スーパーアーム報酬の最大化を目指す。この目的のために,超アーム報酬の最大化とグループ報酬の満足度を融合する新たな後悔概念を定義する。学習を容易にするために、ベースアームの平均結果はコンテキストセット${\cal X}$でインデックスされたガウス過程のサンプルであり、期待される報酬は期待されるベースアームの結果においてリプシッツ連続である。 We propose an algorithm, called Thresholded Combinatorial Gaussian Process Upper Confidence Bounds (TCGP-UCB), that balances between maximizing cumulative reward and satisfying group reward thresholds and prove that it incurs $\tilde{O}(K\sqrt{T\overline{\gamma}_{T}} )$ regret with high probability, where $\overline{\gamma}_{T}$ is the maximum information gain associated with the set of base arm contexts that appeared in the first $T$ rounds and $K$ is the maximum super arm cardinality of any feasible action over all rounds. 実験では,本アルゴリズムが最先端のコンビネータ型バンディットアルゴリズムに匹敵する報酬を蓄積し,群が閾値を満たしている動作を選択する。

関連論文リスト

Continuous K-Max Bandits [54.21533414838677]
我々は、連続的な結果分布と弱い値-インデックスフィードバックを持つ、$K$-Maxのマルチアームバンディット問題について検討する。この設定は、レコメンデーションシステム、分散コンピューティング、サーバスケジューリングなどにおいて重要なアプリケーションをキャプチャします。我々の重要な貢献は、適応的な離散化とバイアス補正された信頼境界を組み合わせた計算効率の良いアルゴリズムDCK-UCBである。
論文参考訳（メタデータ） (2025-02-19T06:37:37Z)
An Algorithm for Fixed Budget Best Arm Identification with Combinatorial Exploration [3.9901365062418312]
我々は、K$$armed banditフレームワークにおける最適な腕識別問題を考察する。エージェントは1つのアームではなく、各タイムスロットでアームのサブセットをプレイすることができる。我々は、$log K$グループを構築し、最適なアームの存在を検出するための確率比テストを実行するアルゴリズムを提案する。
論文参考訳（メタデータ） (2025-02-03T15:10:08Z)
A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
我々は,帯域幅フィードバックを用いたクラスタリングと分散マッチング問題のための一般的なフレームワークを開発する。誤り確率が$delta$を超えない任意のオンラインアルゴリズムに対して、平均アームプル数に基づいて漸近的でない下界を導出する。
論文参考訳（メタデータ） (2024-09-08T12:19:12Z)
Bayesian Analysis of Combinatorial Gaussian Process Bandits [6.594362025904486]
GP-UCB, GP-BayesUCB, GP-TSの3つのアルゴリズムに対して, 新たな累積後悔境界を提供する。我々は,オンラインエネルギー効率ナビゲーションの課題に対処するために,我々のフレームワークを使用している。
論文参考訳（メタデータ） (2023-12-20T00:31:43Z)
Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
我々は,選択した$n$のアームセットのジョイント報酬以外の余分な情報が観測されない場合に,マルチアームのバンディット問題に対する新規グリーディ・バンディット(SGB)アルゴリズムを提案する。 SGBは最適化された拡張型コミットアプローチを採用しており、ベースアームの大きなセットを持つシナリオ用に特別に設計されている。
論文参考訳（メタデータ） (2023-12-13T11:08:25Z)
Max-Quantile Grouped Infinite-Arm Bandits [39.7239093424124]
無限に多くの腕からなる群が多数存在するバンド問題。本稿では,まず各グループから一定数のアームを要求し,次に有限腕グループ化された最大量子帯域幅アルゴリズムを実行する2段階アルゴリズムを提案する。インスタンス依存と最悪のケースの後悔の両方を特徴付け、後者に一致する下位境界を提供します。
論文参考訳（メタデータ） (2022-10-04T00:46:49Z)
Risk-Aware Algorithms for Combinatorial Semi-Bandits [7.716156977428555]
半帯域フィードバック下でのマルチアームバンディット問題について検討する。本稿では,最悪の場合の報酬のみを考慮したリスク尺度であるCVaR(Conditional Value-at-Risk)の最大化の問題を検討する。本稿では,バンディットのスーパーアームから得られる報酬のCVaRを最大化するアルゴリズムを提案する。
論文参考訳（メタデータ） (2021-12-02T11:29:43Z)
Max-Min Grouped Bandits [48.62520520818357]
マルチアームバンディット問題であるmax-min grouped banditsを導入する。ゴールは、最悪の腕が最高の平均報酬を持つグループを見つけることです。この問題はレコメンデーションシステムのようなアプリケーションには関心がある。
論文参考訳（メタデータ） (2021-11-17T01:59:15Z)
Contextual Combinatorial Volatile Bandits via Gaussian Processes [10.312968200748116]
利用可能なベースアームのセットとそのコンテキストによるコンテキスト的バンディット問題を考える。我々は,カーネル上信頼境界(O'CLOK-UCB)を用いた最適組合せ学習と最適化というアルゴリズムを提案する。両アルゴリズムが従来のUTBベースのアルゴリズムを現実的な設定で大幅に上回っていることを実験的に示す。
論文参考訳（メタデータ） (2021-10-05T18:02:10Z)
Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
我々は、腕の総数が膨大であることができるトップ$ k$極端な文脈的包帯問題を研究します。まず,Inverse Gap Weighting戦略を用いて,非極端に実現可能な設定のアルゴリズムを提案する。我々のアルゴリズムは、$O(ksqrt(A-k+1)T log (|mathcalF|T))$である。
論文参考訳（メタデータ） (2021-02-15T19:10:52Z)
Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
コンビナーシャルバンディットはマルチアームバンディットを一般化し、エージェントが腕のセットを選択し、選択したセットに含まれる各腕の騒々しい報酬を観察します。我々は, 最善の腕を一定の信頼度で識別する純粋爆発問題と, 応答集合の構造が動作集合の1つと異なるような, より一般的な設定に注目する。有限多面体に対するプロジェクションフリーオンライン学習アルゴリズムに基づいて、凸的に最適であり、競争力のある経験的性能を持つ最初の計算効率の良いアルゴリズムである。
論文参考訳（メタデータ） (2021-01-21T10:35:09Z)
Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
我々は、C$2$UCBアルゴリズムが分割マトロイド制約に対して最適な後悔結合$tildeO(dsqrtkT + dk)$を有することを示した。一般的な制約に対して,C$2$UCBアルゴリズムで腕の報酬推定値を変更するアルゴリズムを提案する。
論文参考訳（メタデータ） (2021-01-20T04:29:18Z)
Tight Lower Bounds for Combinatorial Multi-Armed Bandits [72.56064196252498]
Combinatorial Multi-Armed Bandit 問題は、エージェントが各ラウンドで一組の腕を選択する、シーケンシャルな意思決定問題である。最近提案されたGini重み付き滑らか度パラメータが単調報酬関数の下限を決定することを示す。
論文参考訳（メタデータ） (2020-02-13T08:53:43Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。