Fugu-MT 論文翻訳(概要): Understanding Bandits with Graph Feedback

論文の概要: Understanding Bandits with Graph Feedback

arxiv url: http://arxiv.org/abs/2105.14260v1
Date: Sat, 29 May 2021 09:35:28 GMT
ステータス: 翻訳完了
システム内更新日: 2021-06-01 17:30:28.886867
Title: Understanding Bandits with Graph Feedback
Title（参考訳）: グラフフィードバックによるバンディットの理解
Authors: Houshuang Chen (1), Zengfeng Huang (2), Shuai Li (1) and Chihao Zhang (1) ((1) Shanghai Jiao Tong University, (2) Fudan University)
Abstract要約: 一般的な後悔すべき上界$Oleft(left(delta*log |V|right)frac13Tfrac23right)$と下界$Omegaleft(left(delta*/alpharight)frac13Tfrac23right)$を証明します。いくつかのグラフの特別な族に対して、$left(log |V|right)frac13$ factorを排除できることを示す。
参考スコア（独自算出の注目度）: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The bandit problem with graph feedback, proposed in [Mannor and Shamir, NeurIPS 2011], is modeled by a directed graph $G=(V,E)$ where $V$ is the collection of bandit arms, and once an arm is triggered, all its incident arms are observed. A fundamental question is how the structure of the graph affects the min-max regret. We propose the notions of the fractional weak domination number $\delta^*$ and the $k$-packing independence number capturing upper bound and lower bound for the regret respectively. We show that the two notions are inherently connected via aligning them with the linear program of the weakly dominating set and its dual -- the fractional vertex packing set respectively. Based on this connection, we utilize the strong duality theorem to prove a general regret upper bound $O\left(\left( \delta^*\log |V|\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$ and a lower bound $\Omega\left(\left(\delta^*/\alpha\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$ where $\alpha$ is the integrality gap of the dual linear program. Therefore, our bounds are tight up to a $\left(\log |V|\right)^{\frac{1}{3}}$ factor on graphs with bounded integrality gap for the vertex packing problem including trees and graphs with bounded degree. Moreover, we show that for several special families of graphs, we can get rid of the $\left(\log |V|\right)^{\frac{1}{3}}$ factor and establish optimal regret.
Abstract（参考訳）: グラフフィードバックに関するバンディット問題は[Mannor and Shamir, NeurIPS 2011]で提案され、有向グラフ$G=(V,E)$でモデル化されている。基本的な問題は、グラフの構造がmin-maxの後悔にどう影響するかである。そこで本稿では, 差分弱支配数 $\delta^*$, $k$-packing independent number の概念をそれぞれ提案する。 2つの概念は、弱支配集合の線型プログラムと、その双対 -- 分数頂点パッキング集合とをそれぞれ整合させることにより、本質的に連結であることを示す。この関係に基づいて、強い双対性定理(英: strong duality theorem)を用いて、一般的な後悔の上限である$o\left(\left( \delta^*\log |v|\right)^{\frac{1}{3}}t^{\frac{2}{3}}\right)$ と下限の$\omega\left(\left(\delta^*/\alpha\right)^{\frac{1}{3}}t^{\frac{2}{3}}\right)$ を証明する。したがって、我々の境界は、有界次数を持つ木やグラフを含む頂点パッキング問題に対する有界積分性ギャップを持つグラフ上の $\left(\log |v|\right)^{\frac{1}{3}}$ factor に厳密である。さらに、グラフのいくつかの特別な族に対して、$\left(\log |v|\right)^{\frac{1}{3}}$因子を取り除き、最適な後悔を確立することができることを示す。

関連論文リスト

Preparing graph states forbidding a vertex-minor [1.864621482724548]
測定に基づく量子コンピューティングは、準備された安定化状態に非クリフォード測定を加えることでプリフォームされる。すべての安定化状態はグラフ状態と局所クリフォード同値であるため、グラフ状態$leftvert G rightrangle$にフォーカスすることができる。グラフの特定の固有クラスに$G$が含まれているとき、かなり改善された境界を得る。
論文参考訳（メタデータ） (2025-03-31T23:25:35Z)
Adversarial Combinatorial Bandits with Switching Costs [55.2480439325792]
本稿では,各ラウンドで選択したアームのスイッチに対して,切替コストが$lambda$の逆帯域問題について検討する。ミニマックスの後悔とそれにアプローチするための設計アルゴリズムの限界を低くする。
論文参考訳（メタデータ） (2024-04-02T12:15:37Z)
On Interpolating Experts and Multi-Armed Bandits [1.9497955504539408]
我々は、$mathbfm$-MABに対して厳密なミニマックス後悔境界を証明し、その純粋な探索バージョンである$mathbfm$-BAIに対して最適なPACアルゴリズムを設計する。その結果、フィードバックグラフのいくつかのファミリに対して、厳密なミニマックス後悔境界を得た。
論文参考訳（メタデータ） (2023-07-14T10:38:30Z)
Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
ミニマックスの後悔は任意のグラフと時間的地平線に対して$R*$に比例することを示す。複雑な探索戦略を導入し、最小限の最小後悔境界を達成する主アルゴリズムを定義する。
論文参考訳（メタデータ） (2023-06-05T15:35:00Z)
Stochastic Contextual Bandits with Graph-based Contexts [0.0]
オンライングラフ予測問題を文脈的帯域問題に一般化する。我々のアルゴリズムは Zimmert と Seldin[AISTAT'19, JMLR'21] による最適帯域幅アルゴリズムに依存している。
論文参考訳（メタデータ） (2023-05-02T14:51:35Z)
Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs [62.52390282012508]
我々は、T$ラウンド以上の時間変化フィードバックグラフを持つ、敵対的な$K$武器付きバンディットに対する高い確率的後悔境界について検討する。提案アルゴリズムは,高い確率で最適に後悔する$widetildemathcalO((sum_t=1Talpha_t)1/2+max_tin[T]alpha_t]$を達成するアルゴリズムを開発した。また,弱可観測グラフに対する最適高確率残差を求めるアルゴリズムを開発した。
論文参考訳（メタデータ） (2022-10-04T04:36:15Z)
Causal Bandits for Linear Structural Equation Models [58.2875460517691]
本稿では,因果図形モデルにおける最適な介入順序を設計する問題について検討する。グラフの構造は知られており、ノードは$N$である。頻繁性(UCBベース)とベイズ的設定に2つのアルゴリズムを提案する。
論文参考訳（メタデータ） (2022-08-26T16:21:31Z)
Laplacian State Transfer on Graphs with an Edge Perturbation Between Twin Vertices [0.0]
グラフのラプラシア行列に対する量子状態移動を考える。一対の双対頂点間の量子状態移動の存在を、頂点間の端が摂動しているときに検討する。
論文参考訳（メタデータ） (2021-09-11T15:48:18Z)
Combinatorial Bandits without Total Order for Arms [52.93972547896022]
セット依存報酬分布を捕捉し、武器の合計順序を仮定しない報酬モデルを提案する。我々は、新しい後悔分析を開発し、$Oleft(frack2 n log Tepsilonright)$ gap-dependent regret boundと$Oleft(k2sqrtn T log Tright)$ gap-dependent regret boundを示す。
論文参考訳（メタデータ） (2021-03-03T23:08:59Z)
Adversarial Dueling Bandits [85.14061196945599]
本稿では,反逆デュエルバンドにおける後悔の問題を紹介する。学習者は、繰り返し一対のアイテムを選択し、このペアに対する相対的な二項利得フィードバックのみを観察しなければならない。我々の主な成果は、EmphBorda-winnerの1組の$K$アイテムと比較して、T$ラウンド後悔するアルゴリズムです。
論文参考訳（メタデータ） (2020-10-27T19:09:08Z)
A Closer Look at Small-loss Bounds for Bandits with Graph Feedback [39.78074016649885]
グラフフィードバックを用いた対向多腕バンディットの低損失境界について検討する。一般の強可観測グラフに対する最初の小さな空間境界を導出する。また、弱可観測グラフに対する小空間境界を導出する最初の試みも行う。
論文参考訳（メタデータ） (2020-02-02T03:48:01Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。