Fugu-MT 論文翻訳(概要): Stochastic contextual bandits with graph feedback: from independence number to MAS number

論文の概要: Stochastic contextual bandits with graph feedback: from independence number to MAS number

arxiv url: http://arxiv.org/abs/2402.18591v1
Date: Mon, 12 Feb 2024 06:56:13 GMT
ステータス: 翻訳完了
システム内更新日: 2024-03-03 19:11:48.999803
Title: Stochastic contextual bandits with graph feedback: from independence number to MAS number
Title（参考訳）: グラフフィードバックを用いた確率的文脈的バンディット:独立数からmas数へ
Authors: Yuxiao Wen, Yanjun Han, Zhengyuan Zhou
Abstract要約: バニラの文脈的帯域幅よりも、よりリッチな構造を持つ対話型学習問題であるグラフフィードバックを用いたコンテキスト的帯域幅を考える。我々は、コンテキストシーケンスやフィードバックグラフの重要なクラスに対して、ほぼ最適に後悔するアルゴリズムを提供する。
参考スコア（独自算出の注目度）: 31.58368435306069
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider contextual bandits with graph feedback, a class of interactive learning problems with richer structures than vanilla contextual bandits, where taking an action reveals the rewards for all neighboring actions in the feedback graph under all contexts. Unlike the multi-armed bandits setting where a growing literature has painted a near-complete understanding of graph feedback, much remains unexplored in the contextual bandits counterpart. In this paper, we make inroads into this inquiry by establishing a regret lower bound $\Omega(\sqrt{\beta_M(G) T})$, where $M$ is the number of contexts, $G$ is the feedback graph, and $\beta_M(G)$ is our proposed graph-theoretical quantity that characterizes the fundamental learning limit for this class of problems. Interestingly, $\beta_M(G)$ interpolates between $\alpha(G)$ (the independence number of the graph) and $\mathsf{m}(G)$ (the maximum acyclic subgraph (MAS) number of the graph) as the number of contexts $M$ varies. We also provide algorithms that achieve near-optimal regrets for important classes of context sequences and/or feedback graphs, such as transitively closed graphs that find applications in auctions and inventory control. In particular, with many contexts, our results show that the MAS number completely characterizes the statistical complexity for contextual bandits, as opposed to the independence number in multi-armed bandits.
Abstract（参考訳）: グラフフィードバックを用いた文脈的帯域幅は、バニラの文脈的帯域幅よりもリッチな構造を持つ対話的学習問題であり、任意のコンテキスト下でのすべての近隣行動に対する報酬を明らかにする。増大する文学がグラフフィードバックのほぼ完全な理解を描いているマルチアームのバンディット設定とは異なり、コンテクストのバンディットでは多くが未調査のままである。そこで,本稿では,m$ が文脈数,$g$ がフィードバックグラフ,$\beta_m(g)$ が提案するグラフ理論的量であり,この問題に対する基本的な学習限界を特徴づけるものであることを証明して,後悔すべき下限値 $\omega(\sqrt{\beta_m(g) t})$ を確立することにより,この問いへの参入を行う。興味深いことに、$\beta_M(G)$は$\alpha(G)$(グラフの独立数)と$\mathsf{m}(G)$(グラフの最大非巡回部分グラフ(MAS)数)の間の補間を行う。また,コンテクストシーケンスやフィードバックグラフの重要なクラス,例えばオークションやインベントリ管理の応用を見出す推移的に閉じたグラフに対して,最適に近い後悔を実現するアルゴリズムを提供する。特に,多くの文脈において,mas数は,多腕バンディットの独立数とは対照的に,文脈バンディットの統計的複雑性を完全に特徴付けることを示す。

関連論文リスト

Nearly Tight Bounds for Cross-Learning Contextual Bandits with Graphical Feedback [17.43748116766233]
グラフィカルフィードバックを用いたクロスラーニングの文脈的帯域幅問題に注目が集まっている。鍵となる理論的問題は、$widetildeO(sqrtalpha T)$ regret のアルゴリズムが存在するかどうかである。
論文参考訳（メタデータ） (2025-02-07T05:52:56Z)
Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Banditsは、安静と安静のバンディットを一般化するフレームワークである。本研究は,2種類の単調包帯に焦点をあてる: 立ち上がり, 腕の期待される報酬が増加する, 引き金の数が増える, 回転する, 反対の行動が起こる。
論文参考訳（メタデータ） (2024-09-09T18:23:07Z)
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
現実のテキストグラフを対象とするフレキシブルな問合せフレームワークを開発した。一般のテキストグラフに対する最初の検索拡張生成(RAG)手法を提案する。 G-Retrieverは、このタスクをSteiner Tree最適化問題として定式化し、グラフ上でRAGを実行する。
論文参考訳（メタデータ） (2024-02-12T13:13:04Z)
Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
ミニマックスの後悔は任意のグラフと時間的地平線に対して$R*$に比例することを示す。複雑な探索戦略を導入し、最小限の最小後悔境界を達成する主アルゴリズムを定義する。
論文参考訳（メタデータ） (2023-06-05T15:35:00Z)
Stochastic Contextual Bandits with Graph-based Contexts [0.0]
オンライングラフ予測問題を文脈的帯域問題に一般化する。我々のアルゴリズムは Zimmert と Seldin[AISTAT'19, JMLR'21] による最適帯域幅アルゴリズムに依存している。
論文参考訳（メタデータ） (2023-05-02T14:51:35Z)
Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs [62.52390282012508]
我々は、T$ラウンド以上の時間変化フィードバックグラフを持つ、敵対的な$K$武器付きバンディットに対する高い確率的後悔境界について検討する。提案アルゴリズムは,高い確率で最適に後悔する$widetildemathcalO((sum_t=1Talpha_t)1/2+max_tin[T]alpha_t]$を達成するアルゴリズムを開発した。また,弱可観測グラフに対する最適高確率残差を求めるアルゴリズムを開発した。
論文参考訳（メタデータ） (2022-10-04T04:36:15Z)
Multi-armed Bandit Learning on a Graph [0.0]
そこで,エージェントがグラフの上を移動して,異なるノードから収集した報酬を最大化するグラフバンドイットと呼ばれるMABの拡張について検討する。我々は,楽観主義の原理を用いて長期探査・探索のバランスをとる学習アルゴリズムG-UCBを設計する。提案アルゴリズムは,ノード数として$O(sqrt|S|Tlog(T)+D|S|log T)$学習後悔を実現する。
論文参考訳（メタデータ） (2022-09-20T02:31:42Z)
Improved Algorithms for Bandit with Graph Feedback via Regret Decomposition [2.3034251503343466]
グラフフィードバックによるバンディットの問題は、マルチアームバンディット(MAB)問題と専門家のアドバイスによる学習の両方を一般化する。本稿では,フィードバックグラフの分割に基づく新しいアルゴリズムフレームワークを提案する。
論文参考訳（メタデータ） (2022-05-30T13:07:42Z)
Beyond Bandit Feedback in Online Multiclass Classification [17.07011090727996]
学習者のフィードバックが任意の有向グラフによって決定されるような環境で,オンライン多クラス分類の問題について検討する。任意のフィードバックグラフで動作する,初のオンラインマルチクラスアルゴリズムであるGappletronを紹介する。
論文参考訳（メタデータ） (2021-06-07T13:22:30Z)
Adversarial Linear Contextual Bandits with Graph-Structured Side Observations [80.95090605985042]
学習エージェントは、$d$-dimensionalコンテキストベクトルで提示された後、一連の$k$アクションから繰り返し選択する。エージェントは選択されたアクションの損失を誘発し、観察するが、観察構造における隣り合うアクションの損失も観察する。 textttEXP3に基づく2つの効率的なアルゴリズムが開発された。
論文参考訳（メタデータ） (2020-12-10T15:40:07Z)
Stochastic Bandits with Linear Constraints [69.757694218456]
制約付き文脈線形帯域設定について検討し、エージェントの目標は一連のポリシーを作成することである。楽観的悲観的線形帯域(OPLB)と呼ばれる,この問題に対する高信頼束縛アルゴリズムを提案する。
論文参考訳（メタデータ） (2020-06-17T22:32:19Z)
Iterative Context-Aware Graph Inference for Visual Dialog [126.016187323249]
本稿では,新しいコンテキスト認識グラフ(CAG)ニューラルネットワークを提案する。グラフの各ノードは、オブジェクトベース(視覚)と履歴関連(テキスト)コンテキスト表現の両方を含む、共同意味機能に対応している。
論文参考訳（メタデータ） (2020-04-05T13:09:37Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。